apache / cloudstack

Apache CloudStack is an open-source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

Error messages and Alerts should include contextual and actionable information #7297

Open boubouX opened 1 year ago

boubouX commented 1 year ago
ISSUE TYPE
COMPONENT NAME
UI and Alert messages
CLOUDSTACK VERSION
14.7.2
CONFIGURATION
OS / ENVIRONMENT

NA

SUMMARY

Alert messages should include more contextual and actionable information, rather than only various IDs, so that the issue can be addressed quickly.

STEPS TO REPRODUCE
During error conditions and alerts
EXPECTED RESULTS
Error messages should include account, domain, template, storage, VM names, and any other easily referenced information. 

"Template 06cfc10e-a07d-4cee-8929-ca98946c60ab Windows 2022 Data Center failed to upload. Error details: Maximum number of resources of type secondary_storage for account _accountname_ has exceeded"
ACTUAL RESULTS
"Template 06cfc10e-a07d-4cee-8929-ca98946c60ab failed to upload. Error details: Maximum number of resources of type secondary_storage for account/project has exceeded"
DaanHoogland commented 1 year ago

@boubouX the example you give is rather actionable, but the general description of the issue is not. There are a lot of examples of alerts and events that could be improved. Do you want just this small change? I don't think you do, but correct me if I am wrong. Otherwise, either an inventory or a lot of issues like this one will have to be made. I hope I'm not discouraging you ;)

boubouX commented 1 year ago

The change is requested throughout: using only resource IDs to generate user and operator messages defeats the purpose of the message in the first place. I can live with it for system logs, but not for UI and email notifications. The simple example I provided is an actual alert message I received.

Here are other examples of messages we received:

Failed to register template: 0dc2355f-324e-441c-9e28-0980eeffe472 with error: HTTP Server returned 404 (expected 200 OK)

HA starting VM: r-305-VM (r-305-VM)

Health checks failed: 2 failing checks on router fbfe150a-fb4f-4037-9150-82dd1212c947

Unable to attach storage pool21 to the host37

Better: Insufficient capacity to restart VM, name: LAMP2, id: 59 which was running on host name: cs-kvm02(id:10), availability zone: Milton1, pod: Milton1-Pod1. In this case the host cs-kvm02 was down.

Good: If the agent for host [name: cs-kvm06 (id:37), availability zone: Milton1, pod: Milton1-Pod1] is not restarted within alert.wait seconds, host will go to Alert state

I would suggest creating simple standard guidelines for developers/contributors related to human-received messages and reviewing them all to that standard.
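
As an illustration of what such a guideline could look like in practice, here is a minimal sketch in Python. CloudStack itself is written in Java, so this is not its code: the `ResourceRef` class and its `describe` method are hypothetical names invented for this example. The idea is that every human-facing message renders a resource through one helper that always shows the name next to the ID, plus zone and pod when known, in the shape of the "Good" example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceRef:
    """Hypothetical description of a resource mentioned in an alert."""
    kind: str                    # e.g. "host", "VM", "template"
    id: str                      # internal ID or UUID
    name: Optional[str] = None   # human-readable name, if one exists
    zone: Optional[str] = None
    pod: Optional[str] = None

    def describe(self) -> str:
        # Lead with the name when there is one; fall back to the bare ID.
        if self.name:
            parts = [f"name: {self.name} (id:{self.id})"]
        else:
            parts = [f"id:{self.id}"]
        # Add location context only when it is known.
        if self.zone:
            parts.append(f"availability zone: {self.zone}")
        if self.pod:
            parts.append(f"pod: {self.pod}")
        return f"{self.kind} [{', '.join(parts)}]"


# Values taken from the "Good" example in this thread.
host = ResourceRef("host", "37", name="cs-kvm06",
                   zone="Milton1", pod="Milton1-Pod1")
message = (f"If the agent for {host.describe()} is not restarted "
           f"within alert.wait seconds, host will go to Alert state")
print(message)
```

With a single helper like this, a message such as "Unable to attach storage pool21 to the host37" could not be written without going through `describe()`, so names and location would appear wherever they are available.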