Tendrl / monitoring-integration

Component that enables integration with external monitoring services.
GNU Lesser General Public License v2.1

Stopping volume - alerts #391

Open ltrilety opened 6 years ago

ltrilety commented 6 years ago

If a user stops a volume, the action generates a lot of alerts about stopped bricks. That leads to a state where the volume is reported as 'degraded' and even the cluster is declared 'unhealthy' because of those stopped bricks. We should differentiate this scenario from the one where bricks stop unexpectedly. In this case the bricks stopping is expected behaviour, because the volume itself was stopped.


Tested version:
tendrl-ansible-1.6.1-2.el7rhgs.noarch
tendrl-commons-1.6.1-1.el7rhgs.noarch
tendrl-ui-1.6.1-1.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-api-1.6.1-1.el7rhgs.noarch
tendrl-api-httpd-1.6.1-1.el7rhgs.noarch
tendrl-monitoring-integration-1.6.1-1.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-notifier-1.6.0-1.el7rhgs.noarch
tendrl-node-agent-1.6.1-1.el7rhgs.noarch
tendrl-grafana-plugins-1.6.1-1.el7rhgs.noarch

shtripat commented 6 years ago

@ltrilety this is how it is implemented at the moment: these alerts are raised for entities based on the state/status changes reported in the get-state output from gluster. What you expect here would require Tendrl to establish some kind of hierarchical correlation between entities when raising alerts, and to suppress alerts for some entities based on rules defined in the system. This would certainly be a considerable amount of work and would need an architectural discussion.

@r0h4n @nthomas-redhat ^^^
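
To make the idea of hierarchical correlation concrete, here is a minimal sketch of what such a suppression rule might look like. It is not Tendrl code: the `Alert` class, its fields, the `suppress_child_alerts` function and the state strings are all hypothetical names chosen for illustration, and the sketch assumes the parent volume's state is already known when brick alerts are evaluated.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; not a Tendrl data structure."""
    entity_type: str  # "volume" or "brick" (illustrative labels)
    entity_id: str
    parent_id: str    # owning volume for a brick, "" when there is none
    state: str        # e.g. "Stopped"


def suppress_child_alerts(alerts: List[Alert],
                          volume_states: Dict[str, str]) -> List[Alert]:
    """Drop brick-stopped alerts whose parent volume is itself stopped."""
    kept = []
    for alert in alerts:
        parent_stopped = volume_states.get(alert.parent_id) == "Stopped"
        if (alert.entity_type == "brick"
                and alert.state == "Stopped"
                and parent_stopped):
            continue  # expected consequence of the volume being stopped
        kept.append(alert)
    return kept


# Example: vol1 is stopped, vol2 is still started.
alerts = [
    Alert("volume", "vol1", "", "Stopped"),
    Alert("brick", "vol1-b1", "vol1", "Stopped"),
    Alert("brick", "vol1-b2", "vol1", "Stopped"),
    Alert("brick", "vol2-b1", "vol2", "Stopped"),
]
volume_states = {"vol1": "Stopped", "vol2": "Started"}
remaining = suppress_child_alerts(alerts, volume_states)
```

With this rule only the volume alert for `vol1` and the brick alert under the still-running `vol2` remain. The sketch covers correlation only; it says nothing about whether the volume stop itself was intended.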

r0h4n commented 6 years ago

This is an RFE: Tendrl is a monitoring tool and cannot tell whether the volume was stopped intentionally or stopped abruptly.

Once Tendrl offers capabilities such as "Volume stop" through the Tendrl API, we can address this request.