Describe the feature:
Stack Monitoring comes with some pre-built rules for monitoring the overall health of the cluster and its sub-categories.
It would be nice if there were a pre-built rule under Stack Monitoring that detects snapshot errors/failures.
Describe a specific use case for the feature:
Snapshots are the only supported way to back up data on an Elasticsearch cluster. Currently, if you want to monitor for snapshot failures, you need to either manually check the snapshot status or create a custom rule/alert. Since working snapshots are a core part of a healthy cluster/environment, I think Stack Monitoring should come with a pre-built rule just for this.
This rule could possibly go under Stack Monitoring -> Elasticsearch -> Overview -> Errors and exceptions.
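
For context, the manual check mentioned in the use case could look roughly like the sketch below. It assumes the response body of the get snapshot API (`GET _snapshot/<repository>/_all`) has already been parsed into a dict with a top-level `snapshots` list, and it treats the `FAILED` and `PARTIAL` states as problems. The function name, repository name, and sample payload are made up for illustration, and this is not how a built-in rule would necessarily be implemented.

```python
# Snapshot states treated as problems in this sketch (an assumption about what
# a rule should flag; a real rule might also consider other conditions).
PROBLEM_STATES = {"FAILED", "PARTIAL"}


def find_problem_snapshots(response):
    """Return (name, state) pairs for snapshots that did not complete cleanly.

    `response` is assumed to be the parsed JSON body of
    GET _snapshot/<repository>/_all, with a top-level "snapshots" list.
    """
    return [
        (snap.get("snapshot"), snap.get("state"))
        for snap in response.get("snapshots", [])
        if snap.get("state") in PROBLEM_STATES
    ]


# Hypothetical sample payload, trimmed to the fields used above.
sample_response = {
    "snapshots": [
        {"snapshot": "nightly-1", "state": "SUCCESS", "failures": []},
        {"snapshot": "nightly-2", "state": "FAILED", "failures": []},
        {"snapshot": "nightly-3", "state": "PARTIAL", "failures": []},
    ]
}

for name, state in find_problem_snapshots(sample_response):
    print(f"snapshot {name} ended in state {state}")
```

Having to run and schedule something like this by hand is exactly the overhead a pre-built rule would remove.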