arachnys / cabot

Self-hosted, easily-deployable monitoring and alerts service - like a lightweight PagerDuty
MIT License

Summary / reporting dashboard #530

Open Exocomp opened 7 years ago

Exocomp commented 7 years ago

It would be great if there were a reporting dashboard that summarized which checks, services, and instances failed over a chosen period of time. This would be useful when you want to spot a trend or just get a snapshot of the events that took place.

For example, at the end of a day or even a week (the period should be user-configurable), one could open the dashboard and see a list of all checks that were triggered. For a heavy Cabot user, this would be a really helpful resource.

Note: I understand there is a status chart at the top of each instance/service, but it is severely lacking and not helpful in the way I described above.

dbuxton commented 7 years ago

Can you provide a bit more detail about the sort of insight you want to get and what it might look like? What in particular is "not helpful" about the current status chart? Is it a case of displaying the same data better, or getting more control over the data, or something else?

Exocomp commented 7 years ago

For example, suppose a few metrics triggered alerts during the day or week, and each time you dealt with the issue and moved on. It would be nice to be able, at the end of the day or week (or whenever you do your review), to look back at the metrics that triggered and decide whether any of them need adjusting.

In that scenario the status chart doesn't help, because it does not list the metrics that failed and it only covers the past 24 hours. To gather the necessary data, one would have to go through the history of each metric one by one to see what happened. That works if you have 10 metrics you're checking, but not when you have 500.
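At its core, the requested report is a single aggregation: take every failed check result in a user-chosen time window and group it by check, rather than opening each check's history one by one. The sketch below illustrates that idea using a hypothetical `CheckResult` record and a hypothetical `failure_summary` helper; neither is part of Cabot, and a real implementation would presumably query Cabot's stored check results in the database instead of looping over an in-memory list.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CheckResult:
    # Hypothetical record for one run of one check (not Cabot's actual model).
    check_name: str
    succeeded: bool
    finished_at: datetime


def failure_summary(results, window, now=None):
    """Count failed results per check within the last `window`.

    Returns (check_name, failure_count) pairs, most frequent first.
    """
    if now is None:
        now = datetime.now()
    since = now - window
    counts = Counter(
        r.check_name
        for r in results
        if not r.succeeded and since <= r.finished_at <= now
    )
    return counts.most_common()


# Usage: a weekly review over a handful of made-up results.
now = datetime(2024, 1, 8, 9, 0)
results = [
    CheckResult("disk-usage", False, now - timedelta(hours=2)),
    CheckResult("disk-usage", False, now - timedelta(days=3)),
    CheckResult("api-latency", False, now - timedelta(days=1)),
    CheckResult("api-latency", True, now - timedelta(hours=1)),
    CheckResult("ssl-expiry", False, now - timedelta(days=10)),  # outside the 7-day window
]
print(failure_summary(results, timedelta(days=7), now=now))
# [('disk-usage', 2), ('api-latency', 1)]
```

Making `window` a parameter is what gives the "time should be user configurable" behaviour: a daily review passes `timedelta(days=1)`, a weekly one `timedelta(days=7)`, and the output ranks the noisiest checks first, which is where threshold adjustments are most likely needed.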