Open jolynch opened 9 years ago
Imho it depends on how sophisticated this whole thing needs to be.
I think handlers are the right place to query for related events (for example, "I am processing event FOO on host BAR, sensu api - give me all related events") but they are not necessarily the right place to generate the relationship. That could be a standalone process running in the background, working off data in Redis and writing back to Redis, from where sensu-api could read data about "related" events.
Otoh, if relationships are simple ("other events on this host", "other same events on hosts in same region/datacenter/etc"), it might be ok to put all of it into handlers.
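To make the "simple relationships" case concrete, here is a rough sketch of what the selection could look like inside a handler. It assumes events are dicts shaped like Sensu event data (`client.name`, `check.name`); the function name `related_events` and the `region` client attribute are made up for illustration, not anything sensu_handlers has today.

```python
def related_events(current, all_events, same_attr=None):
    """Pick events related to `current` using two simple rules.

    Rule 1: any other event on the same host.
    Rule 2 (optional): the same check on another host whose client data
            shares the custom attribute `same_attr` (e.g. "region").
    """
    host = current["client"]["name"]
    check = current["check"]["name"]
    group = current["client"].get(same_attr) if same_attr else None

    related = []
    for event in all_events:
        ev_host = event["client"]["name"]
        ev_check = event["check"]["name"]
        if ev_host == host and ev_check == check:
            continue  # this is the event being handled, not a related one
        if ev_host == host:
            related.append(event)  # rule 1: other events on this host
        elif group is not None and ev_check == check \
                and event["client"].get(same_attr) == group:
            related.append(event)  # rule 2: same check, same region/datacenter
    return related
```

Anything more involved than these two rules is probably where the out-of-band process starts to make sense.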
Also, have you seen dependencies? https://sensuapp.org/docs/latest/checks We are not using this feature enough; there are a ton of events for which we could declare dependencies in order to reduce the number of events that end up being processed by handlers. This only works for simple relationships, however.
Handlers are an easy place to start with this because they have all the data they need to answer the question, and they are right there to append the related_alerts raw text to the output of the alert.
Even if we had an out-of-band process doing the formulation and putting the results in a stash or something, we would still need the related alerts function at the handler level to retrieve that stash and stick it in the alert text.
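Either way, the handler-side step is small. A minimal sketch of the "stick it in the alert text" part, assuming the related events have already been found (by the handler itself or read from a stash) and look like Sensu event dicts; `append_related_alerts` and the `related_alerts:` text layout are illustrative choices, not an existing function or format:

```python
def append_related_alerts(output, related):
    """Return the alert output with a raw-text related_alerts section appended."""
    if not related:
        return output  # nothing related: leave the alert text untouched

    lines = ["", "related_alerts:"]
    for event in related:
        check = event["check"]
        # Keep only the first line of each related check's output to stay short.
        out_lines = check.get("output", "").strip().splitlines()
        summary = out_lines[0] if out_lines else ""
        lines.append("  {}/{} (status {}): {}".format(
            event["client"]["name"], check["name"], check.get("status"), summary))
    return output.rstrip("\n") + "\n" + "\n".join(lines)
```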
We've talked about this idea forever at Yelp, and I would be down for even the simplest implementation (related alerts on the same host) as a first pass at this.
It does mean increased load on the api :( But I have some other ideas on how to solve that.
So something that I've wanted for a long time is the ability for symptom alerting to include causal alert information. After chatting a bit with @solarkennedy we think that the right place to put this functionality is here in the sensu_handlers.
The idea is that the pagerduty/jira handler can query the sensu api for "related" events, probably by tag or host etc ... Then it would include this contextual information in the call to action alert.
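Roughly, the query side for the "by host" case could look like the sketch below. It assumes the Sensu API's `GET /events/:client` endpoint, which returns a JSON list of current events for one client; the function name, the `opener` parameter (there only so the call can be stubbed out), and the example base URL are illustrative.

```python
import json
import urllib.parse
import urllib.request


def fetch_client_events(api_base, client, opener=urllib.request.urlopen):
    """Fetch the current events for one client from the Sensu API.

    api_base: base URL of the API, e.g. "http://localhost:4567" (assumed).
    client:   the Sensu client (host) name of the event being handled.
    opener:   callable with urlopen's calling convention, injectable for tests.
    """
    url = "{}/events/{}".format(
        api_base.rstrip("/"), urllib.parse.quote(client, safe=""))
    with opener(url, timeout=5) as response:
        return json.load(response)
```

A handler would call this with the client name from the event it is processing, drop the event itself from the result, and include the rest as context.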
Thoughts? Is this the right place to do it, or is this a crazy idea?