bosun-monitor / bosun

Time Series Alerting Framework
http://bosun.org
MIT License

Can't access any variable other than "." or "$" in the notification block. #1075

Closed gdiv90 closed 9 years ago

gdiv90 commented 9 years ago

I am using a notification block to make a POST call, and in that call I also want to send the actual expression that triggered the notification, along with some other values. The only variable I can find in the notification block is "." (or "$"), and it contains only the template subject.

Can you please tell me what other variables are available in the notification block, or whether there is some other way to solve my problem?

Below is the notification block that I am using:

notification chat {
    post = http://services.sokrati.com/eventManagerService/events
    body = {"agencyId":19,"text":{{.|json}},"apiKey":"2847abc23","@class":"com.sokrati.eventManagerSvcObjects.PostEventsRequest","events":[{"@class":"com.sokrati.eventManagerSvcObjects.data.JerichoEvents","eventType":"EXTENSION_UPDATE","account":11883,"vendor":"google.com","eventName":"EXTENSION_UPDATE","eventTimestamp":1432143444000,"expirationDate":1432243444000,"application":"eventAlerts","metaData":null}],"clientId":15138,"userId":205}
    contentType = application/json
}

In the body I use "text":{{.|json}} to dump whatever values are available, but I only get the template subject. Below is the request object that I receive:

{"agencyId":19,"text":"critical: cpu.is.too.high on fa221ce55ab5", "apiKey":"2847abc23","@class":"com.sokrati.eventManagerSvcObjects.PostEventsRequest","events":[{"@class":"com.sokrati.eventManagerSvcObjects.data.JerichoEvents","eventType":"EXTENSION_UPDATE","account":11883,"vendor":"google.com","eventName":"EXTENSION_UPDATE","eventTimestamp":1432143444000,"expirationDate":1432243444000,"application":"eventAlerts","metaData":null}],"clientId":15138,"userId":205}

You can see the text in the request object is only "critical: cpu.is.too.high on fa221ce55ab5", which is the subject of my template.

maddyblue commented 9 years ago

Notifications can be used by various alerts. Thus, it does not make sense to have alert-specific variables in a notification. If you want to send other data in your notification, then you must modify the alert's template to produce that data. Set the subject field of the template to what you'd like it to be. It has access to all of an alert's data.
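
A minimal sketch of that suggestion, assuming the alert is grouped by a `host` tag and that the template variables `.Last.Status`, `.Alert.Name`, `.Group` and `.Expr` behave as the Bosun template documentation describes them (check this against your installed version). The template name `cpu`, the query, and the threshold are hypothetical:

```
# Hypothetical template: everything the notification should carry goes into
# the subject, because the notification body only sees the subject as ".".
template cpu {
    subject = {{.Last.Status}}: {{.Alert.Name}} on {{.Group.host}} expr={{.Expr}}
}

# Hypothetical alert using that template; the query and threshold are
# placeholders, not taken from the original report.
alert cpu.is.too.high {
    template = cpu
    $q = avg(q("avg:rate{counter,,1}:os.cpu{host=*}", "5m", ""))
    crit = $q > 90
    critNotification = chat
}
```

With this in place, the existing `"text":{{.|json}}` in the `chat` notification would emit the longer subject as a single JSON string, so the receiving service has to parse the expression back out of that string.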

gdiv90 commented 9 years ago

Thanks @mjibson for your help.

We need tickets to be closed automatically when the severity goes from warn/crit back to normal. Since Bosun does not have this feature yet, our plan was to save the expression in our database when the alert runs. Our offline app could then keep checking the severity of that alert and, once it returns to normal, close the ticket both through your API and in OTRS, the ticketing system we use in our organization. We will manage this on our end until you give us the auto-close feature.

When will the next version of Bosun with this auto-close feature be released?

kylebrandt commented 9 years ago

I believe the idea of autoclosing alerts to be a "considered harmful" feature, and currently have no intention of implementing this any time soon.

Even though an alert may have returned to a "normal" severity state, there is a high probability that there is still an underlying issue that needs to be addressed. It usually presents itself in one of two forms:

  1. If the alert triggered and no actual response was needed, then the alert should not have triggered in the first place. It was essentially noise, which can lead to alert desensitization. In this case a human should tune the alerting logic.
  2. If it wasn't noise, then there is some problem that needs to be addressed, because it will likely occur again. Ignoring these sorts of incidents can hide latent issues that can eventually lead to a more serious impact on your service.

There are exceptions to this; sometimes it really was just a one-off. But I don't believe a machine can make that decision reliably; a human needs to make it.

Otherwise the alert serves more of an "info" function. We do have a log directive for alerts, which basically bypasses the alert handling workflow when you want this behavior.
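
As a rough sketch of that directive, assuming the `log` alert keyword as described in the Bosun configuration documentation (verify it against your version). The template name, query, and threshold are hypothetical placeholders:

```
# Hypothetical "info"-style alert: with log = true the alert is meant to send
# its notifications without entering the acknowledge/close workflow.
alert cpu.is.too.high {
    template = cpu
    $q = avg(q("avg:rate{counter,,1}:os.cpu{host=*}", "5m", ""))
    crit = $q > 90
    critNotification = chat
    log = true
}
```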

I've discussed this aspect with people a few different times and haven't been presented with a user story that makes me think holding on to this belief is incorrect. I am sympathetic that this is a deviation from the pattern found in most alerting systems, so I would seriously consider counter-cases if you provide evidence or user stories indicating that this philosophy is specious.

My recent (today's) Monitorama presentation touches upon this and perhaps explains the thinking a little bit better. I believe it will be available online in the next week or two.