alertmanager / alert_manager

Splunk Alert Manager with advanced reporting on alerts, workflows (modify assignee, status, severity) and auto-resolve features

Some Alerts are not being written to "incident_results" kv collection #175

Closed · ozanpasa closed this issue 7 years ago

ozanpasa commented 7 years ago

Currently Running Splunk v 6.5.2 and Alert Manager v 2.1.4

We have been seeing an issue where a few specific alerts are not being written to the incident_results kv collection. I have deleted the alerts and re-created them, but this did not solve the problem.

Issue: Alert Manager does not display incident details in the GUI for some events. [screenshot]

Below is the Display_fields entry in "Incident Settings" for the particular alert in question:

first_time,last_time,sourcetype,alert.signature,action,src_ip,src_port,src_ipi_post,dest_ip,dest_port,http_method,url,http_referrer,payload_printable,src_ipi_hva,impact,urgency

I have parsed through the alert_manager.log file and found all the entries pertaining to this specific event, and everything looks normal.

When I do a search for the incident_id in the "incidents" collection, I get a result. [screenshot]

However, when I do a search for the incident_id in the "incident_results" collection, I do not get a result. [screenshot]

johnfromthefuture commented 7 years ago

Based on the issue as described and the log messages shown, I would be inclined to think that there is an issue on the KV store side. Is the data you are posting extremely long or does it have any non-standard characters?

In looking at the code, though, I do see a possibility where a call to the KV store could fail silently. I would advise modifying your alert_manager.py script ($SPLUNK_HOME/etc/apps/alert_manager/bin/alert_manager.py) to uncomment lines 270 and 271:

log.debug("serverResponse: %s" % serverResponse)
log.debug("serverContent: %s" % serverContent)

I'm specifically interested in the serverResponse value and the content may be of use as well.

Anyway, uncomment those (if you can), and post the findings. I may be able to help more after that.
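
For anyone following along, here is a rough, hypothetical sketch of what that part of the script could look like with the debug lines uncommented. It assumes the results are posted to the KV store via splunk.rest.simpleRequest (which is what the serverResponse/serverContent names suggest); the endpoint path and the sessionKey, results and log variables are placeholders, not the actual alert_manager.py code.

import json
import splunk.rest as rest

# Sketch only -- the endpoint path and the sessionKey, results and log
# variables are assumed to exist earlier in the script.
uri = '/servicesNS/nobody/alert_manager/storage/collections/data/incident_results'

serverResponse, serverContent = rest.simpleRequest(
    uri,
    sessionKey=sessionKey,          # session key the script already holds (assumed)
    jsonargs=json.dumps(results),   # the incident result fields being written (assumed)
    method='POST'
)

# The two debug lines to uncomment: they expose the HTTP status/headers and
# the body of the KV store's reply, which is where a silent failure shows up.
log.debug("serverResponse: %s" % serverResponse)
log.debug("serverContent: %s" % serverContent)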

ozanpasa commented 7 years ago

When you said "non-standard characters", it prompted me to look more closely at the actual fields being written to the incident_results collection. As per my original post:

Below is the Display_fields entry in "Incident Settings" for the particular alert in question:

first_time,last_time,sourcetype,alert.signature,action,src_ip,src_port,src_ipi_post,dest_ip,dest_port,http_method,url,http_referrer,payload_printable,src_ipi_hva,impact,urgency

It turns out the issue was indeed a character issue. Looking at the above setting, you will see that one of the fields is "alert.signature". I decided to fieldalias "alert.signature" to "alert_signature", and that fixed the issue. The fields are now being written to the incident_results collection and displayed in the GUI.

Would you consider this a bug, since "alert.signature" is a valid field name in Splunk itself?

johnfromthefuture commented 7 years ago

These types of fields are tricky at best. You can do "alert.signature"="some value" when searching, and Splunk recognizes that without issue. Other commands, however, don't recognize the field so easily, and you have to rename the field to use it effectively.

So from an Alert Manager perspective, I wouldn't really see this as a bug. It might be worth considering doing a simple find/replace operation on any passed field name to replace "." with "_", but that could end up causing confusion. I'll leave this open for a bit to see if anyone else has opinions/feedback.
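
To make the suggestion concrete, here is a minimal Python sketch of such a find/replace step (the function and variable names are made up for illustration; this is not code from Alert Manager):

def sanitize_field_names(result):
    # Replace "." with "_" in every field name before it is written to the
    # KV store, e.g. "alert.signature" becomes "alert_signature".
    return {key.replace(".", "_"): value for key, value in result.items()}

# Example with a dotted field name like the one in this issue:
raw = {"alert.signature": "example signature", "src_ip": "10.0.0.1"}
print(sanitize_field_names(raw))
# {'alert_signature': 'example signature', 'src_ip': '10.0.0.1'}

The potential confusion is presumably that the stored field names would then differ from the dotted names users see in their searches and list in Display_fields.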

johnfromthefuture commented 7 years ago

Closing this out since no further feedback came up. The underlying issue, as I see it, is Splunk's somewhat inconsistent treatment of dot-notated fields and how these fields are valid in some circumstances and not in others.

peacand commented 5 years ago

Hi Simon,

We are facing the exact same issue, but for us the workaround is not easy. In our configuration, the logs sent to Splunk come from Logstash, which sends 100% of the logs as JSON. So we often end up with nested JSON logs, for example Suricata logs like:

{"fileinfo":{"md5":"5a7defd65824ffe4c5a4c4614b880610","type":null,"state":"CLOSED","stored":false,"size":1732,"filename":"/wsman/subscriptions/ .....

Unfortunately, these nested fields are automatically transformed into fileinfo.md5, fileinfo.type, etc. by Splunk's search engine. As a result, alert_manager fails to insert these fields into the KV store, and the metadata of the created incidents is not displayed in the dashboard.

I have no way to change the field names, because they do not contain "." originally; the "." is introduced by Splunk when the nested JSON fields are flattened.
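
To illustrate that flattening, here is a small standalone Python sketch that mimics the behaviour on a trimmed version of the Suricata event above (this is a demonstration only, not Splunk's actual extraction code):

import json

def flatten(obj, prefix=""):
    # Turn nested JSON objects into flat, dot-notated field names, the way
    # Splunk's automatic JSON field extraction presents them at search time.
    flat = {}
    for key, value in obj.items():
        name = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

event = json.loads('{"fileinfo": {"md5": "5a7defd65824ffe4c5a4c4614b880610", "state": "CLOSED", "size": 1732}}')
print(flatten(event))
# {'fileinfo.md5': '5a7defd65824ffe4c5a4c4614b880610', 'fileinfo.state': 'CLOSED', 'fileinfo.size': 1732}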

I'm very much in favor of the idea of replacing every "." with "_" in field names to prevent any issue with JSON data. I agree that it may be confusing, but it's the only option that will make sure it works all the time.