alert_manager

Splunk Alert Manager with advanced reporting on alerts, workflows (modify assignee, status, severity) and auto-resolve features

Alerts not showing in Incident Posture due to broken extractions... #173

Open jdeer0618 opened 7 years ago

jdeer0618 commented 7 years ago


There was no response to my last issue on the same topic, but there still appears to be a problem in the latest version of AM on Splunk 6.5.2. Alert Manager does not properly extract all of the data in the data model. Am I the only one having this issue? Alerts do not show up in Incident Posture because these fields don't get extracted.

Seems pretty important... [screenshot_102 attached]

johnfromthefuture commented 7 years ago

Are you on the latest version of AM from splunkbase or github? Just want to be sure.

I'm running 6.5.2 and I ran a similar search to what you posted in the screenshot. Of the 44,627 events found over the last 30 days, I am showing 100% coverage of those key fields.

So my initial thought: do you have the TA-alert_manager add-on installed? It has the props file containing the search-time extractions for this sourcetype.

jdeer0618 commented 7 years ago

I am running the latest off of Splunkbase, TA is on all the indexers and search heads. Alert Manager is running on the search heads.

I had the same issue with the last version and wrote a bunch of Rex evals to fix it.


johnfromthefuture commented 7 years ago

Thanks, that's helpful!

So this is the props.conf from the splunkbase app:

[alert_metadata]
TIME_PREFIX = "updated": "
MAX_TIMESTAMP_LOOKAHEAD = 35
SHOULD_LINEMERGE = false
FIELDALIAS-alert_metadata-app = "entry{}.acl.app" AS app
FIELDALIAS-alert_metadata-owner = "entry{}.acl.owner" AS owner
FIELDALIAS-alert_metadata-label = "entry{}.content.label" AS label
FIELDALIAS-alert_metadata-ttl = "entry{}.content.ttl" AS ttl
FIELDALIAS-alert_metadata-eventSearch = "entry{}.content.eventSearch" AS eventSearch
FIELDALIAS-alert_metadata-earliest = "entry{}.content.earliestTime" AS earliest
FIELDALIAS-alert_metadata-latest = "entry{}.content.latestTime" AS latest
FIELDALIAS-alert_metadata-severity_id = severity AS severity_id
FIELDALIAS-alert_metadata-name = "entry{}.name" AS name
TRUNCATE = 104857600
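
Aside: the left-hand sides of those FIELDALIAS lines are the flattened names Splunk's automatic JSON key-value extraction generates for the nested event; entry{}.acl.app means "the acl.app key inside each element of the entry array". A minimal Python sketch of the equivalent traversal, using an abbreviated, illustrative event (a full sample appears later in the thread):

import json

# Abbreviated, illustrative stand-in for an indexed alert_metadata event;
# a full sample event appears later in this thread.
raw = json.dumps({
    "entry": [{
        "acl": {"app": "TA-security", "owner": "admin"},
        "content": {"label": "top src in pfsense for last hour", "ttl": 240},
        "name": "search index=pfsense | top limit=1 src",
    }]
})

event = json.loads(raw)

# FIELDALIAS "entry{}.acl.app" AS app is roughly this lookup,
# applied to every element of the entry list:
for entry in event["entry"]:
    print("app   =", entry["acl"]["app"])
    print("label =", entry["content"]["label"])
    print("name  =", entry["name"])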

And this is what I have in my production environment:

[alert_metadata]
TIME_PREFIX = "updated": "
MAX_TIMESTAMP_LOOKAHEAD = 35
SHOULD_LINEMERGE = false
#FIELDALIAS-alert_metadata-app = "entry{}.acl.app" AS app
#FIELDALIAS-alert_metadata-owner = "entry{}.acl.owner" AS owner
FIELDALIAS-alert_metadata-label = "entry{}.content.label" AS label
#FIELDALIAS-alert_metadata-ttl = "entry{}.content.ttl" AS ttl
FIELDALIAS-alert_metadata-eventSearch = "entry{}.content.eventSearch" AS eventSearch
#FIELDALIAS-alert_metadata-earliest = "entry{}.content.earliestTime" AS earliest
#FIELDALIAS-alert_metadata-latest = "entry{}.content.latestTime" AS latest
#FIELDALIAS-alert_metadata-severity_id = severity AS severity_id
#FIELDALIAS-alert_metadata-name = "entry{}.name" AS name
EXTRACT-alertmgr01 = \"job_id\":\s+\"(?<job_id>[^\"]+)\" 
EXTRACT-alertmgr02 = \"result_id\":\s+\"(?<result_id>[^\"]+)\" 
EXTRACT-alertmgr03 = \"incident_id\":\s+\"(?<incident_id>[^\"]+)\" 
EXTRACT-alertmgr04 = \"alert\":\s+\"(?<alert>[^\"]+)\"
EXTRACT-alertmgr05 = \"app\":\s+\"(?<app>[^\"]+)\"
EXTRACT-alertmgr06 = \"earliestTime\":\s+\"(?<earliest>[^\"]+)\"
EXTRACT-alertmgr07 = \"impact\":\s+\"(?<impact>[^\"]+)\"
EXTRACT-alertmgr08 = \"latestTime\":\s+\"(?<latest>[^\"]+)\"
EXTRACT-alertmgr09 = \"alert\":\s+\"(?<name>[^\"]+)\"
EXTRACT-alertmgr10 = \"title\":\s+\"(?<title>[^\"]+)\" 
EXTRACT-alertmgr11 = \"ttl\":\s+(?<ttl>\d+)
EXTRACT-alertmgr12 = \"owner\":\s+\"(?<owner>[^\"]+)\"
EXTRACT-alertmgr13 = \"urgency\":\s+\"(?<urgency>[^\"]+)\" 
EXTRACT-alertmgr14 = \"eventSearch\":\s+\"(?<eventSearch>((\\\"|[^\"])+))\"
TRUNCATE = 0

I've lost track of what comes from the dev branch versus what I've done to fix issues I've experienced in the past, but I recommend you drop these into your TA-alert_manager props.conf to see if it corrects the issue. If so, it should be a simple fix to include them in the next splunkbase release.

jdeer0618 commented 7 years ago

No dice... It seems like the GIANT "reportSearch" and "normalizedSearch" JSON keys might be the issue, but I'm not sure. I don't understand why the getJob in alert_manager.py doesn't just keep the keys it needs from the entry array or dict (not a Python guy, but I think one of those terms is correct). My JSON objects that get indexed are gigantic.

johnfromthefuture commented 7 years ago

reportSearch and normalizedSearch don't extract well for me either.

So what's happening is that alert_manager.py does create the dictionary, and the dictionary is serialized (json.dumps) and sent to be indexed. Here's a sample of the raw text indexed for a test alert on my test box:

{"result_id": "0", "alert": "top src in pfsense for last hour", "urgency": "medium", "entry": [{"id": "https://127.0.0.1:8089/services/search/jobs/scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18", "acl": {"modifiable": true, "perms": {"read": ["admin", "alert_manager", "security"], "write": ["admin", "alert_manager", "security"]}, "app": "TA-security", "can_write": true, "ttl": "240", "sharing": "global", "owner": "admin"}, "name": "search index=pfsense | top limit=1 src", "published": "2017-03-01T22:00:01.000-05:00", "links": {"events": "/services/search/jobs/scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18/events", "results": "/services/search/jobs/scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18/results", "summary": "/services/search/jobs/scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18/summary", "search.log": "/services/search/jobs/scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18/search.log", "timeline": "/services/search/jobs/scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18/timeline", "results_preview": "/services/search/jobs/scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18/results_preview", "alternate": "/services/search/jobs/scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18", "control": "/services/search/jobs/scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18/control"}, "content": {"doneProgress": 1, "dropCount": 0, "normalizedSearch": "litsearch index=pfsense | addinfo type=count label=prereport_events | fields keepcolorder=t \"cvp_reserved_count\" \"src\" | pretop 1 src", "pid": "30312", "searchProviders": ["pluto"], "sid": "scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18", "isZombie": false, "remoteSearch": "litsearch index=pfsense | addinfo  type=count label=prereport_events | fields  keepcolorder=t \"cvp_reserved_count\" \"src\" | pretop  1 src", "diskUsage": 114688, "keywords": "index::pfsense", "isSavedSearch": true, "isTimeCursored": true, "searchLatestTime": 1488423600, "searchEarliestTime": 1488420000, "resultCount": 1, "defaultTTL": "600", "ttl": 240, "reportSearch": "top  limit=1 src", "optimizedSearch": "| search index=pfsense | top limit=1 src", "eventFieldCount": 0, "scanCount": 4649, "canSummarize": true, "resultPreviewCount": 1, "eventCount": 4649, "dispatchState": "DONE", "fieldMetadataResults": {"percent": {"type": "unknown", "type_special": "percent"}, "count": {"type": "unknown", "type_special": "count"}}, "isDone": true, "savedSearchLabel": "{\"owner\":\"admin\",\"app\":\"TA-security\",\"sharing\":\"global\"}", "eventIsStreaming": true, "earliestTime": "2017-03-01T21:00:00.000-05:00", "delegate": "scheduler", "isPaused": false, "searchTotalBucketsCount": 3, "isFinalized": false, "latestTime": "2017-03-01T22:00:00.000-05:00", "eventIsTruncated": true, "fieldMetadataStatic": {"percent": {"type": "unknown", "type_special": "percent"}, "count": {"type": "unknown", "type_special": "count"}}, "isSaved": false, "performance": {"command.search.rawdata": {"duration_secs": 0.013, "invocations": 3}, "command.search.index": {"duration_secs": 0.005, "invocations": 3}, "command.top.execute_input": {"duration_secs": 0.005, "invocations": 5, "output_count": 0, "input_count": 161}, "dispatch.optimize.FinalEval": {"duration_secs": 0.045, "invocations": 1}, "dispatch.check_disk_usage": {"duration_secs": 0.001, "invocations": 1}, "dispatch.fetch": 
{"duration_secs": 0.14, "invocations": 5}, "command.search.summary": {"duration_secs": 0.001, "invocations": 4}, "command.fields": {"duration_secs": 0.002, "invocations": 4, "output_count": 4649, "input_count": 4649}, "command.search.tags": {"duration_secs": 0.003, "invocations": 3, "output_count": 4649, "input_count": 4649}, "command.search.fieldalias": {"duration_secs": 0.006, "invocations": 3, "output_count": 4649, "input_count": 4649}, "command.search.kv": {"duration_secs": 0.064, "invocations": 3}, "dispatch.evaluate": {"duration_secs": 0.044, "invocations": 1}, "command.search.lookups": {"duration_secs": 0.001, "invocations": 3, "output_count": 4649, "input_count": 4649}, "dispatch.writeStatus": {"duration_secs": 0.008, "invocations": 7}, "command.search.expand_search": {"duration_secs": 0.025, "invocations": 1}, "command.search.batch.sort": {"duration_secs": 0.002, "invocations": 2}, "command.addinfo": {"duration_secs": 0.003, "invocations": 4, "output_count": 4649, "input_count": 4649}, "dispatch.optimize.toSpl": {"duration_secs": 0.001, "invocations": 1}, "command.search.index.usec_1_8": {"invocations": 2}, "command.top.execute_output": {"duration_secs": 0.001, "invocations": 1, "output_count": 0, "input_count": 0}, "command.search.index.usec_8_64": {"invocations": 1}, "command.search": {"duration_secs": 0.085, "invocations": 4, "output_count": 4649, "input_count": 0}, "dispatch.optimize.optimization": {"duration_secs": 0.001, "invocations": 1}, "command.top": {"duration_secs": 0.006, "invocations": 6, "output_count": 0, "input_count": 161}, "dispatch.evaluate.search": {"duration_secs": 0.044, "invocations": 1}, "command.search.batch.cache_setup": {"duration_secs": 0.004, "invocations": 2}, "command.search.typer": {"duration_secs": 0.001, "invocations": 3, "output_count": 4649, "input_count": 4649}, "dispatch.optimize.toJson": {"duration_secs": 0.001, "invocations": 1}, "startup.configuration": {"duration_secs": 0.02, "invocations": 1}, "dispatch.localSearch": {"duration_secs": 0.09, "invocations": 1}, "dispatch.optimize.matchReportAcceleration": {"duration_secs": 0.132, "invocations": 1}, "command.search.calcfields": {"duration_secs": 0.002, "invocations": 3, "output_count": 4649, "input_count": 4649}, "startup.handoff": {"duration_secs": 0.048, "invocations": 1}, "dispatch.evaluate.top": {"duration_secs": 0.001, "invocations": 1}, "dispatch.stream.local": {"duration_secs": 0.091, "invocations": 4}, "command.pretop": {"duration_secs": 0.009, "invocations": 4, "output_count": 161, "input_count": 4649}, "dispatch.createdSearchResultInfrastructure": {"duration_secs": 0.002, "invocations": 1}, "dispatch.optimize.reparse": {"duration_secs": 0.001, "invocations": 1}}, "runDuration": 0.371, "statusBuckets": 0, "eventSearch": "search index=pfsense ", "isBatchModeSearch": true, "numPreviews": 0, "isPreviewEnabled": false, "isEventsPreviewEnabled": false, "sampleSeed": "0", "cursorTime": "1969-12-31T19:00:00.000-05:00", "request": {"ui_dispatch_view": "search", "earliest_time": "-60m@m", "indexedRealtimeMinSpan": "", "ui_dispatch_app": "TA-security", "time_format": "%FT%T.%Q%:z", "buckets": "0", "latest_time": "@m", "sample_ratio": "1", "max_time": "0", "indexedRealtime": "", "lookups": "1", "auto_cancel": "0", "reduce_freq": "10", "max_count": "500000", "rt_maximum_span": "", "auto_pause": "0", "rt_backfill": "0", "index_earliest": "", "indexedRealtimeOffset": "", "spawn_process": "1", "index_latest": ""}, "searchTotalEliminatedBucketsCount": 0, "isFailed": false, "searchCanBeEventType": 
false, "sampleRatio": "1", "fieldMetadataEvents": {"percent": {"type": "unknown", "type_special": "percent"}, "count": {"type": "unknown", "type_special": "count"}}, "eventSorting": "none", "label": "top src in pfsense for last hour", "messages": [], "isGoodSummarizationCandidate": true, "defaultSaveTTL": "604800", "eventAvailableCount": 0, "resultIsStreaming": false, "isRemoteTimeline": false, "reduceSearch": "sitop  limit=1 src", "priority": 5, "isRealTimeSearch": false}, "author": "admin", "updated": "2017-03-01T22:00:01.688-05:00"}], "owner": "unassigned", "title": "top src in pfsense for last hour", "impact": "medium", "job_id": "scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18", "app": "TA-security", "name": "top src in pfsense for last hour", "alert_time": "2017-03-01T22:00:01.000-05:00", "incident_id": "09ecc7ba-17a6-47c1-a4e5-60e339b5f40d", "ttl": 86400, "priority": "medium"}

Splunk typically recognizes this as JSON. It will automatically extract key-value pairs based on the "key": "value" construct and give you a pretty display when searching the raw data. This is probably why TA-alert_manager does not ship extractions out of the box: they shouldn't be necessary. I've found that it breaks down a bit when the alert searches get complex, which is why I introduced the extractions above. It still doesn't work 100% for the eventSearch and normalizedSearch fields - I just need to go back and make a better regex.
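
Aside: this failure mode is easy to reproduce outside Splunk. A sketch with an illustrative event: a plain "key": "value" regex truncates at the first escaped quote inside a complex search string, while the escaped-quote alternation in EXTRACT-alertmgr14 above, and a real JSON parse (roughly what spath does), recover the whole value:

import json
import re

# Illustrative event: an alert search containing quotes, the way
# eventSearch/normalizedSearch often do.
event = {"job_id": "x", "eventSearch": 'search index=foo "some \\"quoted\\" term"'}
raw = json.dumps(event)  # the raw text that would be indexed

# A naive "key": "value" regex stops at the first embedded escape:
naive = re.search(r'"eventSearch":\s*"(?P<v>[^"]+)"', raw)
print(naive.group("v"))   # truncated at the first \" escape

# Allowing the \" escape, like EXTRACT-alertmgr14 above, captures it all
# (Python spells named groups (?P<v>...) where Splunk uses (?<v>...)):
better = re.search(r'"eventSearch":\s*"(?P<v>(?:\\"|[^"])+)"', raw)
print(better.group("v"))  # full, still JSON-escaped

# And a real JSON parse, roughly what spath does, fully unescapes it:
print(json.loads(raw)["eventSearch"])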

So, question: do all the other fields (normalizedSearch and reportSearch aside) work after adding the above extractions on your search head? The only lost functionality at that point is the ability to click the magnifying glass on the Incident Posture page...

jdeer0618 commented 7 years ago

I get that Splunk will pretty-print JSON, but it still seems unnecessary to return every single field from the jobs endpoint. Also, it looks like the keys don't come back in the same order from the json.dumps - or it could just look that way; it is hard to tell between the large amount of data returned and the browser choking on the event size.

Doing some Googling, it looks like setting sort_keys=True on the dumps call might keep the raw data in a consistent order, if the order is in fact not deterministic when returned.
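
Aside: sort_keys is a standard json.dumps option, so the suggested change is a one-liner wherever alert_manager.py serializes the metadata; on the Python 2 interpreters Splunk shipped at the time, plain dict key order was effectively arbitrary, which would explain the inconsistent raw events. A minimal sketch:

import json

# Illustrative subset of the metadata dict alert_manager.py builds.
metadata = {
    "result_id": "0",
    "job_id": "scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18",
    "incident_id": "09ecc7ba-17a6-47c1-a4e5-60e339b5f40d",
}

# Default: key order follows the dict's internal order (unordered on Python 2).
print(json.dumps(metadata))

# sort_keys=True emits keys alphabetically, so every indexed event gets the
# same layout - friendlier for the regex-based extractions in this thread.
print(json.dumps(metadata, sort_keys=True))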

So this is what I have so far and it seems to be working, though it has not been thoroughly tested:

index=alerts eventtype="alert_metadata"
| rex "\"eventSearch\":\s+\"(?<eventSearch>((\\\"|[^\"])+))\""
| rex "\"name\":\s+\"(?<search>(?:\s?search\s?|\s?[|]\s?)(?:\\\"|[^\"])*?)\""
| rex field=search mode=sed "s/\\n/ /g"
| rex "\"urgency\":\s+\"(?<urgency>[^\"]+)\""
| rex "\"alert_time\":\s+\"(?<alert_time>[^\"]+)\""
| rex "\"owner\":\s+\"(?<owner>[^\"]+)\""
| rex "\"ttl\":\s+(?<ttl>\d+)"
| rex "\"title\":\s+\"(?<title>[^\"]+)\""
| rex "\"alert\":\s+\"(?<name>[^\"]+)\""
| rex "\"latestTime\":\s+\"(?<latest>[^\"]+)\""
| rex "\"impact\":\s+\"(?<impact>[^\"]+)\""
| rex "\"earliestTime\":\s+\"(?<earliest>[^\"]+)\""
| rex "\"app\":\s+\"(?<app>[^\"]+)\""
| rex "\"alert\":\s+\"(?<alert>[^\"]+)\""
| rex "\"incident_id\":\s+\"(?<incident_id>[^\"]+)\""
| rex "\"result_id\":\s+\"(?<result_id>[^\"]+)\""
| rex "\"job_id\":\s+\"(?<job_id>[^\"]+)\""
| fillnull value="broken"
| stats values(alert) as alert, values(title) as title, values(app) as app, values(eventSearch) as event_search, values(search) as search, values(impact) as impact, values(earliest) as earliest, values(latest) as latest, count by job_id, incident_id, result_id, _time
| sort 0 - _time
| lookup incidents incident_id OUTPUTNEW alert, title, owner, status, impact, urgency
| eval title=if(isnull(title) OR title="",alert,title)
| eval search = coalesce(search, event_search)
| foreach search eventSearch [eval <<FIELD>> = ltrim(<<FIELD>>, "search ")]
| lookup alert_priority impact, urgency OUTPUT priority
| lookup incident_settings alert OUTPUT category, subcategory, tags, display_fields
| lookup alert_status status OUTPUT status_description
| fillnull value="" tags, category, subcategory
| eval tags=if(tags=="","[Untagged]",tags)
| makemv delim=" " tags

I am sure there are A LOT of optimizations that can be done but I need to get the SOC off my back for now. :)
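
Aside, one gotcha in the workaround above: SPL's ltrim(X, Y) strips a set of characters, not a literal prefix, so ltrim(<<FIELD>>, "search ") can eat extra leading characters from a search that happens to start with letters in that set. Python's str.lstrip has the same semantics, which makes it easy to demonstrate:

# str.lstrip (like SPL's ltrim) treats its argument as a character set.
print("search index=pfsense".lstrip("search "))  # "index=pfsense" - as intended
print("search sort_field=a".lstrip("search "))   # "ort_field=a" - overstripped
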
jdeer0618 commented 7 years ago

Found an issue with result_id. The following results in no results being found, which is what I want, over 30 days:

index=alerts eventtype="alert_metadata"
| rex "\"eventSearch\":\s+\"(?<eventSearch>((\\\\\"|[^\"])+))\""
| rex "\"name\":\s+\"(?<search>(?:\s*?search\s*?|\s*?[\|]\s*?)(?:\\\\\"|[^\"])*?)\""
| rex field=search mode=sed "s/\\\n/ /g"
| rex "\"urgency\":\s+\"(?<urgency>[^\"]+)\""
| rex "\"alert_time\":\s+\"(?<alert_time>[^\"]+)\""
| rex "\"owner\":\s+\"(?<owner>[^\"]+)\""
| rex "\"ttl\":\s+(?<ttl>\d+)"
| rex "\"title\":\s+\"(?<title>[^\"]+)\""
| rex "\"alert\":\s+\"(?<name>[^\"]+)\""
| rex "\"latestTime\":\s+\"(?<latest>[^\"]+)\""
| rex "\"impact\":\s+\"(?<impact>[^\"]+)\""
| rex "\"earliestTime\":\s+\"(?<earliest>[^\"]+)\""
| rex "\"app\":\s+\"(?<app>[^\"]+)\""
| rex "\"alert\":\s+\"(?<alert>[^\"]+)\""
| rex "\"incident_id\":\s+\"(?<incident_id>[^\"]+)\""
| rex "\"result_id\":\s+(?<result_id>[^,]+)"
| rex "\"job_id\":\s+\"(?<job_id>[^\"]+)\""
| fillnull value="broken"
| fieldsummary
| search values = broken

johnfromthefuture commented 7 years ago

I'll have to do some testing on this portion of the app to see if the unnecessary crud can be cut out of the metadata, sort_keys added in, and the event indexing made a bit more consistent. That'll take me some time since I'll need to get it into my production environment to see how it works with real data. Speaking from the SOC perspective, most of the information in that metadata is unnecessary anyways.

I'll get back to you!

johnfromthefuture commented 7 years ago

If you're the brave type that doesn't mind just tossing stuff in a prod environment, I replaced line 385 in alert_manager.py with this:

# metadata.update({ 'entry': [ job ] })
# The goal here is to reduce event size and limit the job data down to the fields we
# absolutely want/care about, making them easier to handle later.
# For backwards compat purposes, I want to keep the data structure the same.
job_data = {}
job_data['content'] = {
    'searchEarliestTime': job['content']['searchEarliestTime'],
    'searchLatestTime': job['content']['searchLatestTime'],
    'earliestTime': job['content']['earliestTime'],
    'latestTime': job['content']['latestTime'],
    'eventCount': job['content']['eventCount'],
    'keywords': job['content']['keywords'],
    'messages': job['content']['messages'],
    'resultCount': job['content']['resultCount'],
    'searchProviders': job['content']['searchProviders'],
    'eventSearch': job['content']['eventSearch'],
    'optimizedSearch': job['content']['optimizedSearch']
}
job_data['links'] = { 'alternate': job['links']['alternate'] }
job_data['name'] = job['name']
# Not sure why this is stored as a list but later references expect it, so I will leave it this way
metadata.update({ 'entry': [ job_data ] })
####

It should reduce the information being indexed but keep the same expected structure for backwards compatibility. I chose those fields as the ones I'd care most about in $dayjob. I need to let it bake and do some testing before I can throw real data at it...
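
Aside, one caveat with the patch above: it indexes job['content'][...] directly, so a job missing any of those keys would raise a KeyError and the metadata event would never be written. A more defensive sketch of the same idea (the None defaults are an assumption, not observed app behavior):

# Sketch only: same field list as the patch above, but tolerant of jobs
# that are missing a key (the None defaults are an assumption).
wanted = ['searchEarliestTime', 'searchLatestTime', 'earliestTime',
          'latestTime', 'eventCount', 'keywords', 'messages',
          'resultCount', 'searchProviders', 'eventSearch', 'optimizedSearch']

job_data = {}
job_data['content'] = {key: job.get('content', {}).get(key) for key in wanted}
job_data['links'] = {'alternate': job.get('links', {}).get('alternate')}
job_data['name'] = job.get('name')

# Still a single-element list, to keep the structure backwards compatible.
metadata.update({'entry': [job_data]})
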
jdeer0618 commented 7 years ago

Thanks! I'll drop it on our dev machine and see what I get back.

johnfromthefuture commented 7 years ago

So I shoved that change into my prod alert manager on Friday afternoon. Since then, relying on Splunk spath-type extractions for JSON shows about 85% success for getting the entry{}.name (original search) field. This led me to look at the normalizedSearch field. What I noticed here is that the average length of this field on working extractions was 946 characters (p90 around 1713, max at 2281). On the other hand, when the entry{}.name field did not exist, the avg/p90/max was 8248/8315/75416.

While it wasn't universally true that a longer normalizedSearch field broke the extractions, it did seem to be this way in most cases. I'm updating my test to remove the normalizedSearch field - I wouldn't use this search in practice anyways.

jdeer0618 commented 7 years ago

Nice! I have not had a chance to throw the change in yet. I'd agree about the normalizedSearch field and not using it.

johnfromthefuture commented 7 years ago

So I've been on vacation and working a bunch, but this has been running without issue in my production instance for the last couple weeks. Doing a fast mode search with index=alerts sourcetype=alert_metadata | spath over 14 days showed the entry{}.name field successfully extracting 95.599% of 23,972 entries. I'd say that's pretty good.
The few edge cases can probably be handled with a few custom-made search-time extractions.

Are you good to close out this issue?

johnfromthefuture commented 7 years ago

As an aside, I removed the extractions I posted above and reverted back to using field aliases, so that's nice.

johnfromthefuture commented 7 years ago

So after running these changes in my prod environment for a while, I made a couple changes. First, I removed the keywords field from the metadata creation in alert_manager.py. Looks like this now:

job_data['content'] = {
    'searchEarliestTime': job['content']['searchEarliestTime'],
    'searchLatestTime': job['content']['searchLatestTime'],
    'earliestTime': job['content']['earliestTime'],
    'latestTime': job['content']['latestTime'],
    'eventCount': job['content']['eventCount'],
    'messages': job['content']['messages'],
    'resultCount': job['content']['resultCount'],
    'searchProviders': job['content']['searchProviders'],
    'eventSearch': job['content']['eventSearch'],
    'optimizedSearch': job['content']['optimizedSearch']
}

In the TA-alert_manager props.conf, I ended up adding these extractions back into my alert_metadata stanza:

EXTRACT-alertmgr01 = \"job_id\":\s+\"(?<job_id>[^\"]+)\"
EXTRACT-alertmgr02 = \"result_id\":\s+\"(?<result_id>[^\"]+)\"
EXTRACT-alertmgr03 = \"incident_id\":\s+\"(?<incident_id>[^\"]+)\"
EXTRACT-alertmgr04 = \"alert\":\s+\"(?<alert>[^\"]+)\"
EXTRACT-alertmgr05 = \"app\":\s+\"(?<app>[^\"]+)\"
EXTRACT-alertmgr07 = \"impact\":\s+\"(?<impact>[^\"]+)\"
EXTRACT-alertmgr09 = \"alert\":\s+\"(?<name>[^\"]+)\"
EXTRACT-alertmgr10 = \"title\":\s+\"(?<title>[^\"]+)\"
EXTRACT-alertmgr12 = \"owner\":\s+\"(?<owner>[^\"]+)\"
EXTRACT-alertmgr13 = \"urgency\":\s+\"(?<urgency>[^\"]+)\"

When the JSON spath-based extractions break, you need at least the job_id, result_id, and incident_id to be extracted in order for the alert to show up in Incident Posture (due to the by clause on the tstats search). So I added the extractions above for consistency purposes, and they should work as long as the fields in question don't have quotes in them.
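
Aside: those id extractions are easy to sanity-check outside Splunk. Python spells named groups (?P<name>...) where Splunk's EXTRACT uses (?<name>...), but the patterns are otherwise identical; here they run against an abbreviated event built from the ids in the sample above:

import re

# Abbreviated raw event, in the shape alert_manager.py indexes (the ids
# are taken from the sample event earlier in this thread).
raw = ('{"result_id": "0", '
       '"job_id": "scheduler__admin_VEEtc2VjdXJpdHk__RMD53632494b6e0b2da0_at_1488423600_18", '
       '"incident_id": "09ecc7ba-17a6-47c1-a4e5-60e339b5f40d"}')

# EXTRACT-alertmgr01..03, converted to Python's named-group syntax.
patterns = [
    r'"job_id":\s+"(?P<job_id>[^"]+)"',
    r'"result_id":\s+"(?P<result_id>[^"]+)"',
    r'"incident_id":\s+"(?P<incident_id>[^"]+)"',
]

for pattern in patterns:
    match = re.search(pattern, raw)
    if match:
        print(match.groupdict())  # the ids the tstats by clause needs
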
jdeer0618 commented 7 years ago

John, thanks for the work on this. I am looking at this now. Been covered up the last couple weeks.

ozanpasa commented 7 years ago

Currently running Splunk v6.5.2 and Alert Manager v2.1.4.

I have also run into this same issue. I read the original post but didn't think it applied to my issue; apparently it did. Here is what led me down the same path. The issue I was having was that the "app" field was not being populated in a subset of the events in the Incident Posture dashboard.

I started by looking in incident_posture.xml, which led me all the way to the alerts index. When I started doing searches I had the same issue as detailed above: only about 80% of the fields were being parsed out, including the incident_id, job_id, and result_id fields, which are what the tstats command in the all_alerts macro needs to display each incident. Long story short, I added the following to my props.conf, which now parses out 100% of the incident_id, job_id, and result_id fields and about 99% of the other fields, like eventSearch, earliest, and latest, which are necessary for re-running the search by clicking on the search icon.

[alert_metadata]
# Needed to parse JSON data for Alert Manager
KV_MODE = json
# To prevent LONG json events from being truncated due to normalizedSearch
TRUNCATE = 0
EXTRACT-alert_meta_incidentid = \"incident_id\"\s*:\s*\"(?<incident_id>.*?)\"
EXTRACT-alert_meta_resultid = \"result_id\"\s*:\s*\"(?<result_id>.*?)\"
EXTRACT-alert_meta_jobid = \"job_id\"\s*:\s*\"(?<job_id>.*?)\"
EXTRACT-alert_meta_alert = \"alert\"\s*:\s*\"(?<alert>.*?)\"
EXTRACT-alert_meta_title = \"title\"\s*:\s*\"(?<title>.*?)\"
EXTRACT-alert_meta_app = \"app\"\s*:\s*\"(?<app>.*?)\"

I would be interested to see what adding KV_MODE = json does in your code as well.

Also, in troubleshooting all of this I came to the realization that it would be beneficial to remove some of the excess data in the JSON event in the alerts index. However, one thing that I would like to see added to the event in the alerts index is the actual result values in the log, so that we can use freeform search on the values from the triggering event, such as src_ip, dest_ip... This would be HUGE.

johnfromthefuture commented 7 years ago

I have a different issue open for the indexing of results. This was a requirement for me as well, and I have checked it into the dev branch since becoming a contributor. Feel free to check it out.
jdeer0618 commented 7 years ago

Was just working on this... what is the flow of the repos? develop > release/2.X > master?

I wasn't sure which one to pull and use.

johnfromthefuture commented 7 years ago

They aren't perfectly lined up. I've been adding my stuff to the develop branch, though. I have plans to make sure it's all in the 2.2 branch as well, but I'm not caught up yet.
johnfromthefuture commented 7 years ago

Coming back around to this... As of today, here's what I know of the alert manager repositories:

Version 2.1.4 is the main release available on Splunkbase. The development (develop) branch is 2.1.4 plus bug fixes and enhancements. The 2.2 branch is the next major release, which should have everything from 2.1.4, the develop branch, and any future enhancements targeted for that release. I haven't used the 2.2 branch or looked at it closely. I have a few enhancements that I put in the develop branch and have been meaning to push up to the 2.2 branch as well.

As for my usage of Alert Manager - I use the 2.1.4 branch in my production instance and put any enhancements I make into that branch. I run a SOC based on that branch of the code and have made a number of customizations to meet my use cases. Given my heavy usage of that branch, it's what I would recommend using today.