alertmanager / alert_manager

Splunk Alert Manager with advanced reporting on alerts, workflows (modify assignee, status, severity) and auto-resolve features

Alert Manager not working in Splunk 8.1 SHC Cluster #275

Status: Open. paki20 opened this issue 3 years ago.

paki20 commented 3 years ago

Hi,

I installed Alert Manager 3.0.4 on my standalone search head (Splunk test environment), and it works with Splunk Enterprise 8.1.

I installed Alert Manager 3.0.4 on my search head cluster (SHC, Splunk production environment), and it does not work with Splunk Enterprise 8.1.

Could you tell me why? I don't understand. Is Alert Manager 3.0.4 not compatible with SHC?

Thanks for your help.

my2ndhead commented 3 years ago

Hi @paki20

I suspect that alert_actions.conf is not being replicated.

https://docs.splunk.com/Documentation/Splunk/8.1.0/DistSearch/HowconfrepoworksinSHC

If the file is not replicated, you have to save the global settings manually on every SHC member.
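One way to check whether the copies on each member actually agree is to collect alert_actions.conf from every SHC member and compare the relevant stanza. This is a minimal sketch, assuming the settings live in an [alert_manager] stanza (the modular alert action is called alert_manager in the logs later in this thread) and that the files are simple enough for Python's configparser; the helper names, member names, and the key in the example are hypothetical.

```python
import configparser


def read_stanza(conf_text, stanza):
    """Return the key/value pairs of one stanza from .conf-style text.

    Assumption: no line continuations or duplicate keys that Splunk
    would merge differently from configparser.
    """
    parser = configparser.ConfigParser(
        delimiters=("=",),   # Splunk .conf files use "key = value"
        interpolation=None,  # do not treat "%" in values specially
        strict=False,        # tolerate duplicate stanzas/keys
    )
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(conf_text)
    if not parser.has_section(stanza):
        return {}
    return dict(parser.items(stanza))


def diff_members(member_confs, stanza="alert_manager"):
    """Compare one stanza across SHC members.

    member_confs maps a member name to the text of its alert_actions.conf.
    Returns {key: {member: value}} for every key whose value differs
    between members or is missing (None) on some member.
    """
    stanzas = {m: read_stanza(t, stanza) for m, t in member_confs.items()}
    keys = set().union(*stanzas.values())
    diffs = {}
    for key in sorted(keys):
        values = {m: s.get(key) for m, s in stanzas.items()}
        if len(set(values.values())) > 1:
            diffs[key] = values
    return diffs


# Hypothetical members and key, for illustration only.
confs = {
    "sh1": "[alert_manager]\nparam.example = a\n",
    "sh2": "[alert_manager]\nparam.example = b\n",
}
print(diff_members(confs))  # {'param.example': {'sh1': 'a', 'sh2': 'b'}}
```

An empty result means every member has the same values in that stanza; any key listed is a setting that did not make it to all members.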

Please let me know if it works out.

Thanks!

paki20 commented 3 years ago

Hi my2ndhead,

I installed alert_manager 3.0.5, but my issue is not resolved.

I checked, and alert_actions.conf is replicated. When I run a health check, I get:

my2ndhead commented 3 years ago

OK. It seems that, for some reason, the scripts that populate the KV store (among other things) have not been run.

Under Splunk's Settings -> Data Inputs -> Scripts, enable the following alert_manager_migrate-vX.Y scripts:

2.0, 2.1, 3.0

After you have enabled the scripts, do a rolling restart of the SHC.
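If you note down the scripted inputs and whether each one is enabled (for example, from the Settings -> Data Inputs -> Scripts page), a small helper can flag which of the three migration scripts is still disabled. This is a sketch: it assumes each input name contains the string alert_manager_migrate-v<X.Y>, and the full paths in the example are hypothetical, since the exact file names are not given in this thread.

```python
import re

# Migration script versions listed in the instructions above.
REQUIRED_VERSIONS = ("2.0", "2.1", "3.0")


def disabled_migrations(scripted_inputs, versions=REQUIRED_VERSIONS):
    """Return the versions whose migration script is not enabled.

    scripted_inputs maps a scripted-input name to True (enabled) or
    False (disabled). Assumption: each name contains
    "alert_manager_migrate-v<X.Y>".
    """
    missing = []
    for version in versions:
        # (?!\d) stops "v2.1" from also matching "v2.10".
        pattern = re.compile(
            r"alert_manager_migrate-v" + re.escape(version) + r"(?!\d)"
        )
        enabled = any(
            is_on for name, is_on in scripted_inputs.items()
            if pattern.search(name)
        )
        if not enabled:
            missing.append(version)
    return missing


# Hypothetical input names; check the real ones on your search heads.
inputs = {
    "/opt/splunk/etc/apps/alert_manager/bin/alert_manager_migrate-v2.0.sh": True,
    "/opt/splunk/etc/apps/alert_manager/bin/alert_manager_migrate-v2.1.sh": False,
}
print(disabled_migrations(inputs))  # ['2.1', '3.0']
```

A version counts as missing whether its input is disabled or absent entirely. Once all three are enabled, the rolling restart can be started from the SHC captain with splunk rolling-restart shcluster-members.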

Let me know if it helps.

paki20 commented 3 years ago

Hi my2ndhead,

I have enabled scripts 2.0, 2.1 and 3.0 and did a rolling restart of the SHC, but it still does not work.


my2ndhead commented 3 years ago

What does the health-check now say?

paki20 commented 3 years ago

Hi my2ndhead,

After the health check, I get the same result:

Built-in alert statuses deployed: Failed
Default Email Templates deployed: Failed
Default Notification Schemes Deployed: Failed
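These three checks look like default data that the migration scripts are supposed to write, since the scripts were described above as populating the KV store. A minimal sketch for checking whether a KV store collection on a given member holds any records, using Splunk's storage/collections/data REST endpoint on the management port. The host, credentials, and collection names are placeholders: the actual Alert Manager collection names are not given in this thread, and the helper names are hypothetical.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request


def kvstore_data_url(base_url, collection, app="alert_manager", owner="nobody"):
    """Build the KV store data URL for one collection in one app."""
    path = "/servicesNS/{}/{}/storage/collections/data/{}".format(
        urllib.parse.quote(owner, safe=""),
        urllib.parse.quote(app, safe=""),
        urllib.parse.quote(collection, safe=""),
    )
    return base_url.rstrip("/") + path


def count_records(base_url, collection, user, password, verify_tls=True):
    """Return how many records a collection holds (makes a network call).

    Raises urllib.error.HTTPError if, for example, the collection
    does not exist or the credentials are rejected.
    """
    request = urllib.request.Request(kvstore_data_url(base_url, collection))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for lab setups that still use a self-signed certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        # The data endpoint returns a JSON array of records.
        return len(json.load(response))
```

For example, count_records("https://sh1.example.com:8089", "<collection>", "admin", "<password>") returning 0 on a member would suggest that the defaults were never written there.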

my2ndhead commented 3 years ago

Can you check for any errors in $SPLUNK_HOME/var/log/alert_manager_migration.log and splunkd.log?
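To pull only the problem lines out of those logs, a short filter on the level field is enough. This is a sketch, assuming the two line formats quoted later in this thread: splunkd.log puts the level after the timezone offset, and alert_manager.log puts it after the millisecond timestamp. On a default install, splunkd.log is under $SPLUNK_HOME/var/log/splunk/. The helper name is hypothetical.

```python
import re
from collections import deque
from pathlib import Path

# Level field as it appears in splunkd.log ("... +0100 ERROR ExecProcessor ...")
# and in alert_manager.log ("... 10:20:45,010 INFO pid=...").
LEVEL = re.compile(r"\s(ERROR|WARN|WARNING|CRITICAL|FATAL)\s")


def problem_lines(log_path, max_lines=50):
    """Return the last max_lines lines with an error- or warning-like level."""
    hits = deque(maxlen=max_lines)  # keeps only the most recent matches
    with Path(log_path).open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if LEVEL.search(line):
                hits.append(line.rstrip("\n"))
    return list(hits)


# Example paths, with $SPLUNK_HOME = /opt/splunk as in this thread's logs:
# for path in ("/opt/splunk/var/log/splunk/splunkd.log",
#              "/opt/splunk/var/log/alert_manager_migration.log"):
#     print(path, *problem_lines(path), sep="\n")
```

Keeping the last matches rather than the first ones favors errors from the most recent restart.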

paki20 commented 3 years ago

Hi my2ndhead,

I hope you are well. Happy New Year!

When I checked, I found the following:

splunkd.log:
01-04-2021 10:20:59.526 +0100 INFO sendmodalert - action=alert_manager - Alert action script completed in duration=1660 ms with exit code=0
01-04-2021 10:20:59.531 +0100 INFO sendmodalert - Invoking modular alert action=alert_manager for search="[TEST-01] Test" sid="scheduler__user1__search__RMD5rs9045078zc8w2b_at_1609832000_1907_6209994-GEEL-4LKP-Z5Y9-276CAE80FA7F" in app="search" owner="user1" type="saved"
01-04-2021 10:21:01.282 +0100 INFO sendmodalert - action=alert_manager - Alert action script completed in duration=1747 ms with exit code=0
01-04-2021 10:22:35.847 +0100 ERROR ExecProcessor - message from "/opt/splunk/etc/apps/alert_manager/bin/alert_manager_scheduler.sh" /bin/sh: /opt/splunk/etc/apps/alert_manager/bin/alert_manager_scheduler.sh: Permission not granted
01-04-2021 10:22:35.865 +0100 INFO sendmodalert - action=alert_manager - Alert action script completed in duration=1489 ms with exit code=0
01-04-2021 10:22:35.869 +0100 INFO sendmodalert - Invoking modular alert action=alert_manager for search="[TEST-01] Test" sid="scheduler__user1__search__RMD5rs9045078zc8w2b_at_1609832000_1907_6209994-GEEL-4LKP-Z5Y9-276CAE80FA7F" in app="search" owner="user1" type="saved"

alert_manager.log:
2021-01-04 10:20:45,010 INFO pid="23145" logger="alert_manager" message="Firing incident_created event for incident=f0f12944-b5d6-4c72-8316-3cf85db07192" (alert_manager.py:745)
2021-01-04 10:20:45,093 INFO pid="23145" logger="alert_manager" message="Alert handler finished. duration=0.676s" (alert_manager.py:791)
2021-01-04 10:20:46,100 INFO pid="23152" logger="alert_manager" message="Found job for alert '[TEST-01] Test' with title '[TEST-01] Test'. Context is 'search' with 39 results." (alert_manager.py:566)
2021-01-04 10:20:46,295 INFO pid="23152" logger="alert_manager" message="Incident status after suppresion check: new" (alert_manager.py:669)
2021-01-04 10:20:46,369 INFO pid="23152" logger="alert_manager" message="Incident initial state added to collection for job_id=scheduler__user1__search__RMD5rs9045078zc8w2b_at_1609832000_1907_6209994-GEEL-4LKP-Z5Y9-276CAE80FA7F with incident_id=f0f12944-b5d6-4c72-8316-3cf85db07192 key=8ff2ggmm0964g3098098q221" (alert_manager.py:688)
2021-01-04 10:20:46,437 INFO pid="23152" logger="alert_manager" message="Results for incident_id=f0f12944-b5d6-4c72-8316-3cf85db07192 written to collection." (alert_manager.py:706)
2021-01-04 10:20:46,448 INFO pid="23152" logger="alert_manager" message="Alert metadata written to index=alerts" (alert_manager.py:255)
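The ERROR line from ExecProcessor says /bin/sh could not execute alert_manager_scheduler.sh, which suggests the script is missing execute permission on this member (other causes, such as a filesystem mounted noexec, produce a similar error). A sketch for listing the app's shell scripts that the current user cannot execute and adding the owner execute bit, like chmod u+x. Run it as the user that runs splunkd, on each SHC member. The bin directory comes from the log path above; the function names are hypothetical.

```python
import os
import stat
from pathlib import Path


def non_executable_scripts(bin_dir, patterns=("*.sh",)):
    """List scripts in bin_dir that the current user cannot execute."""
    found = []
    for pattern in patterns:
        for path in sorted(Path(bin_dir).glob(pattern)):
            if path.is_file() and not os.access(path, os.X_OK):
                found.append(path)
    return found


def add_owner_exec(path):
    """Add the owner execute bit, like "chmod u+x" (caller must own the file)."""
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR)


# Example, using the bin directory from the splunkd.log line above:
# bin_dir = "/opt/splunk/etc/apps/alert_manager/bin"
# for script in non_executable_scripts(bin_dir):
#     print("not executable:", script)
#     add_owner_exec(script)
```

Only the owner bit is added so that group and other permissions stay as they were.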

paki20 commented 3 years ago

Hi,

Could you help me?