alertmanager / alert_manager

Splunk Alert Manager with advanced reporting on alerts, workflows (modify assignee, status, severity) and auto-resolve features

Unable to assign ownership to users #211

Closed: elf32 closed this issue 6 years ago

elf32 commented 6 years ago

(screenshot attached) When editing an incident, the 'Owner:' dropdown under the Incident Workflow is greyed out. The users populated under Settings > Users are showing as type 'builtin', and the active user directory is set to 'both'. We were previously on version 2.1.4, with which we utilized the roles alert_manager, alert_manager_user, alert_manager_supervisor, and alert_manager_admin. However, now that we have updated to version 2.2.2, these roles are no longer part of the authorize.conf file, and we cannot assign ownership of any alerts that come in. We are not sure how the roles and permissions need to be configured for users to be able to assign the incidents. The roles were removed as part of certification in the following commit: https://github.com/simcen/alert_manager/commit/def0df2d60a45807d9ff8e43318b43e7c5ad8ed7

Splunk: 7.0.2, Alert Manager: 2.2.2, Alert Manager Add-On: 2.1.1

simcen commented 6 years ago

Unfortunately we had to change the permission system. Prior to v2.2.2, it was required to add the "alert_manager" capability to a user. Starting from 2.2.2, we had to change this because providing custom capabilities is no longer allowed (as part of app certification). Instead, the role "alert_manager" or "alert_manager_user" has to be assigned to the user directly or inherited from another role. Sorry about that.
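For anyone wiring this up by hand, a minimal sketch of what the role inheritance could look like in a local authorize.conf, assuming a hypothetical existing role named soc_analyst (the stanza name and the inherited roles other than alert_manager_user are assumptions for illustration):

    # $SPLUNK_HOME/etc/system/local/authorize.conf (illustrative)
    [role_soc_analyst]
    importRoles = user;alert_manager_user

Alternatively, the alert_manager or alert_manager_user role can be assigned to individual users under Settings > Users in Splunk Web.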

simcen commented 6 years ago

Btw, I just updated the Update Manual (http://docs.alertmanager.info/en/latest/update_manual/) for your convenience. Let me know if that helps.

elf32 commented 6 years ago

Thank you, the documentation is clearer now and aligns with the updated version. We did assign the alert_manager_user and alert_manager roles to one of the users, but we are still not able to assign any alerts. (screenshot: alert_manager_perms) The user had both the alert_manager_user and alert_manager roles after the upgrade and was still unable to assign the incidents. We even performed a complete reinstall of the app, removing the previous roles and leaving only those two, without any success.

jndizzy commented 6 years ago

We have a distributed environment (with an F5 VIP) using SAML auth. Using the browser's developer tools to look at the network traffic, it looks like I am receiving a 404 error because the page is reaching out to a different URL for some odd reason. This only seems to happen in Chrome, since it works in IE.

Below is the transaction from opening the 'Edit Incident' pop-up, where we see the greyed-out owner field.

200 SUCCESS when going through our SAML distributed environment in IE 11, or when running on a standalone SH instead of going through our VIP.
Request URL: https://myservername:8000/en-US/splunkd/__raw/services/alert_manager/helpers?action=get_users&_=1528305822891

404 FAIL when going through our SAML distributed environment in Chrome.
Request URL: https://myvipname.com/en-US/custom/alert_manager/helpers/get_users?_=1528305107720
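For comparison outside the browser, a sketch of hitting both URLs with curl (the hostnames come from the requests above; the session cookie header is a placeholder, since these Splunk Web endpoints require an authenticated session):

    # Path that succeeds (splunkd proxy endpoint)
    curl -k "https://myservername:8000/en-US/splunkd/__raw/services/alert_manager/helpers?action=get_users" \
      -H "Cookie: <splunk-web-session-cookie>"

    # Path that returns 404 in Chrome (legacy custom controller endpoint)
    curl -k "https://myvipname.com/en-US/custom/alert_manager/helpers/get_users" \
      -H "Cookie: <splunk-web-session-cookie>"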

elf32 commented 6 years ago

I revisited this issue recently and seem to have found where our migration to 2.2.2 fell short. Thank you @simcen and @jndizzy for your help and information in getting this resolved.

We were previously on 2.1.4, so during the upgrade we would have leveraged the v2.1 migration script to complete the upgrade.

I found that when the modal in the incident posture UI is loaded to edit an incident, two calls are made: /splunkd/__raw/services/alert_manager/alert_status?action=get_users and /splunkd/__raw/services/alert_manager/alert_status?action=get_alert_status, for the owner and status dropdowns respectively. A look into the browser's debugger shows that the doedit event is caught and the function returns without any errors; a query of the splunkd_ui_access logs confirms that both calls complete with 200 OK responses. When I visited the URI directly from the browser, https://<your-splunk-instance>/splunkd/__raw/services/alert_manager/alert_status?action=get_alert_status, the output was an empty blob.
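The same check can be scripted; a sketch using the URI above (the hostname is a placeholder and an authenticated Splunk Web session is assumed):

    # Should return the selectable statuses; in our case it returned an empty blob
    curl -k "https://<your-splunk-instance>/splunkd/__raw/services/alert_manager/alert_status?action=get_alert_status" \
      -H "Cookie: <splunk-web-session-cookie>"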

This meant the alert_status collection existed, but was empty. We confirmed this by inspecting the collection via the Lookup Editor app, and it was in fact empty. After some searching I happened on the migration script for v2.2, which reads in the default_status.json file and populates the collection with a batch_save via the REST API. We attempted to run the v2.2 shell and Python scripts manually, but for some reason they were not found by Splunk, despite both of them existing on the filesystem.
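For reference, the collection contents can also be checked directly against the KV store REST endpoint used below (a sketch; the credentials and hostname are placeholders, and an empty collection comes back as an empty JSON array):

    curl -k -u rest_username:rest_passwd \
      "https://<your-splunk-instance>:8089/servicesNS/nobody/alert_manager/storage/collections/data/alert_status"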

So, to make a long story longer, we ran the following REST command as a test to see if it would populate the information manually (see the Splunk docs for the REST collections syntax):

    curl -k -u rest_username:rest_passwd \
      "https://<your-splunk-instance>:8089/servicesNS/nobody/alert_manager/storage/collections/data/alert_status" \
      -H "Content-Type: application/json" \
      -d '{ "status": "new", "status_description": "New", "internal_only": 0, "builtin": 1 }'

Optionally, we could have finagled the JSON to use the batch_save command.
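For completeness, a sketch of what that batch_save variant could look like, assuming the remaining default statuses follow the same shape (the second status entry is illustrative, not taken from default_status.json):

    curl -k -u rest_username:rest_passwd \
      "https://<your-splunk-instance>:8089/servicesNS/nobody/alert_manager/storage/collections/data/alert_status/batch_save" \
      -H "Content-Type: application/json" \
      -d '[ { "status": "new", "status_description": "New", "internal_only": 0, "builtin": 1 },
            { "status": "resolved", "status_description": "Resolved", "internal_only": 0, "builtin": 1 } ]'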

The alert status is populating correctly, so I'll be closing the issue.