digital-land / technical-documentation

Technical Documentation for the planning data service.
https://digital-land.github.io/technical-documentation/index.html

Sentry - change settings to be more persistent if fatal error #71

Closed Ben-Hodgkiss closed 3 weeks ago

Ben-Hodgkiss commented 1 month ago

Overview

In a recent incident, we were not aware that the Check Service was down, because the error had first been logged a few weeks earlier (when it had not brought the whole Service down). This meant that we didn’t get a Sentry alert in the Slack channel to notify us of the problem.

To remedy this, we should change the Sentry notification settings so that alerts are more persistent for errors where level=fatal. A fatal error should raise a repeat alert if it recurs after its initial report, perhaps when more than 24 hours have passed since the initial notification and the issue occurs again.
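The proposed rule can be sketched in Python to make the intended behaviour concrete. This is only an illustration of the decision logic, not Sentry configuration: the function `should_notify`, its parameters, and the `RENOTIFY_AFTER` constant are hypothetical names, and the 24-hour window is the tentative figure suggested above.

```python
from datetime import datetime, timedelta
from typing import Optional

# Tentative window from the proposal above; an assumption, not an agreed value.
RENOTIFY_AFTER = timedelta(hours=24)


def should_notify(
    level: str,
    occurred_at: datetime,
    last_notified_at: Optional[datetime],
) -> bool:
    """Decide whether an occurrence of an issue should raise an alert."""
    # An issue that has never been notified always alerts.
    if last_notified_at is None:
        return True
    # Recurrences of non-fatal issues stay quiet, as they do today.
    if level != "fatal":
        return False
    # Fatal issues alert again once the window has elapsed.
    return occurred_at - last_notified_at > RENOTIFY_AFTER
```

Under this sketch, a fatal error first reported weeks ago would alert again on its next occurrence, which is the case that was missed in the incident.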

If possible, it would also be good if fatal alerts could be more prominent in Slack, so that they are visually distinct from other Sentry issues (which are generally saved until the fortnightly review call for in-depth investigation).
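One way to picture the visual distinction is the sketch below, which builds a Slack message payload that differs by level. It assumes alerts would pass through a custom relay that posts to a Slack incoming webhook, which may not be how the Sentry integration is set up here; `build_slack_message` and its arguments are hypothetical, and the URL in the usage lines is a placeholder.

```python
def build_slack_message(title: str, level: str, url: str) -> dict:
    """Build a Slack payload; fatal issues get a louder format."""
    if level == "fatal":
        # Emoji, bold label and a channel mention make fatal alerts stand out.
        text = f":rotating_light: *FATAL* <!channel> <{url}|{title}>"
    else:
        # Other levels keep a plain, low-key line.
        text = f"{level}: <{url}|{title}>"
    return {
        "text": text,  # fallback text used in notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": text}}
        ],
    }


# Hypothetical usage with a placeholder issue title and URL.
payload = build_slack_message(
    "Check Service unavailable", "fatal", "https://example.invalid/issue/1"
)
```

Whether the channel mention is appropriate is a team decision; the point is only that fatal alerts follow a different template from routine issues.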

Pull Request(PR):

Tech Approach

A bullet-pointed list with details on how this could be technically implemented.

Resourcing & Dependencies