Background
(copied from vets.gov-team #17566)
@wyattwalter commented on Mon Mar 25 2019
On Friday, error reports to Sentry spiked significantly shortly after a deploy to vets-website. The errors didn't cause any user-facing issues in this case, but the spike went undetected, apart from Sentry becoming overloaded and slow.
This type of situation likely warrants some sort of notification. We should explore a good route for this. I can think of a few angles:
1) Use metrics from the revproxy or ELB to alert via Prometheus
2) Whatever alerting capability might exist in Sentry
3) Have the deployment tooling check for this scenario in the few minutes after a deployment is considered complete.
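As a rough illustration of option 3, the check the deployment tooling runs could compare per-minute error counts from just before the deploy with counts from the few minutes after it. This is only a sketch: the function name, the `SpikeCheck` type, and both thresholds are hypothetical, and how the counts are collected (Sentry, revproxy metrics, or something else) is left open.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class SpikeCheck:
    baseline_rate: float     # mean errors per minute before the deploy
    post_deploy_rate: float  # mean errors per minute after the deploy
    is_spike: bool


def check_error_spike(
    pre_deploy_counts: Sequence[int],
    post_deploy_counts: Sequence[int],
    ratio_threshold: float = 3.0,  # assumed value, tune for real traffic
    min_post_rate: float = 10.0,   # assumed value, tune for real traffic
) -> SpikeCheck:
    """Flag a deploy when the post-deploy error rate jumps well above baseline.

    Each sequence holds error counts per minute for one window.
    """
    if not pre_deploy_counts or not post_deploy_counts:
        raise ValueError("both windows need at least one sample")

    baseline = float(mean(pre_deploy_counts))
    post = float(mean(post_deploy_counts))

    # Require an absolute floor as well as a relative jump, so that a move
    # from near-zero to a handful of errors per minute does not alert.
    is_spike = post >= min_post_rate and post > baseline * ratio_threshold
    return SpikeCheck(baseline, post, is_spike)
```

The deployment tooling could call this a few minutes after a deploy is considered complete and send a notification when `is_spike` is true. The absolute floor is a design choice to keep quiet periods from producing noisy alerts.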
@wyattwalter commented on Tue Apr 02 2019
We could also alert on data from Sentry itself. The revproxy metrics only include js-report and csp-report information. There's a community exporter that may help: https://github.com/snakecharmer/sentry_exporter
@kfrz commented on Wed May 08 2019
https://github.com/department-of-veterans-affairs/vets.gov-team/issues/18052
@kfrz commented on Fri May 17 2019
Next steps:
i.e.: Change
to look something like:
@kfrz commented on Fri May 17 2019
@wyattwalter We will likely need devops assistance in order to get the setting into Credstash and then auth the workspace. cc: @annaswims
@kfrz commented on Mon May 20 2019
From @annaswims via slack:
@annaswims commented on Wed May 22 2019
We're planning on adding a Slack integration as part of https://github.com/department-of-veterans-affairs/vets.gov-team/issues/17984
There's also a PagerDuty integration that initially looked promising, but https://github.com/getsentry/sentry-plugins/pull/469 leads me to believe that the integration won't work until a new release of Sentry ships. That could be a long wait, because they've announced that the most recent release will be the last before significant dependency changes.