emory-libraries / web-enhance


Create monitoring alerts #10

Open lovinscari opened 2 years ago

lovinscari commented 2 years ago

Alerts need:

- Response time
- Errors
- CPU runtime / storage / memory
- Apdex (an amalgamated metric approximating user satisfaction)

Alerts should:

- Be actionable
- Send the permalink to the metric or a dashboard displaying all relevant metrics via runbook and/or a guide to troubleshooting
- Be checked mostly after codebase changes but alert us to issues we can't troubleshoot so that they can be ticketed to Acquia

Alerts can:

- Monitor modules/hooks
- Monitor SQL DB
- Other things that aren't directly relevant (?) but may be useful for troubleshooting

A reasonable rule here: if things take 10x the time they'd normally take, or have 10% of the quality they'd normally have, or cross some (still undefined) threshold for errors over a 5-15 minute period, then we should have an alert.
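To make the rule above concrete, here is a rough Python sketch, not tied to any particular monitoring tool. It assumes that "quality" is measured by Apdex, that the window is summarized by its mean response time, and that baselines are supplied by the caller; the names `apdex`, `should_alert`, and all parameters are hypothetical. The error threshold stays an explicit parameter because it is not defined yet.

```python
from statistics import mean
from typing import List, Optional, Sequence


def apdex(response_times_s: Sequence[float], target_s: float) -> float:
    """Standard Apdex score: (satisfied + tolerating / 2) / total samples.

    satisfied:  response time <= target_s
    tolerating: target_s < response time <= 4 * target_s
    """
    if not response_times_s:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in response_times_s if t <= target_s)
    tolerating = sum(1 for t in response_times_s if target_s < t <= 4 * target_s)
    return (satisfied + tolerating / 2) / len(response_times_s)


def should_alert(
    window_times_s: Sequence[float],      # samples from one 5-15 minute window
    baseline_time_s: float,               # assumed "normal" mean response time
    baseline_apdex: float,                # assumed "normal" Apdex score
    target_s: float,                      # Apdex target threshold (assumption)
    window_error_rate: Optional[float] = None,
    error_rate_threshold: Optional[float] = None,  # undefined in the issue
) -> List[str]:
    """Return the reasons to alert; an empty list means no alert."""
    reasons = []
    # "10x the time they'd normally take"
    if mean(window_times_s) >= 10 * baseline_time_s:
        reasons.append("response time >= 10x baseline")
    # "10% the quality they'd normally have", read here as Apdex (assumption)
    if apdex(window_times_s, target_s) <= 0.10 * baseline_apdex:
        reasons.append("Apdex <= 10% of baseline")
    # Error check only runs once a threshold has been agreed on
    if (
        error_rate_threshold is not None
        and window_error_rate is not None
        and window_error_rate > error_rate_threshold
    ):
        reasons.append("error rate above threshold")
    return reasons
```

Returning the list of reasons rather than a bare boolean keeps the alert actionable: the text can go straight into the notification alongside the metric permalink.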

maxdmayhew commented 1 year ago

Did we already talk about this? Is this what New Relic/ThousandEyes covers? Is this a duplicate, and can we close it out?

@lovinscari @tmill29 @rotated8