CDCgov / prime-devops

Catalog the state of SimpleReport alerts #3

Closed ronaldheft-gov closed 1 year ago

jdorothy commented 3 years ago

Currently we alert on the following AppService metrics, aggregated over a 5-minute window and polled every minute (`-` denotes that the alert is disabled in that environment):

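| Metric | Dev | Test | Demo | Training | Stg | Prod |
| --- | --- | --- | --- | --- | --- | --- |
| CPU usage >= X% | - | - | 85 | 70 | 70 | 70 |
| Memory usage >= X% | - | - | 85 | 85 | 70 | 80 |
| HTTP response time >= X ms | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 |
| HTTP 4xx errors >= X | 10 | 10 | 10 | 10 | 10 | 10 |
| HTTP 5xx errors >= X | 10 | 10 | 10 | 10 | 10 | 10 |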

We also alert on particular strings found in AppService console logs >= X times:

| String | Dev | Test | Demo | Training | Stg | Prod |
| --- | --- | --- | --- | --- | --- | --- |
| "query failed to validate" | 0 | 0 | 0 | 0 | 0 | 0 |
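
For anyone who wants to diff this catalog against what is actually deployed, the metric thresholds listed above could be captured as plain data. This is only a rough sketch: the key names (`cpu_percent`, `http_4xx_count`, etc.) and the helpers `threshold_for` and `should_alert` are made up for illustration and are not taken from our alert definitions. It assumes the caller passes in a value that has already been aggregated over the 5-minute window.

```python
from typing import Dict, Optional, Tuple

# Column order of the catalog.
ENVIRONMENTS = ("dev", "test", "demo", "training", "stg", "prod")

# Thresholds per metric, one entry per environment.
# None means the alert is disabled in that environment ("-" in the catalog).
# The metric key names are hypothetical, chosen for this sketch only.
THRESHOLDS: Dict[str, Tuple[Optional[int], ...]] = {
    "cpu_percent": (None, None, 85, 70, 70, 70),
    "memory_percent": (None, None, 85, 85, 70, 80),
    "http_response_time_ms": (1000, 1000, 1000, 1000, 1000, 1000),
    "http_4xx_count": (10, 10, 10, 10, 10, 10),
    "http_5xx_count": (10, 10, 10, 10, 10, 10),
}


def threshold_for(metric: str, env: str) -> Optional[int]:
    """Return the configured threshold, or None if the alert is disabled."""
    return THRESHOLDS[metric][ENVIRONMENTS.index(env)]


def should_alert(metric: str, env: str, aggregated_value: float) -> bool:
    """True if the already-aggregated value meets or exceeds the threshold."""
    threshold = threshold_for(metric, env)
    if threshold is None:
        return False  # alert disabled in this environment
    return aggregated_value >= threshold
```

For example, `should_alert("cpu_percent", "prod", 72.5)` is true, while the same value in `dev` never alerts because the CPU alert is disabled there.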