cds-snc / sre-bot

Slack bot for site reliability engineering
MIT License

Create Cloudwatch alert if scheduler is broken #350

Open maxneuvians opened 11 months ago

maxneuvians commented 11 months ago

We have a scheduler that writes a log entry every 5 minutes to show that it is still alive.

https://github.com/cds-snc/sre-bot/blob/main/app/jobs/scheduled_tasks.py#L19-L23

We should create an alert that checks that this log entry shows up every 5 minutes. If it does not, the scheduler is probably dead.
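
One possible shape for such an alert, as a rough sketch only: it assumes a CloudWatch Logs metric filter already turns each heartbeat log line into a metric data point, and it only builds the arguments for the CloudWatch `put_metric_alarm` API rather than calling AWS. The alarm name, namespace, and metric name are hypothetical placeholders, not values taken from sre-bot.

```python
def heartbeat_alarm_kwargs(
    alarm_name="sre-bot-scheduler-heartbeat",  # hypothetical name
    namespace="SREBot",                        # hypothetical namespace
    metric_name="SchedulerHeartbeat",          # hypothetical metric
    period_seconds=300,                        # 5 minutes, matching the heartbeat interval
):
    """Build keyword arguments for CloudWatch's put_metric_alarm.

    The alarm fires when fewer than one heartbeat is counted in a
    period, or when no data arrives at all.
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Sum",                     # count heartbeats in the period
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",        # silence counts as a failure
    }
```

With boto3 installed and AWS credentials configured, these arguments could be passed as `boto3.client("cloudwatch").put_metric_alarm(**heartbeat_alarm_kwargs())`. Setting `TreatMissingData` to `breaching` is the important part: a dead scheduler emits nothing, so missing data has to trigger the alarm.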

gcharest commented 1 month ago

Reverted the change, as under the current pricing model the CloudWatch alarm would cost about 1k per month... 😅