Open jimleroyer opened 1 month ago
Description
As an ops lead, I want to know when the database is in a dire state during an overload, so that I can take appropriate action.
WHY are we building?
To know when our database might be in an unhealthy state. We assumed we had these alarms in place but have now realized that we do not.
WHAT are we building?
Add regular alarms on our database health metrics, such as CPU and memory usage percentage.
VALUE created by our solution
Stability, proactivity, awareness.
Acceptance Criteria
QA Steps
Additional resources
This is a good example of some alarms we could bring over from AWS using Terraform: https://github.com/cloudposse/terraform-aws-rds-cloudwatch-sns-alarms/blob/main/alarms.tf
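As a starting point for discussion, here is a minimal sketch of what two such alarms might look like in Terraform, assuming the database is a single RDS instance and that alerts go to an existing SNS topic. The variable names (`db_instance_id`, `alarm_topic_arn`), the alarm names, and every threshold below are hypothetical placeholders, not values taken from our infrastructure. Note that the `AWS/RDS` namespace reports memory as `FreeableMemory` in bytes rather than as a usage percentage, so the memory alarm here fires on low free memory instead.

```hcl
# Hypothetical inputs: replace with the real instance identifier and SNS topic.
variable "db_instance_id" {
  type        = string
  description = "Identifier of the RDS instance to monitor (assumed single instance)."
}

variable "alarm_topic_arn" {
  type        = string
  description = "ARN of the SNS topic that receives alarm notifications (assumed to exist)."
}

# Fires when average CPU stays above the threshold for two consecutive 5-minute periods.
resource "aws_cloudwatch_metric_alarm" "rds_cpu_high" {
  alarm_name          = "${var.db_instance_id}-cpu-utilization-high"
  alarm_description   = "Average RDS CPU utilization is too high."
  namespace           = "AWS/RDS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 80 # percent; illustrative value only
  period              = 300
  evaluation_periods  = 2

  dimensions = {
    DBInstanceIdentifier = var.db_instance_id
  }

  alarm_actions = [var.alarm_topic_arn]
  ok_actions    = [var.alarm_topic_arn]
}

# Fires when average freeable memory stays below the threshold for two consecutive 5-minute periods.
resource "aws_cloudwatch_metric_alarm" "rds_memory_low" {
  alarm_name          = "${var.db_instance_id}-freeable-memory-low"
  alarm_description   = "Average RDS freeable memory is too low."
  namespace           = "AWS/RDS"
  metric_name         = "FreeableMemory"
  statistic           = "Average"
  comparison_operator = "LessThanThreshold"
  threshold           = 256 * 1024 * 1024 # bytes (256 MiB); illustrative value only
  period              = 300
  evaluation_periods  = 2

  dimensions = {
    DBInstanceIdentifier = var.db_instance_id
  }

  alarm_actions = [var.alarm_topic_arn]
  ok_actions    = [var.alarm_topic_arn]
}
```

If the database turns out to be an Aurora cluster rather than a single instance, the dimension would likely need to be `DBClusterIdentifier` instead. The thresholds should be tuned against our actual baseline before anything is merged.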