As a developer/operator of GC Notify, I want to see the performance metrics of the Kubernetes nodes within the Notify Kubernetes cluster, so that we can better evaluate the load on GC Notify and support scaling the system up or out.
WHY are we building?
When debugging out-of-memory errors or examining performance metrics for tuning, it is very useful to see the exact resource usage of each node.
WHAT are we building?
We must install the Prometheus node-exporter in Kubernetes and configure the Amazon CloudWatch Agent for Prometheus to scrape its metrics and export them to CloudWatch.
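As a rough sketch of what the scrape job inside the CloudWatch Agent's Prometheus configuration could look like — the job name, namespace, and service name here are illustrative assumptions, not our actual config:

```yaml
# Hypothetical scrape job for the CloudWatch Agent's embedded Prometheus config.
# The "monitoring" namespace and "node-exporter" service name are assumptions.
scrape_configs:
  - job_name: kubernetes-node-exporter
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - monitoring          # assumed namespace where node-exporter runs
    relabel_configs:
      # Keep only the node-exporter service endpoints (default port 9100)
      - source_labels: [__meta_kubernetes_service_name]
        regex: node-exporter
        action: keep
```

The node-exporter itself is typically deployed as a DaemonSet (e.g. via its Helm chart) so that one pod runs on every node in the cluster.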
VALUE created by our solution
We will have better visibility into how Notify is behaving and additional information at our disposal when troubleshooting issues.
Acceptance Criteria
[ ] Prometheus Node Exporter deployed as code on Prod & Staging
[ ] Node metrics are available in AWS CloudWatch
[ ] A dashboard for node metrics has been created in CloudWatch
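For the dashboard criterion, a minimal sketch of a CloudWatch dashboard body with one node-metric widget — assuming the agent publishes node-exporter metrics under the `ContainerInsights/Prometheus` namespace; the region, cluster name, and dimension names are placeholders:

```json
{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "Node memory available",
        "region": "ca-central-1",
        "metrics": [
          ["ContainerInsights/Prometheus", "node_memory_MemAvailable_bytes",
           "ClusterName", "notify-cluster"]
        ],
        "period": 60,
        "stat": "Average"
      }
    }
  ]
}
```

Similar widgets for CPU (`node_cpu_seconds_total`) and disk metrics would round out the dashboard.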
QA Steps
[ ] Run a performance test against Staging and validate that the metrics behave as expected
[ ] Check the dashboard and confirm that the nodes are reporting the correct node metrics
[ ] Check that all node metrics are available in the AWS CloudWatch metrics explorer