As a developer of notify, I would like to have the minor version EKS node upgrades to be automated so that I don't have to constantly pay attention to node upgrade alerts.
WHY are we building?
Upgrading the nodes is currently a manual process that spans a couple days, and is highly dependent on my remembering to move it from staging to production, which causes configuration drift when I forget.
WHAT are we building?
Look into how SRE is triggering the slack alert for K8s node upgrades, piggy back on that logic and create a github workflow that will automatically change the node version in dev and staging.
Create a second workflow (or something better) that checks for node version discrepancies between staging and prod and automatically creates a PR that updates prod after 24 hours.
VALUE created by our solution
Increased security and reliability, with less configuration drift.
Acceptance Criteria
Given some context, when (X) action occurs, then (Y) outcome is achieved.
[ ] Generate appropriate log messages so that executions of this feature can be tracked
[ ] Can misuse of this feature cause harm? If yes, create an alert
[ ] Update the status of related findings, insights, and hypotheses on the Research Airtable
[ ] Once change/fix/feature is implemented, link relevant Airtable records to design artifacts (Figma)
Description
As a developer of notify, I would like to have the minor version EKS node upgrades to be automated so that I don't have to constantly pay attention to node upgrade alerts.
WHY are we building?
Upgrading the nodes is currently a manual process that spans a couple days, and is highly dependent on my remembering to move it from staging to production, which causes configuration drift when I forget.
WHAT are we building?
Look into how SRE is triggering the slack alert for K8s node upgrades, piggy back on that logic and create a github workflow that will automatically change the node version in dev and staging.
Create a second workflow (or something better) that checks for node version discrepancies between staging and prod and automatically creates a PR that updates prod after 24 hours.
VALUE created by our solution
Increased security and reliability, with less configuration drift.
Acceptance Criteria
Given some context, when (X) action occurs, then (Y) outcome is achieved.
[ ] Generate appropriate log messages so that executions of this feature can be tracked
[ ] Can misuse of this feature cause harm? If yes, create an alert
[ ] Update the status of related findings, insights, and hypotheses on the Research Airtable
[ ] Once change/fix/feature is implemented, link relevant Airtable records to design artifacts (Figma)
Privacy considerations
Security controls in place
Measuring success and metrics
QA Steps