department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit the complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Near No-Downtime Upgrades: Investigate Elasticache/Redis Auto Upgrade Downtime #78036

Open LindseySaari opened 8 months ago

LindseySaari commented 8 months ago

Description

We encountered 6 minutes of unexpected downtime during an auto-upgrade of ElastiCache/Redis. Auto-upgrades are typically triggered in response to CVEs so that security-related fixes are applied promptly. However, this particular upgrade caused downtime well beyond the expected duration and disrupted operations.

Tasks

Success Metrics

Acceptance Criteria

jennb33 commented 2 months ago

9/5/2024 update: we are moving this ticket to the Sharded/Non-Downtime objective.

flooose commented 2 months ago

I removed the reference to the POAM from this ticket to avoid confusion in the future.