We experienced 6 minutes of unexpected downtime during an auto-upgrade of ElastiCache/Redis. Auto-upgrades are typically triggered in response to CVEs so that security-related upgrades are applied promptly. This outage, however, lasted longer than expected and disrupted operations.
Tasks
[ ] Research Downtime Cause: Determine why the downtime lasted 6 minutes. Can we adopt an alternative feature set to keep write downtime minimal during auto-upgrades?
[ ] AWS Documentation Analysis: Assess whether our current setup aligns with AWS documentation.
[ ] Upgrade/Mitigation Plan: Develop strategies to minimize downtime during future auto-upgrades.
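As a possible starting point for the documentation analysis task, the sketch below lists settings on each replication group that commonly affect failover time. It assumes the setup uses replication groups and that boto3 is available with read access to ElastiCache. The names `failover_gaps` and `audit` are hypothetical helpers, and the three checks are an assumed starting list, not a confirmed explanation of the outage.

```python
from typing import Dict, List


def failover_gaps(group: Dict) -> List[str]:
    """Return findings for one entry of the ReplicationGroups response list."""
    findings = []
    if group.get("AutomaticFailover") != "enabled":
        findings.append("automatic failover is not enabled")
    if group.get("MultiAZ") != "enabled":
        findings.append("Multi-AZ is not enabled")
    # Simplification: with cluster mode enabled, several shards without
    # replicas would also pass this check, so verify replicas per shard.
    if len(group.get("MemberClusters", [])) < 2:
        findings.append("fewer than two member clusters")
    return findings


def audit(region: str) -> Dict[str, List[str]]:
    """Map each replication group ID in the region to its findings."""
    # Imported here so failover_gaps can be used without AWS access.
    import boto3

    client = boto3.client("elasticache", region_name=region)
    paginator = client.get_paginator("describe_replication_groups")
    report = {}
    for page in paginator.paginate():
        for group in page["ReplicationGroups"]:
            report[group["ReplicationGroupId"]] = failover_gaps(group)
    return report
```

Calling `audit` with the region of the affected cluster returns one list per replication group. An empty list means none of the assumed checks flagged anything, so the cause would need to be looked for elsewhere.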
Success Metrics
Acceptance Criteria