Closed: gracekretschmer-metrostar closed this issue 1 month ago
Confluence page for the draft postmortem: https://vfs.atlassian.net/wiki/spaces/PCMS/pages/3353247840/2024-07-25+25+30+CMS+Prod+Offline+Incidents
I have a PR ready for review: https://github.com/department-of-veterans-affairs/va.gov-team-sensitive/pull/1969
User Story or Problem Statement
The VA needs to understand the root cause of each prod outage and the steps the CMS team is taking to prevent these outages in the future.
What is Known
CMS prod went offline on 7/25, 7/26, and 7/30.
CMS began the certificate renewal process on 7/25.
Jenkins went offline on the evening of 7/25 and was brought back online on 7/26.
Certificate renewal was finished on 7/29.
Relevant Links
Steps for Implementation
Acceptance Criteria