department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0

Postmortem: 7/25, 7/26, and 7/30 Prod Offline Incidents #18806

Closed by gracekretschmer-metrostar 1 month ago

gracekretschmer-metrostar commented 3 months ago

User Story or Problem Statement

The VA needs to understand the root cause of each prod outage and the steps the CMS team is taking to prevent these outages in the future.

What is Known

- CMS prod went offline on 7/25, 7/26, and 7/30.
- The CMS team began the certificate renewal process on 7/25.
- Jenkins went offline on the evening of 7/25 and was brought back online on 7/26.
- The certificate renewal was finished on 7/29.

Relevant Links

Steps for Implementation

Acceptance Criteria

gracekretschmer-metrostar commented 3 months ago

Confluence page for the draft postmortem: https://vfs.atlassian.net/wiki/spaces/PCMS/pages/3353247840/2024-07-25+25+30+CMS+Prod+Offline+Incidents

gracekretschmer-metrostar commented 3 months ago

Easyretro board: https://easyretro.io/publicboard/NqWKfJWUr2dFRoCJjX2ybXcRyws2/0e6b627e-5e1d-4583-bafd-e12193cedd88

7hunderbird commented 1 month ago

I have a PR ready for review for this: https://github.com/department-of-veterans-affairs/va.gov-team-sensitive/pull/1969