Open anniebtran opened 7 months ago
@saderagsdale Feel free to drop updates on your investigation of who manages the maintenance windows here!
@saderagsdale is this related to the ITF work that's currently being done, or is this a separate issue?
@eileen-coforma This was written before the 526 team/Emily kicked off that work. @anniebtran Robin would be the best person to add an update based on the spike he worked on.
Value Statement
As a Veteran I want to So that
Background Context
Based on the investigation for https://github.com/department-of-veterans-affairs/va.gov-team/issues/72813, we found large clusters of 500 errors from attempted ITF creations that are caused by Lighthouse maintenance windows (and smaller clusters caused by vets-api deploys).
This ticket covers the follow-up work that may be required to address this problem, which may include adjusting the frontend to display a downtime message during these maintenance windows. Sade is talking with Kayla Watanabe from Lighthouse (LH) to see whether there's a way to get that integration in place.
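As a rough illustration of the frontend adjustment described above, the sketch below checks whether the current time falls inside a scheduled maintenance window before attempting ITF creation. Everything here is an assumption made for illustration: the `MaintenanceWindow` shape, the `"lighthouse"` service key, and the function names are hypothetical and do not reflect an actual Lighthouse or vets-api interface. How the window schedule would actually be obtained is the open question this ticket tracks.

```typescript
// Hypothetical shape for a scheduled maintenance window.
// This is NOT an actual Lighthouse or vets-api type.
interface MaintenanceWindow {
  service: string; // hypothetical service key, e.g. "lighthouse"
  start: string;   // ISO 8601 timestamp, inclusive
  end: string;     // ISO 8601 timestamp, exclusive
}

// Returns true when `now` falls inside any window for the given service.
function isInMaintenanceWindow(
  windows: MaintenanceWindow[],
  service: string,
  now: Date = new Date(),
): boolean {
  const t = now.getTime();
  return windows.some(
    (w) =>
      w.service === service &&
      Date.parse(w.start) <= t &&
      t < Date.parse(w.end),
  );
}

// Decides what the form should do: show a downtime message and skip the
// ITF request, or proceed with ITF creation as usual.
function itfAction(
  windows: MaintenanceWindow[],
  now: Date = new Date(),
): "show_downtime_message" | "create_itf" {
  return isInMaintenanceWindow(windows, "lighthouse", now)
    ? "show_downtime_message"
    : "create_itf";
}
```

With a check like this, requests made during a known window would never reach the ITF endpoint, so they would not produce the 500 errors described above. It would not cover the smaller clusters from vets-api deploys unless those were also published as windows.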
Outcome, Success Measure, KPI(S), and Tracking Link
itf_creation_failed
500 errors in Sentry/DataDog during scheduled maintenance windows
Acceptance Criteria
Tasks
Definition of Ready