department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Discover 0781/a silent failures #94375

Open lisacapaccioli opened 1 week ago

lisacapaccioli commented 1 week ago

We have confirmed silent failures of 0781: 14 in the last year. We assume there are more going back to when it launched.

This thread reveals that 0781 has failed silently and has had no historical silent failure investigation or remediation work since it launched. It needs the "full treatment" that we gave to evidence upload failures: finding all the old ones, regenerating them, confirming the volume, and getting them to VBA.

The outcome of this phase should be an artifact that shows when logs are available, what information we have, and the volume counts.

The discovery work should solve for these:

0781

TBD

0781a

TBD

Out of scope for this Epic

Notes

This will help VBA make decisions on what to do with the data we have. We likely won't be resubmitting anything on our own, directly into the efolders. Why? Because without an EP, no one will know to look at and re-review it. This process will mirror other code yellows in that we'll be creating spreadsheets, delivering documents, and working with VBA to take action.

See: CY3 knowability table and log timeline for an example of the type of information we'll need. Can be a doc, table, diagram, or a combination.

If the total volume is less than 100 or so, VBA OFO may be willing to handle them in the manual upload/emergency failsafe process.

lisacapaccioli commented 3 days ago

@pacerwow @emilytheis - Team Carbs has capacity to do this now. We've assigned to Scott for Sprint 40.

freeheeling commented 3 days ago

A Datadog query covering the last 15 months returned 14 results for "Submit Form 0781 Retries exhausted": 1 on 8/31 and the others over a ~12-hour period between 1/19 and 1/20, all before the Form0781DocumentUploadFailureEmail was incorporated on 10/8.

Form526JobStatus.where(job_class: 'SubmitForm0781').where.not(status: %w[success retryable_error try])

Form526JobStatus.where(job_class: 'SubmitForm0781').where(status: %w[retryable_error try]).where('updated_at < ?', 2.days.ago)

195,623 Form526JobStatus.where(job_class: 'SubmitForm0781') records
194,701 Form526JobStatus.where(job_class: 'SubmitForm0781').where(status: 'success') records
Difference of 922 records
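
To make the intent of the two queries above easier to check, here is a minimal plain-Ruby sketch of the bucketing they imply: success, still retrying, stuck in a retry state for more than two days, and everything else. It is an illustration only. `JobStatus`, `classify`, and `bucket_counts` are hypothetical names, the struct is a stand-in for a Form526JobStatus row rather than the real model, and any status value other than `success`, `retryable_error`, and `try` is an assumption.

```ruby
# Hypothetical stand-in for a Form526JobStatus row; NOT the real ActiveRecord model.
JobStatus = Struct.new(:job_class, :status, :updated_at, keyword_init: true)

# Statuses the queries above treat as "still being retried".
IN_FLIGHT = %w[retryable_error try].freeze

# Mirrors the `2.days.ago` cutoff from the second query, in seconds.
STALE_AFTER = 2 * 24 * 60 * 60

# Classify one row into :success, :in_flight, :stalled, or :failed.
def classify(row, now: Time.now)
  return :success if row.status == 'success'

  if IN_FLIGHT.include?(row.status)
    # A retry state that has not been updated in over two days is likely stuck.
    return (now - row.updated_at) > STALE_AFTER ? :stalled : :in_flight
  end

  # Everything else corresponds to the first query (where.not(...)).
  # Note: a SQL NOT IN skips rows with a NULL status; this sketch counts them as :failed.
  :failed
end

# Count SubmitForm0781 rows per bucket.
def bucket_counts(rows, now: Time.now)
  rows.select { |r| r.job_class == 'SubmitForm0781' }
      .map { |r| classify(r, now: now) }
      .tally
end
```

Under this reading, the 922 non-success records would be some mix of the `:in_flight`, `:stalled`, and `:failed` buckets, so the count that needs remediation is presumably smaller than 922 until the first two queries are run and reconciled.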