department-of-veterans-affairs /


[spike] DR | Review failed evidence uploads caused by temporary platform AWS issue #72199

Closed: data-doge closed this issue 1 month ago

data-doge commented 9 months ago

On Dec 18th, after the daily deploy, our evidence submissions to Lighthouse began failing (see the Slack alert). It looks like there were roughly 30 failures (see DataDog). The failures were caused by a platform issue, which the Platform team resolved with a revert (see Slack).

In these roughly 113 cases, a Veteran successfully submitted their form, but the evidence submissions that followed asynchronously then failed. We will need to identify these failed evidence submissions and manually re-submit them.


saderagsdale commented 9 months ago

@data-doge Thanks! Can you confirm which forms (SC, NOD) experienced the evidence submission failure?

saderagsdale commented 8 months ago

Note: The remediation path will differ depending on the affected form. We need to talk to the enablement team about refining the remediation process for transient errors, or about getting Platform to prioritize them.

HeatherWidmont commented 7 months ago

Moving to next sprint because this is not a dependency for the NOD infinite-loop email work.

data-doge commented 7 months ago

I still haven't looked into how many of these were NODs, but if any were, we'd have until March 17th before hitting the 90-day deadline, so we should pick this up within the next two sprints. Ideally we'd start next sprint, at the very least to determine whether any NOD evidence uploads need to be re-submitted.
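As a quick check of the date math, here is a minimal sketch. It assumes the 90-day clock starts on the failure date and that this date is Dec 18, 2023. The thread does not state the year, so that year is an assumption. `evidence_deadline` is a hypothetical helper that simply counts 90 calendar days forward.

```python
from datetime import date, timedelta

# Assumption: the failed NOD evidence uploads happened on Dec 18, 2023.
# The issue thread does not state the year.
FAILURE_DATE = date(2023, 12, 18)

# The 90-day window mentioned in the thread; how VA actually counts
# the window is an assumption here (plain calendar days).
EVIDENCE_WINDOW_DAYS = 90


def evidence_deadline(start: date, window_days: int = EVIDENCE_WINDOW_DAYS) -> date:
    """Hypothetical helper: return the date `window_days` calendar days after `start`."""
    return start + timedelta(days=window_days)


if __name__ == "__main__":
    print(evidence_deadline(FAILURE_DATE))  # 2024-03-17
```

The March 17th date depends on February 2024 being a leap month. Counting from a non-leap year (for example, Dec 18, 2022) would land on March 18 instead. The exact deadline therefore depends on both the year and how the 90-day clock is counted.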

HeatherWidmont commented 7 months ago

@data-doge to start on this today

anniebtran commented 7 months ago

All of these failures were SCs, so we aren't bound by the 90-day limit, and these cases may already have been resolved by VA employees. We'll touch base at the next standup.

saderagsdale commented 7 months ago

Thanks @anniebtran! I have not sent any comms related to this issue to VBA, aside from flagging it for awareness. Is there a link to any communications suggesting that someone from VA has triaged these? We'll need to verify that, or plan to send the information to VBA ourselves.

data-doge commented 7 months ago

@saderagsdale We haven't received any communications from VBA, though I vaguely remember hearing about a manual process where VA would reach out to Veterans who did not submit evidence with their SCs. I may be misremembering that, though. It would be nice to chat through that with you at standup or whenever works. We may also be able to manually submit this evidence ourselves, but we'll hold off on pursuing that until we've chatted with you.

saderagsdale commented 7 months ago

Got it. We'll want to send them the list of SC submissions and ask about the status of each one. We'll also want to let them know which Veterans intended to upload evidence, and what they tried to submit, so they can reverse any negative decisions that may have been made because of the failure.

anniebtran commented 7 months ago

Just so it's documented somewhere: I created a spreadsheet of affected SC submissions in SharePoint and shared it with Sade so we can follow up when we're ready to shift focus to this 👍

saderagsdale commented 6 months ago

Check ancillary forms (5103, 4142/4142a, evidence, and SC).

anniebtran commented 6 months ago

@saderagsdale Unfortunately, I can't check for the 5103 acknowledgement on these records because the form data has been cleared out of the LH records (I'm not sure how long they keep that data for SC submissions). It also seems that for the 4142, we submit directly to Central Mail, and I'm not seeing any records in our database that serve as a receipt for it. However, we do have logging and monitoring for the 4142 submission job, so we would have been alerted if the job or its retries failed for these submissions, and I'm not seeing any such alerts. @data-doge let me know if I'm missing any context here or if you have other ideas for checking on these ancillary forms.

anniebtran commented 6 months ago

I chatted with Eugene a bit about this, and I think we've reached the limit of what our team can look into here. Central Mail might be able to locate those 4142s, and I would imagine they could also check on the 5103. @saderagsdale anything else you want me to look into?

HeatherWidmont commented 6 months ago

Deprioritizing this in favor of other work this sprint. Moving it back to the backlog.

anniebtran commented 1 month ago

These are all covered by the current remediation effort and are included in the spreadsheet sent to VBA.