department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

[bug] 526ez | Investigate "PIF in Use" 526 rejections replaced with External service unavailable #43009

Open michelpmcdonald opened 2 years ago

michelpmcdonald commented 2 years ago

Sev 1 because we are back to a significant number of claims being silently dropped.

Impact: As of July 7th, we have hit this issue 3873 times since June 1st

Starting on June 13th (confirm date, have script), "PIF in Use" claim rejections stopped, but it looks like they have been replaced with "External service unavailable".
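One way to confirm the switchover date is to count rejections per day for each error message. The following is a minimal sketch in plain Ruby, assuming each rejection can be reduced to a date and an error message. The `Rejection` struct, the `daily_counts` helper, and the sample rows are invented for illustration and are not real data.

```ruby
require 'date'

# Hypothetical stand-in for a rejected submission record.
Rejection = Struct.new(:date, :error_message)

# Hypothetical helper: returns { date => { error_message => count } },
# with dates in ascending order.
def daily_counts(rejections)
  rejections
    .group_by(&:date)
    .sort_by { |date, _rows| date }
    .to_h { |date, rows| [date, rows.map(&:error_message).tally] }
end

# Invented rows, only to show the shape of the output.
sample = [
  Rejection.new(Date.new(2022, 6, 12), 'PIF in Use'),
  Rejection.new(Date.new(2022, 6, 12), 'PIF in Use'),
  Rejection.new(Date.new(2022, 6, 13), 'External service unavailable'),
  Rejection.new(Date.new(2022, 6, 14), 'External service unavailable')
]

daily_counts(sample).each do |date, counts|
  puts "#{date}: #{counts}"
end
```

The first day on which one message drops to zero and the other appears would be the date to confirm.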

Related: we just enabled sending failed-submission notifications to vets who hit the PIF in Use error, but now PIF in Use has disappeared. We need to decide whether to send notifications for this newish "External service unavailable" error. First, though, we need to figure out whether "External service unavailable" is something we can address without involving the vets themselves. That is hard to know because EVSS does not return the underlying error behind External service unavailable (see below).

It looks like EVSS masks the actual underlying issue when they return External service unavailable (i.e. the raw form526.submit.establishClaim.serviceError error code in their response JSON). I have to reach out to them and give them the claim rejection GUID (from their error response JSON), and they have to manually look up the actual error. For example, I gave them reject GUID d4a4a9ff-c1e3-4701-aaad-0601b2bb468a, and the underlying error they received from their external upstream service is:

Tried to create an open work item for a claim (claim ID = 82736786) in a non-active state (claim status = CLOSED). Claim creation failed. System error.

That sounds a little PIF'ey to me, just a total guess. I can't emphasize enough how purely speculative this is: I have no comms whatsoever with BGS and absolutely 0 insight into their code. But I'm thinking that BGS removing the PIF validation has exposed various claim establishment issues that the PIF validation was designed to prevent. That might explain BGS's reluctance to address the PIF in Use error.

Using this ticket to track time spent on this. I'll add specific tasks as they fall out of research conversations: https://dsva.slack.com/archives/C8R3JS8BU/p1655154539978949

- [x] Provide Michael Harlow a list of recent rejected "External service unavailable"
michelpmcdonald commented 2 years ago

Script used to find "External service unavailable" 526 submission rejects:

# Find exhausted 526 submission jobs that failed with "External service unavailable"
# in the last 5 days, oldest first.
jss = Form526JobStatus.where(
  status: 'exhausted',
  error_message: 'External service unavailable',
  updated_at: (5.days.ago.utc..)
).order(:updated_at)

jss.each do |js|
  # Skip jobs whose error history carries no EVSS error tracking GUID.
  next unless js.bgjob_errors.to_s.include?('GUID:')

  # Iterate the job run history from newest run attempt to oldest and
  # print the newest EVSS error tracking GUID.
  js.bgjob_errors.keys.reverse.each do |bgerr_key|
    next unless js.bgjob_errors[bgerr_key].to_s.include?('GUID:')

    # Fixed-offset slice that pulls the GUID portion from the end of the stringified error.
    puts "Vetsapi id: #{js.id}, #{js.bgjob_errors[bgerr_key]['timestamp']}, #{js.bgjob_errors[bgerr_key].to_s[-49..-7]}"
    break
  end

  puts ''
end; nil
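
The fixed-offset slice `to_s[-49..-7]` in the script depends on the exact layout of the stringified error. A minimal sketch of a more tolerant alternative follows, assuming the GUID always appears as `GUID: <uuid>` somewhere in an error entry. The helper name `newest_evss_guid` and the sample hash are hypothetical, and the real shape of `bgjob_errors` may differ.

```ruby
# Matches "GUID: " followed by a standard 8-4-4-4-12 hex UUID.
GUID_PATTERN = /GUID:\s*(\h{8}-\h{4}-\h{4}-\h{4}-\h{12})/

# Hypothetical helper: returns the GUID from the newest entry that has one, or nil.
# Assumes hash insertion order runs from oldest attempt to newest, as the script does.
def newest_evss_guid(bgjob_errors)
  bgjob_errors.values.reverse_each do |entry|
    match = GUID_PATTERN.match(entry.to_s)
    return match[1] if match
  end
  nil
end

# Made-up sample, shaped only loosely like the data the script iterates over.
sample_errors = {
  'job_1' => { 'timestamp' => '2022-06-20T10:00:00Z', 'error_message' => 'timeout' },
  'job_2' => { 'timestamp' => '2022-06-20T11:00:00Z',
               'error_message' => 'serviceError GUID: d4a4a9ff-c1e3-4701-aaad-0601b2bb468a' }
}

puts newest_evss_guid(sample_errors)
```

Matching on the UUID pattern keeps working if the text before or after the GUID changes length, at the cost of assuming the `GUID:` label itself stays stable.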
michelpmcdonald commented 2 years ago

Sent EVSS a list of 526 rejected submissions that have an EVSS error tracking GUID. They are going to investigate and get back with me: slack thread

michelpmcdonald commented 2 years ago

EVSS got back to me about some of the submissions:

1. GUID: 67f64772-ed03-407a-97cb-f1d76457fcc6
   Null key returned for cache operation (maybe you are using named params on classes without debug info?)
2. GUID: be585969-fb37-43ed-b9ff-4a95c4314f9b
   This update is being elevated for additional review due to an Incident Flash associated with this Beneficiary
3. GUID: 92786b21-44ed-42f1-afd4-d451e038feb2, GUID: 66bc53f0-b978-433b-b1b6-13460e49ec88
   Tried to create an open work item for a claim (claim ID = 62630584) in a non-active state (claim status = CLOSED).
4. GUID: 49ad2cf4-d5a7-446d-bc10-440e33a474be, GUID: 84691ec6-a731-4e3f-b31f-25b8338e203e
   Tried to create an open work item for a claim (claim ID = 81389269) in a non-active state (claim status = CANCELLED).
5. GUID: 45c1c2ec-7f57-4ff8-b76d-6dc7816b8b6b, GUID: bd25134d-d493-406f-9dfe-a13d13c07bbd
   Tried to create an open work item for a claim (claim ID = 87360762) in a non-active state (claim status = CANCELLED).
6. GUID: c9134a94-9367-426b-9ced-b58148b5db44
   Tried to create an open work item for a claim (claim ID = 57885873) in a non-active state (claim status = CLOSED).
7. GUID: cef64d49-036a-4766-bdc2-d01cd715a933
   Tried to create an open work item for a claim (claim ID = 63031431) in a non-active state (claim status = CLOSED).
8. GUID: f92fdec1-d2f2-4e64-8b1b-670a11065660, GUID: 3dc303d3-0085-4afb-8e3b-825f43333f5e
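
To see which underlying errors dominate, the messages EVSS sent back can be tallied by cause. This is a rough sketch in plain Ruby, assuming the messages are pasted in as strings. The `bucket_for` and `tally_by_cause` helpers and the bucket names are illustrative only and are not part of vets-api.

```ruby
# Messages copied from the EVSS reply, one per entry that included a message.
MESSAGES = [
  'Null key returned for cache operation (maybe you are using named params on classes without debug info?)',
  'This update is being elevated for additional review due to an Incident Flash associated with this Beneficiary',
  'Tried to create an open work item for a claim (claim ID = 62630584) in a non-active state (claim status = CLOSED).',
  'Tried to create an open work item for a claim (claim ID = 81389269) in a non-active state (claim status = CANCELLED).',
  'Tried to create an open work item for a claim (claim ID = 87360762) in a non-active state (claim status = CANCELLED).',
  'Tried to create an open work item for a claim (claim ID = 57885873) in a non-active state (claim status = CLOSED).',
  'Tried to create an open work item for a claim (claim ID = 63031431) in a non-active state (claim status = CLOSED).'
].freeze

# Hypothetical bucketing rule: group "non-active state" errors by claim status,
# and lump everything else together.
def bucket_for(message)
  if (m = message.match(/claim status = (\w+)/))
    "non-active claim (#{m[1]})"
  else
    'other'
  end
end

# Returns { bucket_name => count }.
def tally_by_cause(messages)
  messages.map { |msg| bucket_for(msg) }.tally
end

tally_by_cause(MESSAGES).each { |bucket, count| puts "#{bucket}: #{count}" }
```

With these seven messages, five fall into the non-active claim buckets (three CLOSED, two CANCELLED) and two into `other`.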

I (michel) don't really understand those error messages all that much, and I don't know if they are something I can correct on my end. I actually have very little working knowledge of all the details that go into establishing the claim; I imagine it's pretty complex. To me, for the most part, the claim is a blob of JSON data that vets-api just kind of shuttles between the website and your EVSS service.

Asking EVSS if they have any suggestions on how to approach this.

michelpmcdonald commented 2 years ago

Michael H. did get back to me with lower-level error messages. However, I was not able to determine that there is anything I can do on the vets-api side to "fix" the issue.

Michael H. is going to open EVSS ops support ticket(s) and work with VBMS to try to figure out what, if anything, we are doing that triggers the errors, and whether there is anything we can do to correctly establish the claims that hit the issue.

michelpmcdonald commented 2 years ago

Got an email from EVSS asking me to re-submit a few of the failed submissions. Working on this now.

saderagsdale commented 2 years ago

@michelpmcdonald can you share the next step for this ticket? Should we modify the ticket so that it shows the tasks that we need to complete in order to close it, or should we split something out?

saderagsdale commented 2 years ago

Update: Will icebox this issue until it picks back up. Sade to inform Matt.