Closed va-albers closed 7 months ago
The errors show that txt files were attempted to upload. We will want to research if the Backend prevents various txt file encodings from being submitted. We have what we need to research further.
cc @tblackwe - need the following for review Impact summary
This is a silent failure for claims and needs to be a high priority fix.
We need to determine a mechanism to catch this whenever possible inside the vets-api submission. even if we can catch a percentage of the occurrences it will be an improvement.
@austinjprice please add it to this sprint's backlog
@tblackwe added to next items on board for discussion tomorrow refinement
Ok took a close look at the ticket, I think I have plenty to dig into this. Basic plan of attack will be:
app/workers/evss/disability_compensation_form/submit_uploads.rb
) and why the retry count is so high (14)vets-api
validation of the document's doc type, and if we have none locate where we would add them.Only question I would have now would be:
and why the retry count is so high (14) 14 is the number of retires that most closely gets to a 24 hour period, with sidekiq's default retry logic. It actually may be 13, which is what the Submission retry is set to, but was originally set to 14 for some reason and never changed.
The submission job has logic to detect if the error falls into certain types of known failure cases that are known to not be able to succeed, no matter how many retries it does, and not retry in those cases (immediately send backup submission).
The retries I think are or were meant to smooth out transient issues such as network issues or outages, that are assumed to be external and resolved eventually (like downstream systems being down).
However I do not think the upload job has this same logic.
Track down any vets-api validation of the document's doc type, and if we have none locate where we would add them.
@Mottie would know where this is happening probably.
do we know who we would talk to on the EVSS team that could share the validation on their side and what doc types they disallow that will throw this error?
@kyle-drewno is who I was told was taking over the EVSS position from the developer I used to most closely work with at EVSS.
Is this potentially documented anywhere?
There is a EVSS 526 endpoint validation schema that I was sent at some point. It may have this info on it.
The frontend checks that the file extension matches the file signature in the FileField
component, but txt
files don't have any signature (except for Windows BOM, see screenshot - src), so they are not checked; but the txt file MIME-type is checked on the frontend.
This screenshot shows a hex editor examining a tiny_file.txt
file (4 bytes total size) on my Mac
The backend is using mimemagic
, so maybe there's a bug in that gem?
Wow thanks @kylesoskin and @Mottie this is all very helpful
Documenting notes from a discussion with @SamStuckey on this, since his logging work meant he had some insights/theories on how the documents flow works globally on our side.
Insights:
Form526Submission
is the central model for the overall form submission.Suggestions:
vets-api
side, we would want to do it in EVSSClaimDocumentUploaderBase.We ended up closing this related ticket around rejections for files with too much text, as it is having a relatively small impact. But noting it here as it's related.
cc @aurora-a-k-a-lightning @sethdarragile6 @navaeddie @mchae-nava @christinec-fftc - fyi
PS may be worth looking at the mimemagic logic in the claim document uploader, as I think that is obscuring our validation strategy to a certain extent.
@kylesoskin I forgot to ask - if we can't track down documentation/a subject matter expert on this EVSS endpoint, is it possible for us to inspect the validation code on their side ourselves? Is it in a public repo?
Posting some additional new findings/an update from a very helpful discussion with @va-albers and @SamStuckey at the end of last week.
Main Questions to Answer -How does our file type validation differ in detail from the EVSS side? -Do we want to be retrying these failures 14 times or is this causing issues on the EVSS side? -Is it possible the issue with large file uploads is potentially throwing this error as well? -Why did certain specific files get rejected?
TODO/My play of attack
Findings Summary
-We have a few upload issues between our system and EVSS, including an unrelated issue with different file size limits. Steve said we accept 150 MB files but anything larger than 50 MB on the EVSS side causes them to be split up, a process that can fail/become overloaded if we are constantly trying and retrying to upload. The 150MB size limit on our side is meant to make it easier for vets who may not know how to compress files. -According to Kyle Drewno from Lighthouse, "Here's how eBenefits describes the allowable file types: PDF (unlocked), GIF, TIFF, TIF, JPEG, JPG, BMP, and TXT. What they don't mention is that TXT files must be UTF-8 encoded otherwise you get the EVSS_15004 error."
Providing an update here:
Folks at EVSS stated that, despite what the error message says, 15004 occurs in the specific situation where a txt file is uploaded and is not properly UTF-8/ANSI encoded
I’ve confirmed we are not currently validating the file text encoding
This leaves us with a couple options, which I will summarize below
Once I have Datadog access (getting PIV card today) I’d like to document how often this error occurs and how often these uploads succeed, if possible
@SamStuckey and I ended up discussing this during an unrelated Slack call this morning, and there are a couple potential courses of action (Sam correct me if I misunderstood):
We manually re-encode txt files before sending them to EVSS. Sam had some concerns around this: we are potentially risking manipulating the contents of that personal data, we’re taking on some liability in reading and re-writing it, and we may need to require a user check a “release” permission type box to get this past the collab cycle.
We can detect an invalid encoding immediately when the user attempts to upload the file and raise an error. This wouldn’t be a great UX but we could explain to them how to either re-encode the file properly, or more likely, export and upload it as a PDF with clear instructions on how to do so. Sam was leaning more strongly in this direction and I agree.
Noting there are two roughly adjacent errors we may want to consider addressing in a single epic alongside this one:
Large files that EVSS needs to split potentially overwhelming their system. We currently cap uploads at 50MB as EVSS does, but for PDFs we allow files as large as 150MB because EVSS has a system to split them. However, this can overload their system and if some of them fail and we attempt to retry too many times, it can even bring that part of the system down (per @va-albers , not aware of a ticket but feel free to link)
Austin will make an epic for the above additional bugs today. Need to sync with Julie and Nichole to understand how this will affect other UX.
this was broken into smaller tickets
Instructions
Please fill out the form below, providing as much detail as possible.
Issue Description
EVSS Responds to EVSS::DisabilityCompensationForm::SubmitUploads calls with message: EVSS::ErrorMiddleware::EVSSError { "key"=>"EVSS_15004", "text"=>"We were unable to complete your request because your file type is not a supported format. Try using a different file type such as PDF (unlocked), JPEG, BMP or TXT. If you still have problems, call the VA at 1-800-827-1000 (711, if you use TDD), Monday - Friday. 8:00 am - 9:00 pm ET. When you call, provide the following error code: EVSS_15004.", "severity"=>"ERROR" }
Although the EVSS system suggests using extensions like "TXT", there are examples of file names that end with "txt" that fail (see file upload error Sentry log
It appears that the system will repeatedly retry these pages, although the error isn't one that can be fixed by replay? For instance the issue listed above notes that it is on retry_count 13.
Information needed to assess the severity of the issue
Information that aids in the research and troubleshooting of the issue
Any Additional Information: