department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html
282 stars 200 forks source link

SC and NOD | Add validation to prevent PDF uploads exceeding 78in x 101in #74493

Closed data-doge closed 6 months ago

data-doge commented 8 months ago

Value Statement

As a Veteran I want to submit evidence, which may include very large documents, for my NOD / SC So that my appeal can be evaluated with all the information I believe is relevant.


Background Context

Roughly 30% of the evidence submission errors we see downstream have the message: "Maximum page size exceeded. Limit is 78 in x 101 in." This is a Data Dimensions requirement, and the limit cannot be increased.

We investigated three relevant PDFs in S3. Two were 1-page continuous PDFs of an entire webpage (i.e. blog post from law firm; scientific paper). One PDF was a long report regarding burn pits, potentially originating from a web page. It is likely that all of these dimensions errors are cases where a PDF representation of a web page is normal in width, but very long. Though, while investigating a different error, we encountered a PDF with several pages, one of which was outlandishly large, with dimensions likely exceeding a thousand inches.

Implementation Notes (from Annie)

If we were to keep the current implementation and turn enable the setting that calls the /validate_document endpoint, we would have to tweak the transaction flow a little to handle the user password scenario. Some notes as I was thinking through how to approach that:

Additionally, we'll need to modify platform's file upload component to skip the component's/frontend password checking step in favor of letting the API call detect whether a file is encrypted with a password or not AND have it display the password prompt based on the response returned from the backend.

Looks like another team made a copy of the FileField component in order to customize it to their needs, so we might need to take this path. I'm not sure how much of a lift it would be, and it's definitely not DRY, but would probably be the least invasive approach.

Acceptance Criteria

For SC and NOD forms:

Design

Tech Tasks

Definition of Ready

Definition of Done

saderagsdale commented 8 months ago

Risks: Changing the format or cutting the document leaves us unclear what a claim reviewer would see if submitted that way.

Preferred: Messaging to tell Veterans that they cannot upload web pages converted into long PDFs, needs to be saved as standard 8.5 x 11 pages

Action items:

eileen-coforma commented 8 months ago

Did more digging into PDF creation on Safari. If they use the straight 'Export as PDF...' option, it'll be one single long page. If the use the 'Print...' option, and then select PDF, then it's broken down into multiple pages (as expected via Chrome). It does also change the formatting and removes some images... which risks rendering it in a manner the user might not want.

Example PDFs attached as reference. cats - Google Search - multi-page.pdf cats - Google Search.pdf

eileen-coforma commented 7 months ago

I did a bit of digging using iOS iPad. It seems that they do cut the pages, after it hits 200inches.

Chrome, Safari, and Edge all have a print option to save to PDF.

Safari on Mac and iPad has an 'export pdf' or 'create pdf' option that creates the long image.

Also, it's very hard to convert a long image PDF into smaller PDFs, since the trick is to do it with print modal when you're saving the original page. If you try to do that in Acrobat, it gives you an error, telling you to save as, rather than print, which prevents you from formatting it. Preview also doesn't have an option to tile to print.

eileen-coforma commented 7 months ago

@data-doge Did you sample any more instances of this error? Are there any other cases of large dimensions we need to plan for, or should we focus on resolving the long webpage issue?

eileen-coforma commented 7 months ago

Opened a conversation with CAIA since document dimensions might not be a silo'd occurrence with our form. This is the suggestion we got back from them:

Here's version of the error message I shared with the 530 EZ forms team (I removed the dimensions specific to their form): Your file can't have a [width/height] larger than [dimension]. Follow the instructions for your device on how to resize the file and try again.

It clearly defines the problem, and puts control back to the Veteran, but the guidance language leaves more to be desired. Resize might not make sense for our context (if they manage to resize to fit the length limit, it might not be legible). Having this validation at least identifies which file isn't working, and lets them make the decision on how to make it work.

The other way is to also include a suggestion to mailing the evidence in separately. It could be a fallback for all evidence types that can't be submitted digitally.

eileen-coforma commented 7 months ago

Started a document to capture recommendations. Would want to review with Julie to align on best next steps.

HeatherWidmont commented 7 months ago

@eileen-coforma is talking to Julie later today about this

eileen-coforma commented 7 months ago

CAIA needs a few days to look at this. Will follow-up with them on Friday.

eileen-coforma commented 7 months ago

CAIA made recommendations for the error, and I’ve adapted their suggestion into this design.

A few notes, also as comments in the design file:

Open questions:

eileen-coforma commented 7 months ago

Updated design to remove additional info box, and separate into another ticket. Will follow-up with DS regarding error alert pattern. Also have to address where the error alert is placed (if multiple files are submitted).

HeatherWidmont commented 7 months ago

@eileen-coforma to clear with the design system team if we can use an error alert for this action

saderagsdale commented 7 months ago

Still in discovery, and can be deferred until after NOD testing is complete. We need all hands on deck, and this sits lower in priority.

saderagsdale commented 7 months ago

Need to talk to design about the right way to show the error alert. @eileen-coforma

eileen-coforma commented 7 months ago

DS proposed a different error alert than initially designed. Revisiting error language with CAIA on Tues 4:30ET to try and condense message.

eileen-coforma commented 7 months ago

Attached design with error language to the ticket.

saderagsdale commented 6 months ago

Action items:

eileen-coforma commented 6 months ago

@anniebtran do we know if the 78 in x 101 in. is referring to H x W or W x H? It's weird that it's two different numbers.

anniebtran commented 6 months ago

78 = width, 101 = height (code for reference)

HeatherWidmont commented 6 months ago

@eileen-coforma to review the design with CAIA & Julie.

Spike work being done in another ticket so this ticket will be repurposed for the actual code work

eileen-coforma commented 6 months ago

@anniebtran Is there a way to determine which dimension exceeds the size limitation AKA height vs width? Wondering if we can be more specific in the error message.

eileen-coforma commented 6 months ago

Asked CAIA to review language, once confirmed, will async it with Julie.

anniebtran commented 6 months ago

@eileen-coforma Unfortunately I don't think so, at least from what I can tell in the validator code. It just checks for whether one of those dimensions has been exceeded and adds an error message stating what both limits are

eileen-coforma commented 6 months ago

Julie and CAIA have reviewed and approved the error message.

Two problems we are aware of and do not address in the current solution:

saderagsdale commented 6 months ago

@anniebtran @Mottie to apply an estimate

saderagsdale commented 6 months ago

@anniebtran how's this one going? Hoping to prioritize this if there's nothing more urgent on the bug triage homefront.

HeatherWidmont commented 6 months ago

@anniebtran working on BE part of this. @Mottie working on FE, seems like a lot more work that needs to be done

HeatherWidmont commented 6 months ago

Taking v3 component out of the scope of this

anniebtran commented 6 months ago

I talked to Kyle to get some background on why we never switched this feature on and make sure we weren't missing any glaring reasons to not use it — here are some notes from him:

I think Matt Self had told us to NOT enable it in prod because we were unsure if the requirements LH was using were actually REAL requirements or just there arbitrarily. For example the limit used to be 50Mb I think and then just one day we were like.... hey what happens if you just try to send a larger file downstream, and larger files worked or something like that. So Matt self wanted to try to get to the bottom of WHERE the actual requirements and bottlenecks are/were, and then make sure everyone enforces those. ... I dont think he or we got anywhere with that line of investigation. So... with all that in mind, you could test and make sure it still works as intended, and then if so, just flip it on. I mean we probably should have done that anyways because if we accept the upload, but it doesnt pass LH validation LH is going to error when sidekiq tries to upload it anyways, so it will go into the blackhole and the vet wont know, I think.... so turning it on is the same outcome (file does not get to the VA), but at least the Vet is aware of it and it is shown to them in the UI.

anniebtran commented 6 months ago

Had some interesting struggles getting this spun up on staging but I think we're good. Some notes on that:

anniebtran commented 6 months ago

PR to enable the PDF validator backend on production is merged. Unless pods get refreshed before the usual prod deploy, we might need to wait until 2pm CT/3pm ET tomorrow in order to for the production environment to pick up the change.

HeatherWidmont commented 6 months ago

@rocio-desantiago to validate in prod, if it looks good we can close this out

anniebtran commented 6 months ago

Looks good on prod ✅ closing