department-of-veterans-affairs / caseflow

Caseflow is a web application that enables the tracking and processing of appealed claims at the Board of Veterans' Appeals.

Certification | What do we do when VBMS/other dependency requests are taking a long time? #2510

Closed amprokop closed 7 years ago

amprokop commented 7 years ago

Problem

@shellicious observed in a Montgomery pilot that when users entered Certification, the page stayed blank and kept loading for several minutes without ever finishing. We determined that this was because the VBMS request was taking between 2 and 5 minutes to complete. Evidently, the user’s browser request was timing out before the VBMS request returned.

Right now, we don't launch the React app until our requests to VBMS (for document dates and Form 9), BGS (for POA information), and VACOLS (for document dates, hearing info, etc.) are complete. It would be a better user experience if we gave the user more information about what was happening, especially when these requests take a long time.

However, it may be counterproductive to set a timeout on our VBMS requests, as we have observed that some eFolders are consistently very slow, and we don’t want to lock users out of using Caseflow Certification to certify appeals associated with those eFolders.

Potential Solution

When the user clicks “Start Certification” in VACOLS and is redirected to /certifications/new/####, we immediately show a spinner and a “Starting Certification” screen, and load the Check Documents page only when the VBMS/BGS/VACOLS requests complete.

If, after 30 seconds, VBMS/BGS/VACOLS requests have not finished, display a message that says something like “Sorry! It’s taking a long time to start certification. We’re trying to fetch information from VBMS and VACOLS. Hold on, please…”
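
A minimal sketch of the flow described above, not an actual Caseflow implementation. Every name here (`StartScreen`, `startCertification`, `onScreen`, `slowAfterMs`) is hypothetical, and `load` stands in for whatever promise represents the combined VBMS/BGS/VACOLS requests. It shows the spinner right away, swaps in the apology message if the load is still pending after the threshold, and deliberately never cancels the load.

```typescript
// Hypothetical names throughout; nothing here is taken from the Caseflow codebase.
type StartScreen =
  | { kind: "starting" }               // spinner + "Starting Certification"
  | { kind: "slow"; message: string }  // spinner + apology once the threshold passes
  | { kind: "ready" };                 // hand off to the Check Documents page

const SLOW_MESSAGE =
  "Sorry! It's taking a long time to start certification. " +
  "We're trying to fetch information from VBMS and VACOLS. Hold on, please…";

// Reports "starting" immediately, "slow" if `load` is still pending after
// `slowAfterMs`, and "ready" once `load` resolves. There is no hard timeout:
// a slow eFolder keeps loading for as long as it takes.
function startCertification<T>(
  load: Promise<T>,
  onScreen: (screen: StartScreen) => void,
  slowAfterMs: number = 30000
): Promise<T> {
  onScreen({ kind: "starting" });
  const timer = setTimeout(
    () => onScreen({ kind: "slow", message: SLOW_MESSAGE }),
    slowAfterMs
  );
  return load.then(
    (data) => {
      clearTimeout(timer); // loaded in time (or eventually): stop the slow-message timer
      onScreen({ kind: "ready" });
      return data;
    },
    (err) => {
      clearTimeout(timer); // failures are left to the caller to present
      throw err;
    }
  );
}
```

Passing the threshold as a parameter keeps the 30-second figure easy to change if a shorter or longer wait turns out to fit users better.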

Open questions

How often do VBMS document list requests take 60 seconds or longer to complete? We have some experience with long-running VBMS requests, but to properly prioritize this, we should have some sense of how often VBMS really slows down.

Should we display an explicit error message and time out after some very long interval (5+ minutes)?

How should we save information about the status of the VBMS request? Should we wrap everything in a transaction?
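
On the second question, here is a hedged sketch of what a hard ceiling could look like if we decided we wanted one. The names (`Outcome`, `withDeadline`, `deadlineMs`) are hypothetical, and the sketch only stops waiting in the UI; it does not cancel the underlying request. Whether a ceiling is a good idea at all is still open, given the concern about consistently slow eFolders.

```typescript
// Hypothetical helper; the names and the idea of a fixed ceiling are assumptions.
type Outcome<T> =
  | { status: "ok"; value: T }
  | { status: "timed_out"; afterMs: number };

// Resolves with the work's value, or with "timed_out" if the deadline passes first.
// The underlying request is NOT cancelled; we only stop waiting for it.
function withDeadline<T>(work: Promise<T>, deadlineMs: number): Promise<Outcome<T>> {
  return new Promise<Outcome<T>>((resolve, reject) => {
    const timer = setTimeout(
      () => resolve({ status: "timed_out", afterMs: deadlineMs }),
      deadlineMs
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve({ status: "ok", value }); // ignored if the deadline already fired
      },
      (err) => {
        clearTimeout(timer);
        reject(err); // real failures stay distinguishable from timeouts
      }
    );
  });
}
```

A caller could show the explicit error message on `timed_out` and proceed to Check Documents on `ok`.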

amprokop commented 7 years ago

@shellicious @laurjpeterson — wrote down an idea here. Please refine/mock up/do whatever. Feel free to have a post-standup discussion about it next week if you like.

mdbenjam commented 7 years ago

Two things I'll add:

  1. Are we sure those load times happen? Reader is under the same constraints and I haven't heard about these timeouts. Maybe this order of magnitude of wait times is when people start saying VBMS is down.
  2. Is there any chance to prefetch files?

askldjd commented 7 years ago

image

Here are some stats, just from looking at a Prometheus counter on FetchDocumentById. At the 99% quantile, I am seeing that most documents are fetched in 6s. Multi-minute latency is a rare event.

Although high latencies are rare events, I do think it is important to tackle this issue.

askldjd commented 7 years ago

Ah, I misread your issue. I didn't realize ListDocuments is the problematic API.

image

It looks like 30s latency for this API is not uncommon.
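
To put a number on "not uncommon", something like the following could be run over exported latency samples. This is only a sketch: the function names are hypothetical, it assumes the samples are available as a plain array of seconds, and it is a simple nearest-rank calculation over raw samples, not how Prometheus computes its own quantiles.

```typescript
// Hypothetical helpers; assumes latency samples (in seconds) exported as an array.

// Fraction of requests whose latency met or exceeded `thresholdSeconds`.
function shareAtOrAbove(latenciesSeconds: number[], thresholdSeconds: number): number {
  if (latenciesSeconds.length === 0) return 0;
  const slow = latenciesSeconds.filter((s) => s >= thresholdSeconds).length;
  return slow / latenciesSeconds.length;
}

// Nearest-rank quantile for q in (0, 1], e.g. q = 0.99 for the 99% quantile.
function quantile(latenciesSeconds: number[], q: number): number {
  if (latenciesSeconds.length === 0) throw new Error("no samples");
  const sorted = latenciesSeconds.slice().sort((a, b) => a - b);
  const rank = Math.ceil(q * sorted.length);                 // 1-based rank
  const clamped = Math.min(sorted.length, Math.max(1, rank)); // keep rank in range
  return sorted[clamped - 1];
}
```

Calling `shareAtOrAbove(samples, 30)` and `shareAtOrAbove(samples, 60)` on ListDocuments samples would show how often the 30-second and 60-second marks are actually crossed.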

amprokop commented 7 years ago

Maybe 30s is the wrong number! Should it be more like 10 or 15 seconds? For users of a consumer-facing website, if it's longer than 10s, they're probably gone :) but VA is different of course.

amprokop commented 7 years ago

to @mdbenjam:

  1. Are we sure those load times happen? Reader is under the same constraints and I haven't heard about these timeouts. Maybe this order of magnitude of wait times is when people start saying VBMS is down.

@shellicious shadowed a certification where VBMS wait times caused Cert not to start at all; most likely the browser request timed out before the ListDocuments request finished.

  2. Is there any chance to prefetch files?

We tell users to change the VBMS dates to match VACOLS dates, and we use the ListDocuments call to tell whether or not dates are matching, so we can't cache that call. We could potentially prefetch the form itself, though we don't know exactly which cases will be certified.

NickHeiner commented 7 years ago

When the user clicks “Start Certification” in VACOLS and is redirected to /certifications/new/####, we immediately show a spinner and a “Starting Certification” screen, and load the Check Documents page only when the VBMS/BGS/VACOLS requests complete.

+💯. This is what Reader does. It makes the app feel more responsive.

If we're living in full SPA land, then the initial SPA load should be as lightweight as possible. Don't make any data calls on the backend, especially ones to slow services. Instead, return just the HTML and JS to load the page, display a spinner, and then start on the data requests.

Another benefit of this is it makes it easy to distinguish between slowness of Caseflow and the other VA dependencies. When we can clearly message that it's the dependencies that are down, we can maintain user trust in Caseflow itself.
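
A rough sketch of that idea, with hypothetical names (`timed`, `loadAfterShell`, `describeProblems`) and placeholder callbacks standing in for the real VBMS/BGS/VACOLS requests. Once the page shell is up, each dependency is called and timed independently, so the UI can say which one is slow or failing instead of just looking broken.

```typescript
// Hypothetical names; the `calls` callbacks stand in for the real data requests.
type DependencyName = "VBMS" | "BGS" | "VACOLS";
type DependencyReport = { name: DependencyName; ok: boolean; elapsedMs: number };

// Times one dependency call and never rejects, so one failure cannot hide the others.
function timed(
  name: DependencyName,
  call: () => Promise<unknown>
): Promise<DependencyReport> {
  const startedAt = Date.now();
  return call().then(
    () => ({ name, ok: true, elapsedMs: Date.now() - startedAt }),
    () => ({ name, ok: false, elapsedMs: Date.now() - startedAt })
  );
}

// Starts all three requests in parallel; meant to run after the shell has rendered.
function loadAfterShell(
  calls: Record<DependencyName, () => Promise<unknown>>
): Promise<DependencyReport[]> {
  const names: DependencyName[] = ["VBMS", "BGS", "VACOLS"];
  return Promise.all(names.map((name) => timed(name, calls[name])));
}

// Turns the reports into user-facing lines that name the dependency at fault.
function describeProblems(reports: DependencyReport[], slowAfterMs: number): string[] {
  return reports
    .filter((r) => !r.ok || r.elapsedMs >= slowAfterMs)
    .map((r) => (r.ok ? `${r.name} was slow to respond` : `${r.name} returned an error`));
}
```

An empty result from `describeProblems` means none of the dependencies was slow or failing on that load.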

amprokop commented 7 years ago

Discussion notes

@shellicious — Hearings Prep and Reader have a similar spinner on app load. We'd like to make this as similar to theirs as possible. From a design perspective, do we want to show the user any messaging? (Like after 30s show "We're sorry, the Veteran's file is taking a long time to load" or something)

NickHeiner commented 7 years ago

From a design perspective, do we want to show the user any messaging? (Like after 30s show "We're sorry, the Veteran's file is taking a long time to load" or something)

👍

NickHeiner commented 7 years ago

From an implementation perspective, @amprokop, this would be a great place to ensure that we're sharing components as much as possible. In addition to the spinner itself, it would be nice to share a component that does the "show the spinner until the data is loaded, then show this other content" logic.
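
As a starting point for discussion, the shared logic could be as small as the sketch below. It is framework-free on purpose and every name in it (`Loadable`, `renderLoadable`, `certificationViews`) is hypothetical; a shared React component would presumably wrap the same switch, with each app passing in its own content.

```typescript
// Hypothetical shape for shared loading state; not taken from the Caseflow codebase.
type Loadable<T> =
  | { status: "loading" }
  | { status: "loaded"; data: T }
  | { status: "failed"; reason: string };

// Generic "spinner until loaded" switch: each app supplies only its own views.
function renderLoadable<T, View>(
  state: Loadable<T>,
  views: {
    spinner: () => View;
    content: (data: T) => View;
    failure: (reason: string) => View;
  }
): View {
  switch (state.status) {
    case "loading":
      return views.spinner();
    case "loaded":
      return views.content(state.data);
    case "failed":
      return views.failure(state.reason);
  }
}

// Example views using plain strings in place of real components.
const certificationViews = {
  spinner: () => "Starting Certification",
  content: (documents: string[]) => `Check Documents: ${documents.join(", ")}`,
  failure: (reason: string) => `Could not start certification: ${reason}`,
};
```

Because the view type is a parameter, the same function works whether the views return strings, as here, or rendered components.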

amprokop commented 7 years ago

Absolutely, @NickHeiner. Once we have some design guidance, we can hammer out those details (you've mostly spelled out what I've been thinking, though).