Closed: jilladams closed this issue 2 years ago.
This issue was caused by a vets-api (Lighthouse) latency and outage incident. It coincides with AWS outages reported around the same time (correlation, not necessarily causation).
I visited the offending page on va.gov and clicked the PDF download link. After more than 10 seconds, I received the following modal popup:
Clicking the 'Download VA Form 10-2850C' link caused the following alert to appear:
The 'email the form managers' link pointed to:
mailto:VaFormsManagers@va.gov?subject=Bad%20PDF%20link&body=I%20tried%20to%20download%20this%20form%20but%20the%20link%20doesn't%20work%3A%20https%3A%2F%2Fwww.va.gov%2Fvaforms%2Fmedical%2Fpdf%2Fvha-10-2850c-fill%2520(1).pdf
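The PDF URL in this link's body is percent-encoded twice: `%2520` decodes to `%20`, which is itself the encoding of a space. A minimal sketch using only Python's standard library (`urllib.parse`) that recovers the reported URL and the literal filename it points to; the helper name `reported_link` is made up for illustration:

```python
from urllib.parse import parse_qs, unquote, urlsplit

# The mailto link from the "email the form managers" alert, split across lines.
MAILTO = (
    "mailto:VaFormsManagers@va.gov?subject=Bad%20PDF%20link"
    "&body=I%20tried%20to%20download%20this%20form%20but%20the%20link"
    "%20doesn't%20work%3A%20https%3A%2F%2Fwww.va.gov%2Fvaforms%2Fmedical"
    "%2Fpdf%2Fvha-10-2850c-fill%2520(1).pdf"
)


def reported_link(mailto: str) -> str:
    """Return the URL embedded in the mailto body after one round of decoding."""
    query = urlsplit(mailto).query
    body = parse_qs(query)["body"][0]  # parse_qs percent-decodes each value once
    # The body reads "<message>: <url>", so keep the text after the last ": ".
    return body.rsplit(": ", 1)[1]


link = reported_link(MAILTO)
# Decoding the last path segment a second time reveals the literal filename.
filename = unquote(urlsplit(link).path.rsplit("/", 1)[1])
print(link)      # https://www.va.gov/vaforms/medical/pdf/vha-10-2850c-fill%20(1).pdf
print(filename)  # vha-10-2850c-fill (1).pdf
```

The decoded filename contains a space followed by `(1)`, which is easy to miss in the raw link.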
The console was logging two errors:
Access to fetch at 'https://api.va.gov/v0/forms?query=10-2850C' from origin 'https://www.va.gov' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
I suspect the CORS error was really a symptom of the larger API/network problems. A response without the expected CORS headers will cause this behavior, and some errors coming back from the API may not contain CORS headers.
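To make that reasoning concrete, here is a simplified sketch of the check a browser applies before exposing a cross-origin `fetch()` response to page JavaScript. It assumes a non-credentialed request and ignores preflight and `Access-Control-Allow-Credentials`; the function name is hypothetical. The point it illustrates is that the status code does not matter: any response, including an error, that lacks a matching `Access-Control-Allow-Origin` header surfaces as a CORS error rather than as the underlying HTTP failure.

```python
from typing import Mapping


def browser_would_expose(response_headers: Mapping[str, str], origin: str) -> bool:
    """Simplified CORS check for a non-credentialed cross-origin fetch()."""
    # Header names are case-insensitive, so normalise them before looking up.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        # Missing header: fetch() rejects with a "blocked by CORS policy" error,
        # whatever the status code was (200, 502, ...).
        return False
    return allowed == "*" or allowed == origin


ORIGIN = "https://www.va.gov"

# Abbreviated headers from the 502 response captured with curl further down.
error_response = {"Content-Type": "application/json; charset=utf-8", "Vary": "Origin"}
# A hypothetical healthy response that echoes the requesting origin.
ok_response = {"Content-Type": "application/json", "Access-Control-Allow-Origin": ORIGIN}

print(browser_would_expose(error_response, ORIGIN))  # False
print(browser_would_expose(ok_response, ORIGIN))     # True
```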
The network error was not captured in the browser at that time, but a subsequent curl command run from the terminal against the same endpoint returned a 502 Bad Gateway:
▶ curl -I https://api.va.gov/v0/forms\?query\=10-2850C
HTTP/1.1 502 Bad Gateway
Date: Mon, 08 Aug 2022 16:40:23 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Referrer-Policy: strict-origin-when-cross-origin
Vary: Origin
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: SAMEORIGIN
X-Git-SHA: 27c09b82bf6aecb12924bb041b16e41e7323adfa
X-GitHub-Repository: https://github.com/department-of-veterans-affairs/vets-api
X-Permitted-Cross-Domain-Policies: none
X-Request-Id: fe4ae8f1-8131-4f5c-a55a-3306ba6b4528
X-Runtime: 0.132098
X-XSS-Protection: 1; mode=block
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Set-Cookie: TS01f27c67=0119a2687fc58ea7e6fa2c72712d44a5437365e15fe5d1b5255af75fd30161342c27f0057c250e448b1429e47c30f14e5605096bac; Max-Age=900; Path=/
Transfer-Encoding: chunked
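When comparing captures like this one, a small helper can pull out the fields worth passing along: the status code, the `X-Request-Id` (presumably what the vets-api team would need to find the request in their logs), and `X-Runtime` (which Rack's Runtime middleware sets to the request's processing time in seconds). This is a hypothetical sketch; `parse_head` is not part of any existing tooling. One caveat when reading the capture: `curl -I` sends a HEAD request without an `Origin` header, while the browser sent a cross-origin GET, so the missing `Access-Control-Allow-Origin` here is suggestive rather than conclusive. Adding `-H 'Origin: https://www.va.gov'` to the curl command would be a closer reproduction.

```python
from __future__ import annotations

# Abbreviated copy of the `curl -I` output captured above.
RAW_HEAD = """\
HTTP/1.1 502 Bad Gateway
Date: Mon, 08 Aug 2022 16:40:23 GMT
Content-Type: application/json; charset=utf-8
Vary: Origin
X-Request-Id: fe4ae8f1-8131-4f5c-a55a-3306ba6b4528
X-Runtime: 0.132098
"""


def parse_head(raw: str) -> tuple[int, dict[str, str]]:
    """Split a raw HTTP response head into (status code, lower-cased headers)."""
    lines = [line.rstrip("\r") for line in raw.strip().splitlines()]
    # Status line looks like "HTTP/1.1 502 Bad Gateway"; the reason may be absent.
    code = lines[0].split(" ", 2)[1]
    headers: dict[str, str] = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as Date contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers


status, headers = parse_head(RAW_HEAD)
print(status)                                    # 502
print(headers["x-request-id"])                   # fe4ae8f1-8131-4f5c-a55a-3306ba6b4528
print(float(headers["x-runtime"]))               # 0.132098
print("access-control-allow-origin" in headers)  # False
```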
Further probing of this and other Forms endpoints, including the Forms Search page, revealed additional API symptoms:
At this point it was clear we were dealing with a larger Lighthouse outage. Steve Wirt created a Platform ticket to track the issue.
@wesrowe if you wanna close this after you pass on relevant details to forms managers, feel free.
@jilladams, I think it depends on scope. Today's issue seems to have begun late last week. If we're looking at this issue as just about today, we can close it. If we're concerned about the weekend that led up to today... I'm hoping for Lighthouse logs that would show the Forms API issue began several days ago.
Daniel / Wes to pair on reviewing the flagged-content info and the details for this form. Goal: determine what logs/data we need to understand where failures are happening, then ask the teams that own those systems. (Some of this may become a new ticket.)
We identified that this form was broken by a filename change: " (1)" was appended to the filename on Fri 8/5. This broke existing links in the front end during the window between the filename change and the next content-build release. After the change, content-build did not run over the weekend, and it failed several times on Monday before succeeding. The successful content-build resolved the issue.
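The failure mode in that window, where links generated into the front end at build time point at a filename that no longer exists until the next successful content-build, can be illustrated with a minimal link check. This is a hypothetical sketch, not how content-build or the Forms API actually validate links; `broken_links` and the example paths are invented for illustration.

```python
from __future__ import annotations

from urllib.parse import unquote, urlsplit


def broken_links(linked_urls: list[str], published_paths: set[str]) -> list[str]:
    """Return the linked URLs whose decoded path is not currently published."""
    broken = []
    for url in linked_urls:
        # Decode %20 etc. so "a%20(1).pdf" matches a file named "a (1).pdf".
        path = unquote(urlsplit(url).path)
        if path not in published_paths:
            broken.append(url)
    return broken


# Hypothetical paths: the PDF was renamed, but the front end was built before the rename.
published = {"/vaforms/example/example-form (1).pdf"}
stale_build = ["https://www.va.gov/vaforms/example/example-form.pdf"]
rebuilt = ["https://www.va.gov/vaforms/example/example-form%20(1).pdf"]

print(broken_links(stale_build, published))  # ['https://www.va.gov/vaforms/example/example-form.pdf']
print(broken_links(rebuilt, published))      # []
```

Once the front end is rebuilt against the new name, the same check comes back empty.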
Content-build failure notification is tracked in an open ticket: https://github.com/department-of-veterans-affairs/va.gov-cms/issues/9963
The filename issue will be handled with Janel Keyes via email; @wesrowe to follow up. Closing.
Describe the defect
https://dsva.slack.com/archives/C52CL1PKQ/p1659976483935999
Form 10-2850C was almost the sole source of failures (as seen in Google Analytics) over the weekend: 48 on Saturday and 58 yesterday. On Friday it started to take the lead, but only by a nose, and it was not a leader on Thursday.
Labels
CMS Team
Please check the team(s) that will do this work.
Program
Platform CMS Team
Sitewide Crew
⭐️ Sitewide CMS
⭐️ Public Websites
⭐️ Facilities
⭐️ User support