Closed jilladams closed 1 year ago
Discovery document and Slack thread where I posted it.
Questions and recommendations in the document. I suggest a conversation with the team to determine next steps / ticketing.
Good work, Randi. Glad you got closure on this, honestly.
We can verify both of those with Wes in scrum tomorrow, I've added to the board.
Also: Reminder to move this doc to PW team folder until we get it into Github somewhere.
@jilladams Thank you. Gaurav asked me to send it over to him once it was documented in the team repo, so I thought I'd check what the final destination of this would be. Makes sense to just keep it here.
VW
= vets-website
CB
= content-build
First, createHeaderFooterData (in CB
) pulls Drupal footer data with GraphQL. It runs this data through formatters for VW
consumption.
Next, headerFooterData
is added to local storage (window
) under VetsGov.headerFooter
;
this only works for va.gov.
On va.gov, open the Dev console and enter
window
; the header & footer data exist inVetsGov
.On va.gov/health (TeamSites, i.e. injected header & footer), open the Dev console and enter
window
; the header & footer data don’t exist underVetsGov
.
When CB
is built, it creates this file in VW
: build/localhost/generated/headerFooter.json
. VW
uses the contents to populate the injected header/footer.
A file with hard-coded header & footer data in VW
(header-footer-data.json) is responsible for populating these sections when a developer is running only VW
locally (see dev-template.ejs). In that situation, CB
is not providing any Drupal assets.
The formatter functions for createHeaderFooterData (in CB
) are unit-tested.
In VW
, unit tests cover the footer source code using header-footer-data.json as a data source.
The prod footer does not get data from header-footer-data.json
but instead from CB
/ Drupal. For the purposes of a unit test, the footer receives a data object from its parent and doesn’t know where it’s from.
The injected header & footer have a Cypress suite in VW
in proxy-rewrite (responsible for header & footer injection) that tests the presence of a <header>
or <footer>
HTML element, but not the contents.
Question: Can the Cypress suite for the proxy-rewrite
header & footer also test the contents?
Note: the header is also known as “megaMenu.”
There is a megaMenu Cypress suite with some skipped tests that is TestRail-integrated. It is not running in the Cypress CI. I asked Tze-chiu Lei about this. He did the TestRail integration for this file, but doesn’t know why the suite is not running in CI.
Question: Why isn’t megaMenu.cypress.spec.js
running in the CI?
The authenticated megaMenu has unit tests.
There are also tests for components of the header, such as Veteran Crisis Line and Search. They each have all the data they need to render in their source code or from the parent component in VW
. These tests won’t fail when Drupal data is missing – this would have to be caught in an end-to-end test where the JavaScript is halted in execution from an error down the stack.
There is a Cypress suite that covers the new-home-page*
functionality, but it is only run in local dev. va.gov/new-home-page
and va.gov
were not identical at the point the tests were written. If this suite had run in CI, it wouldn’t have caught our problem as it only tests /new-home-page
.
Question: Can this suite be duplicated, modified to cover va.gov
and enabled in CI?
A basic homepage smoke test exists. This is run in the CI.
*
/new-home-page is used for staging a new homepage while leaving the current one intact. We launched the latest design in June 2023; as of July 28, 2023, va.gov/new-home-page
and va.gov
match.
The footer’s data structure was changed from an array to an object when we added Drupal data, and VW
was changed to consume the data from CB
in this structure.
When the CB
data did not deploy to prod, VW
received the old array format and this broke the new VW
footer.
Here’s the Slack thread where the incident discussion took place.
A CI issue related to AWS credentials stopped the dev, staging and prod deploys. The Notify CMS part of the pipeline (to alert DataDog) also failed.
We failed to look at the Front End Support Dashboard for the deploy status. It’s not possible to know at this point how the changes were successfully verified in staging.
Theory: without the FE Support Dashboard to notice CB
deploy issues, we did not wait long enough for VW
changes to hit staging. In that case, we would have verified the pre-existing functionality in staging. We were about 1.5 hours from prod deploy when staging was validated and many others may have been on lunch. We did not see any communications about staging being broken.
The homepage rendered a blank header and no footer at all, and React widgets did not load (they were stuck on a spinner – more on this further in this doc).
CB
creates the <header>
element and adds <div>
s for header elements such as banners and Veterans Crisis Line. It creates a div for the header to be populated by VW
(via ID targeting); even if there are problems on the VW
side, the <header>
still exists partially.
The <footer>
is also created in CB
, but its sole contents are the empty <div>
that VW
should populate. If there’s nothing for VW
to add (because the JS broke), it renders an empty element.
The footer setup code is run before the header setup code, and before React apps are initialized. JavaScript errors in the footer setup will stop execution of anything after it.
All React apps in VW
use a startApp platform utility. See facility-locator usage here. Within the platform utility, the first step before starting an app is to set up common functionality across all React apps (like populating the global header & footer). If any of that setup errors out, the apps aren’t started.
This is the screen for facility locator. The loading indicator never resolves. At the time of the prod outage, this was happening for every React application in VW
, in addition to the header & footer not rendering correctly.
Note: the footer renders a blue background here, but it’s not immediately apparent why. The <section>
and <footer>
elements have a 0px height (and there are still no footer contents), but when these elements are removed from the DOM, the blue background color goes away.
Any pages that exist in CB
alone were unaffected.
Pages with hybrid Drupal and React content had experiences similar to the below screenshot. Static (Drupal) content renders from CB
. But VW
(React) content injected into the page (such as a login widget) showed a spinner similar to the facility locator page in the above screenshot.
Similar to va.gov
, va.gov/health
had a blank header and no footer. Public Websites does not own any code in TeamSites, so it is difficult to know if any functionality within the site was not working. TeamSites have many pointers to va.gov in an effort to deliver consistent information to veterans and slowly migrate to our systems. Some of their content is CMS-driven and would have been unaffected by the React widget problems on the va.gov side.
We reverted all of the related VW
changes to use the pre-existing footer logic. Ryan Koch suggested we do all data transformation in CB
and pass it to VW
in the format it needed prior to any of this work.
This worked well and we achieved the desired outcome.
CB
pull request
There are a few Questions in the findings above that could drive discovery and/or implementation tickets.
It seems the most important question to answer is:
Do we want errors in the header or footer to halt execution of va.gov pages?
Ideally, our unit tests and Cypress tests should stop such code from ever reaching main in VW
. But if any JavaScript fails, it should not hold up the execution of JS after it unless that code is dependent on the data or related in another way to the failure.
Either way, we need more robust unit and Cypress tests to cover the header, footer and home page functionality and to further consider how our setup in general can be optimized.
This Slack thread captures the discussion and feedback from the wider team regarding next steps.
Thanks Randi! In today's product sync, DaveC confirmed he'd like to be present to listen in on the discussion, so next step is @wesrowe will help schedule that chat.
@jilladams, can we close this ticket prior to the meeting, or is it carrying over? I can go either way.
Next steps are recommended, I feel ok closing this. I own scheduling the review, which I'll work on by tomorrow EOD.
Description
User story
AS A FE engineer I WANT confidence that my changes are tested enough to confirm they will not break prod SO THAT I don't break prod.
Engineering notes / background
Post-mortem action item from: https://github.com/department-of-veterans-affairs/va.gov-team-sensitive/pull/975
In short: content-build changes failed to deploy before vets-website changes. The vets-website changes that deployed were isolated to the footer but somehow broke many, if not all, React widgets. CI tests passed for the push to prod. Why?
This will likely require coordination with the Platform team.
Analytics considerations
Quality / testing notes
Acceptance criteria
Timebox: up to 1 day