timcosgrove opened this issue 9 months ago
This work will be done based on the https://github.com/department-of-veterans-affairs/va.gov-cms/pull/17285 PR, which is still in review, to help with unblocking. Otherwise, we would have to wait until that other PR is merged. We could keep updating the referenced PR but decided to branch off it instead.
I spent some time looking over the Tugboat configuration and Tugboat documentation...probably more than I needed, but the build stages are a bit confusing, and docs like https://docs.tugboatqa.com/reference/tugboat-configuration/ help to sort out the differences between builds with base previews, builds without them, and non-build stages.
One thought I had while looking at the Tugboat config file relates to pre-built images. I know Tugboat maintains a Cypress-related image at tugboatqa/cypress-included, but in this repo's setup, all the Cypress-related dependencies are gathered on each build. Furthermore, I would think the VA could make an image on top of tugboatqa/php:8.1-apache-bookworm with the additional packages to save time in future builds.
One downside of building each time is that the versions of all these extra dependencies aren't pinned the way our composer and npm dependencies are. Seeing as the Tugboat environments have been having reliability issues recently, using pre-built images and removing a lot of the package updates in the init stage would make sense to me. Then, only update the images when you need to update the base, e.g. when updating PHP and/or Cypress versions.
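To make that concrete, here is a rough sketch of the kind of init-stage installs that could be baked into an image built on top of tugboatqa/php:8.1-apache-bookworm; the package names are illustrative (I have not checked them against the actual .tugboat/config.yml commands):

```sh
# Illustrative only: dependencies like these, currently installed during init on every build,
# could instead live in a pre-built image that is only rebuilt when the base needs updating.
apt-get update
apt-get install -y --no-install-recommends \
  libgtk2.0-0 libnotify-dev libnss3 libxss1 libasound2 xvfb
```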
Another thing I noticed is a section where variables are echoed into a config file which makes checking out git branches harder. I wonder if we can simply set the env variables at that step?
```sh
# Old...
echo "NEXT_PUBLIC_DRUPAL_BASE_URL=https://cms-${TUGBOAT_SERVICE_TOKEN}.${TUGBOAT_SERVICE_CONFIG_DOMAIN}" >> ${TUGBOAT_ROOT}/next/envs/.env.tugboat

# New...
NEXT_PUBLIC_DRUPAL_BASE_URL=https://cms-${TUGBOAT_SERVICE_TOKEN}.${TUGBOAT_SERVICE_CONFIG_DOMAIN}
```
I think that would get merged into process.env without having to edit the envs file, and therefore there'd be no need to stash/pop when trying to restart the next server.
Finally, getting to the actual point of this issue: I think the server can be killed and restarted in scripts/next-start.sh by first checking for a running process, killing any that are found, and then starting the server again. Install and build commands don't interfere with the server commands, and as long as the dependencies have been updated and the build without SSG enabled completes, the server should still respond to previews.
```sh
# In scripts/next-start.sh ...
cd ${ROOT}/next

# Kill any currently running server.
# We can look for "./scripts/yarn/start.js" since that is what "yarn start" runs.
NEXT_SERVER_PIDS=$(ps aux | grep '[.]/scripts/yarn/start.js' | awk '{print $2}')

# In case we have multiple processes, loop through them.
for pid in ${NEXT_SERVER_PIDS}; do
  kill $pid
done

# Start the dev server. Vets-website assets need to be in place prior to this build.
yarn start
```
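As an aside, if pgrep/pkill are available in the image, the find-and-kill part could collapse to one line; this is just an alternative sketch, not what the script does today:

```sh
# Alternative sketch: kill by matching the full command line instead of parsing ps output.
# Assumes the procps package (pkill/pgrep) is present in the container.
pkill -f './scripts/yarn/start.js' || true
```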
I might be missing that there are two builds, one for preview and one for checking that the whole site builds, but I think previews are the main thing that needs to work in next-build, and this method will build and then start that server.
Now, I'm getting this error on Tugboat, which comes from `bash -lc 'composer va:next:start'` in the Tugboat config.yml file:
```
> # Prepare the next build server
> ! ./scripts/should-run-directly.sh || ./scripts/next-build.sh
/var/lib/tugboat/next/packages/env-loader/dist/cms-feature-flags.js:5
const removeTrailingSlash = (s) => s.endsWith('/') ? s.substring(0, s.length - 1) : s;
TypeError: Cannot read properties of undefined (reading 'endsWith')
    at removeTrailingSlash (/var/lib/tugboat/next/packages/env-loader/dist/cms-feature-flags.js:5:38)
    at getCmsFeatureFlags (/var/lib/tugboat/next/packages/env-loader/dist/cms-feature-flags.js:8:31)
    at processEnv (/var/lib/tugboat/next/packages/env-loader/dist/index.js:23:76)
    at Object.<anonymous> (/var/lib/tugboat/next/scripts/yarn/build.js:3:1)
    at Module._compile (node:internal/modules/cjs/loader:1191:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1245:10)
    at Module.load (node:internal/modules/cjs/loader:1069:32)
    at Function.Module._load (node:internal/modules/cjs/loader:904:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:22:47
Script ! ./scripts/should-run-directly.sh || ./scripts/next-build.sh handling the va:next:build event returned with error code 1
```
I did get this error locally since I didn't copy the env file and properly set up next-build after cloning the repo. On Tugboat this should be fine, since the env vars are also written into the file in config.yml, but maybe the git stash/pop isn't working...so I'm rebuilding the PR attached to https://github.com/department-of-veterans-affairs/va.gov-cms/pull/17285 to see if that is successful.
One idea would be to turn the "echo variable into file" commands into a script and then call that after checking out the new code, if the problem is env vars not being picked up.
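Something like this hypothetical scripts/write-tugboat-envs.sh is what I have in mind; the single variable shown is just the one example from config.yml, not the full list:

```sh
#!/usr/bin/env bash
# Hypothetical scripts/write-tugboat-envs.sh: re-create the Tugboat env entries after a checkout,
# so the echoed values survive switching branches. Mirrors the existing echo step in config.yml.
set -e

ENV_FILE="${TUGBOAT_ROOT}/next/envs/.env.tugboat"
echo "NEXT_PUBLIC_DRUPAL_BASE_URL=https://cms-${TUGBOAT_SERVICE_TOKEN}.${TUGBOAT_SERVICE_CONFIG_DOMAIN}" >> "${ENV_FILE}"
# ...repeat for the other variables currently echoed in config.yml.
```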
Also weird on the Tugboat previews...I added one commit to this issue's draft PR that is based on another PR, but the size of the previews is way different.
Newer preview:
Older preview:
Since we are now cloning the repos multiple times and building...is this what's causing the Tugboat issues with space? Maybe we can't clone and build frontends for both next-build and content-build?
It's also possible that we could look for files to delete in one of the stages before running commands...maybe some files are being cached that aren't needed?
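If we go looking for files to delete, a quick probe like this would show where the space is actually going before deciding anything (a sketch; /var/lib/tugboat matches the paths in the error output above but should be verified):

```sh
# Sketch: list the largest directories under the Tugboat root to spot cache or asset bloat.
du -x -h -d 2 /var/lib/tugboat | sort -hr | head -n 20
```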
so the tugboat preview size is not necessarily tied to cloning the repos, it's tied to using the base preview or not. the newer PR env is using the base preview, so it's relatively small. the older one was built without a base preview (to test changes to init:), so it has the full size of everything. here's the size of the current CMS PR main base preview:
long way of saying, yes it is contributing to tugboat issues with space, but unfortunately there's no way around it until we are able to merge that PR (which depends on the BRD PR, which depends on them bumping the AMI) 😵
Okay, that makes sense. But then this Tugboat should be deleted, right? https://tugboat.vfs.va.gov/65d690d5067b1809aade9992 It was a test of the same PR without a base preview that was abandoned after confirming an issue...I haven't exactly been going through all the Tugboats and making sure I don't have old ones around, but this is the first time I've seen why/how this could be an issue.
And then it makes me wonder if dependabot PRs should be sequenced so they only ever have so many open. Like maybe https://github.com/department-of-veterans-affairs/va.gov-cms/blob/main/.github/dependabot.yml#L9 should be 5 instead of 10?
On a quick count, I think about half the Tugboat PRs are dependabot...and IMHO, you ought to merge one in and then wait for the whole build and tests to run before merging in others. So, reducing that limit would reduce the temptation to merge many in at once, which can potentially cause reliability issues.
yeah, that first one could be deleted. the dependabot ones... usually they get moved through quicker, but I think with the other infra issues they've been lagging a bit. hopefully they get moved through this week, it's a CMS team task.
re: echoing env vars,

> Another thing I noticed is a section where variables are echoed into a config file which makes checking out git branches harder. I wonder if we can simply set the env variables at that step?
>
> ```sh
> # Old...
> echo "NEXT_PUBLIC_DRUPAL_BASE_URL=https://cms-${TUGBOAT_SERVICE_TOKEN}.${TUGBOAT_SERVICE_CONFIG_DOMAIN}" >> ${TUGBOAT_ROOT}/next/envs/.env.tugboat
>
> # New...
> NEXT_PUBLIC_DRUPAL_BASE_URL=https://cms-${TUGBOAT_SERVICE_TOKEN}.${TUGBOAT_SERVICE_CONFIG_DOMAIN}
> ```
exporting directly to the ENV instead of to the .env file may work, but we have to test it. the next command is going to load envs from the .env.tugboat file based on the APP_ENV var being present (it's always present on tugboat). we do have a mechanism to pass vars as part of the commands, but that may break the script on different non-tugboat envs. so some fiddling is likely.
oh yeah, and I think I have to build this with no preview or off of the other PR's preview...which I didn't specify on the Tugboat attached to this issue...so that is rebuilding now.
I will try debugging as much as I can locally and seeing why the git stash/pop might be causing issues or if that part is fine and it is another issue with env vars.
Maybe I'll make some suggestions for tickets of Tugboat things to look into, but I wondered how the hell the previews are so big and then I saw the assets pulled in from a backup.
```
2024-03-05T20:11:12.636Z - 65e7718f067b1809aa330f30# /bin/sh -c curl --remote-name https://dsva-vagov-prod-cms-backup-sanitized.s3-us-gov-west-1.amazonaws.com/files/cms-prod-files-latest.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 24 63.8G   24 15.3G    0     0  75.4M      0  0:14:25  0:03:28  0:10:57 76.0M
```
63.8G is about two-thirds of the total size I saw for a preview without a base, and in Tugboat's own documentation they use stage_file_proxy in their config examples to pull in needed files on demand. I bet there is some networking restriction where Tugboat can't reach the production assets and therefore they all need to be backed up...or something like that; however, I've always seen junk in Drupal file directories. Maybe this files directory is clean (I will look locally once my download finishes), but it would be the first time I've seen a clean files directory.
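For reference, wiring up stage_file_proxy is basically just a module install plus an origin setting; this is a sketch, and the origin URL here is an assumption (it would need to point at wherever the production files are actually reachable):

```sh
# Sketch: enable Stage File Proxy so Tugboat fetches files on demand instead of
# restoring the full files backup. The origin URL is an assumption, not verified.
composer require drupal/stage_file_proxy
drush pm:install stage_file_proxy -y
drush config:set stage_file_proxy.settings origin "https://www.va.gov" -y
```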
Another interesting benefit of using stage_file_proxy is that you could analyze the Tugboat files directory against production to see how much of a difference there is. Building on Tugboat or locally should show the currently used files, whereas production could have many years of files that are no longer used but still backed up.
Well, looky here: https://nextjs.org/docs/pages/building-your-application/configuring/environment-variables#referencing-other-variables So at least in the pages router, I think this works now?
Nope...it's only for variables within the actual .env file you are using or a higher-level .env file. I thought maybe it would be possible to write to a .env.fake file that isn't version controlled and then read from there, but that probably won't work. Oh well, it would be nice to simply use interpolation in .env.tugboat and have dotenv support that by default, but there's no use looking into that option any more.
I think https://github.com/motdotla/dotenv-expand/tree/master might work but I'm not going to try and modify next-build to make that happen. It should work to stash/pop...or maybe git reset, checkout, re-echo env vars, and then start the server.
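The stash-free flow I'm imagining looks roughly like this; it's a sketch (untested), the NEXT_BUILD_BRANCH variable is hypothetical, and the env-writing step stands in for whatever commands currently echo vars into .env.tugboat:

```sh
# Sketch: switch the next-build checkout without stash/pop, then restore envs and restart.
cd ${TUGBOAT_ROOT}/next
git reset --hard                      # discard the locally modified .env.tugboat
git fetch origin
git checkout "${NEXT_BUILD_BRANCH}"   # hypothetical variable holding the selected branch
# ...re-echo the Tugboat env vars into envs/.env.tugboat (same echo commands as config.yml)...
${TUGBOAT_ROOT}/scripts/next-start.sh # path assumed relative to the repo root on Tugboat
```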
I see now in Slack that Tugboat will likely be upgraded one minor version, so that might play into this issue...but I will gather some questions I have from setting up next-build locally, for discussion purposes, about how that might play into updating code and restarting the server.
- Is anything missing from the `next-install.sh` script? I know the env vars are updated on Tugboat, but I'm not sure about the vets-website part. For local setup, we'd only need to run `ddev composer va:next:install` and not any manual steps.
- If `vets-web-setup.sh` is re-run in `next-build-frontend.sh`, do the symlinks mentioned here need to be recreated? https://github.com/department-of-veterans-affairs/va.gov-cms/blob/main/.tugboat/config.yml#L207
- Prerendering the page "/_playground/api-explorer" fails due to the oauth token, so locally I delete that route. How do I get the proper keys? The readme says they are on AWS...
- How do `build:preview` and `start` interact? I know restarting the server is needed and I think rebuilding as well, but I'm not sure if there's more to preview vs. other server needs like building everything out.

Answers from chat with @tjheffner...
I thought this error was related to the symlinking, so I added those commands before the vets-web-setup.sh call in next-build-frontend.sh, but it still errored at the same part.
```
error Command failed with exit code 137.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
cp: cannot stat './script/../build/localhost/generated/vendor.entry.js': No such file or directory
error Command failed with exit code 1.
```
I will look into this tomorrow.
This actually fails locally for me on the main branch, even with just running scripts/vets-web-setup.sh. I get the same error as above. I thought maybe this was an issue with generated assets colliding when trying to rebuild, but I would expect the first build to succeed. Now I will read more of the vets-website codebase to understand why this code is needed and why it fails.
# From "yarn build" in vets-website...
# Only run Webpack if the assetSource = local
if [ "${assetSource}" = "local" ]; then
echo "Building application assets"
yarn build:webpack $webpackArgs
# This step fails...
cp -v "${buildDir}generated/vendor.entry.js" "${buildDir}generated/shared-modules.entry.js"
else
echo "Will fetch application assets from the content build script"
fi
Even if I comment out the copy step, the script fails. The webpack build runs NODE_OPTIONS=--max-old-space-size=8192 webpack --config config/webpack.config.js, and an exit code of 137 means the process was killed by the system (SIGKILL), typically for memory reasons. Since --max-old-space-size=8192 is already set, it makes me think they added that because webpack was consuming too much memory. I bet the build fails due to memory issues and that's why the directory doesn't exist. So, I will play around with changing that setting and with building this outside the DDEV container.
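Concretely, the experiment is just re-running the webpack build with a bigger heap ceiling, something like this (the 10240 value is arbitrary for testing, and if the kill is coming from the container's memory limit rather than Node's heap, this won't help, which is what it turned out to be):

```sh
# Sketch: re-run the webpack build that the vets-website "yarn build" script invokes,
# with a larger (arbitrary) heap limit, to see whether the exit-137 kill goes away.
cd vets-website
NODE_OPTIONS=--max-old-space-size=10240 yarn webpack --config config/webpack.config.js
```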
Success on the vets-website build in DDEV! I had to increase my memory allocation for Docker to 10GB but it worked after that. For some reason, I had 7.9GB before so maybe 8GB would have worked.
I will make a note in the AC to update any docs on Docker resources for setting up the va.gov-cms project since I had to alter mine. Not sure if 8GB is the default for Docker Desktop or not.
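If anyone wants to check their own allocation before hitting this, something like the following should show what Docker can actually hand to containers (I believe MemTotal is the right field, but double-check on your Docker Desktop version):

```sh
# Sketch: report the memory available to Docker containers (the Docker Desktop VM allocation).
docker info --format '{{.MemTotal}}'   # bytes
# Or, more readable: run free inside a throwaway container.
docker run --rm alpine free -m
```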
Seems like things are working locally. I have the whole shebang running from "/admin/content/deploy/next", and the next server appears to be running with the script exiting successfully.
```sh
# From the next-build.txt log file.
> ! ./scripts/should-run-directly.sh || ./scripts/next-start.sh
Started next server with PID: 51391
> ./scripts/should-run-directly.sh || ddev composer va:next:start --
==> Done

# Looking for the server start command.
alexfinnarn@va-gov-cms-web:/var/www/html$ ps -ax | grep "[.]/scripts/yarn/start.js"
  51841 pts/0    Sl     0:01 /run/rosetta/rosetta /usr/local/bin/node /usr/local/bin/node --no-opt -r /proc/.reset ./scripts/yarn/start.js
```
Even though the PIDs don't match, when I killed 51841 the other process, 51391, was killed as well.
The thing I'm not sure how to test is exposing the port from within the PHP web container since it's not started locally on my machine. I see a vhost for storybook and maybe next-build should have one too? However, I think things are working well enough that I will start a build on Tugboat and see what happens.
After spending a while debugging Tugboat builds, I figured out an error in the Tugboat shell. Apparently, even though APP_ENV is set earlier in the next build/start scripts, the yarn command doesn't pick up that env var, so it is undefined when needed. Switching to APP_ENV=${APP_ENV} yarn in those commands does get the env var picked up. I think I had removed that from the code thinking it would be picked up anyway and the code would be cleaner, but after putting it back, the build/start commands picked up the right env vars.
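In other words, the fix is just prefixing the yarn invocations inside the scripts; roughly this pattern, where the exact yarn subcommand is a placeholder for whatever the script actually runs:

```sh
# Before (sketch): APP_ENV is set earlier in the script, but yarn's child process wasn't seeing it.
yarn build

# After (sketch): pass it explicitly on the command line so the env var is definitely present.
APP_ENV=${APP_ENV} yarn build
```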
I also found that skipping the files and tests can save at least 30 minutes, and that made me wish we had commit flags for CI so we could turn things off when we want to debug faster rather than having to comment out lines in a config file. I've used commit flags to target only certain tests being run and it was helpful.
hmm...I now have the PR working on Tugboat but I had to manually check if the next server was running and restart it. At first, I wasn't seeing a red footer like the PR I uploaded, but after manually restarting I did see what I wanted to see.
I think the issue might be with the code looking to kill the server; since that code is run in the background and via delegation scripts, I'm not sure what the right processes are at this point. I saw a next-route-worker process that I had to kill to actually stop the next server entirely.
So, the current code might be building the next site version, updating git info, trying to kill the wrong PIDs, and then the changes don't look like they've taken effect since the server isn't actually restarted. I didn't see anything in the logs as far as errors go. I'll try rebuilding and cataloging the PIDs/processes that are up when Tugboat starts and then as the server is restarted. I was hoping to grep for a certain string but not sure that will work.
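For the cataloging, my plan is just to snapshot anything that looks next-related before and after triggering a restart, roughly like this (the next-route-worker string is just what I happened to see in ps output, so the pattern may need adjusting):

```sh
# Sketch: list PIDs and commands for anything that looks like the next server or its workers,
# so I can compare the process table before and after the restart script runs.
ps aux | grep -E '[s]tart.js|[n]ext-route-worker' | awk '{print $2, $11, $12, $13}'
```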
After a few more Tugboat build errors...sometimes it says it's done halfway through running tests, which is a lie...I was able to have next running by default and then switch to a branch with a red footer. However, upon trying to reset the environment for another QA tester, the footer was still red and the header looked broken. I didn't see any errors in the logs.
I'm not sure if this matters since someone would probably only switch branches once per environment...and once we're past the base preview issue, they should be able to refresh the preview to try another branch.
I will look into this more, but the code seems really close to doing what is expected.
Actually, it works on Tugboat if I choose a branch and then another branch to build. Looking at the code, I don't think the current content-build process accounts for someone resetting things to the main branch, as there is no checkout that happens in that case...so the code could be improved in that regard, but if content-build doesn't have it, then there's no need for next-build to have it. Plus, I'm not sure how often, if ever, people use this feature. If people use it often, maybe there is a case to keep improving it, but for now I think it is okay to have a dev reset the git state via the Tugboat shell, instead of adding code, if people want to switch back to the main, default git branch.
still blocked by needing updated AMI for BRD
## Requirements
We want a Composer command that restarts a running Next.js server on Tugboat, so that we can switch the code checkout used for Next.js for QA purposes.

```[tasklist]
### Acceptance criteria
- [x] Composer command exists that stops the currently running Next.js server, builds/installs, and then starts the server again (can be multiple commands if that is more sensible)
- [x] Update next-build-frontend.sh script from the branch checkout form ticket to use these commands
- [x] When a next-build branch is chosen via the branch selection form and the form is submitted, the Tugboat-running Next.js server is restarted with that branch checked out
- [x] Kill any next server processes and restart the server in `next-start.sh`
- [x] Preview functionality still works, and preview reflects the checked out branch
- [ ] Use assets from the new `vets-website` repo. "We don't actually use the assets from it because we aren't rebuilding/pulling."
- [ ] Make sure content-build previews still work
- [x] Add target _blank to "Build is in progress. View log file" in "NextGitForm.php" to make viewing the build log easier
- [x] Display a link to the log file if it has been built prior, even if a build is not currently in progress.
```

## Background & implementation details

This follows on from #16853. That ticket set up the infrastructure for choosing a branch for next-build and vets-website. The goal of this ticket is to make use of those branches and to restart the server.
In order to test preview functionality, the feature flag that currently manages CMS preview being switched between Content Build and Next Build will need to be enabled.