Open timcosgrove opened 10 months ago
My first question with this ticket is: who are the CMS status notices meant for?
The build could happen in the background without people using the CMS needing to be aware of the build. So, I think the CMS knowing the state of the content build is only for someone wanting to click the button to schedule a release, and that is only important outside of business hours since otherwise, the build is continuous.
I think the actual workflow for building the current content is located here: https://github.com/department-of-veterans-affairs/content-build/blob/main/.github/workflows/content-release.yml
In the va.gov-cms codebase, I see https://github.com/department-of-veterans-affairs/va.gov-cms/blob/main/docroot/modules/custom/va_gov_github/src/Commands/ApiClientCommands.php#L238 as where the code might make a dispatch request to that workflow, but I can't find where that drush command is being used, if it is at all.
My take is to use the GH Workflow as the source of truth and have the CMS ping the workflow run to determine status rather than trying to make the CMS the source of truth.
I know from looking at the queues before that the queue payload is not used and therefore I don't think the queue is necessary. Furthermore, the state machine always progressed from `ready` back to `ready` without any other meaningful state transitions. To me, it looks like the status is always set back to ready and then moved onto complete or back to ready. So, it's like a boolean `isBuilding` flag more than a state machine...but I could be missing something.
I have to look at the code more, but relying on GitHub to update the CMS could end with the build stuck in pending whereas pinging a certain workflow run should always return something even if it is not a 200. Also, there is a decent bit of code that pings the CMS with authenticated requests that I don't think are necessary since the CMS can ping GitHub to figure out the status of things....once again, I could be missing things, but I think the CMS pinging GH to start a run or check the status will be easier to maintain than having both the CMS and GH pinging back and forth.
Here's how this would go, following along with the way I see it currently done in the `content-release` workflow.

- `ready` or not doesn't really matter IMHO. GH is the source of truth. I think the status of any `content-release` workflow could be obtained by filtering the list of current workflow runs by workflow ID: https://docs.github.com/en/rest/actions/workflow-runs?apiVersion=2022-11-28#list-workflow-runs-for-a-repository
- `buildrequest` boolean so users can click to release content and have something happen vs. checking back when a build isn't running. Currently, when a user clicks the button it adds a queue item but realistically there's always other items in the queue so I guess it only matters after hours.

So, I will now look into the first step of how the CMS knows if a content release workflow is running, and if it is, how the status of the run is reported. If no build is running and it is outside of the continuous release hours (or really just some `canUserTriggerBuild` logic), then the user can hit the button to trigger a build.
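As a rough sketch of that check: the code below assumes the JSON shape returned by GitHub's "list workflow runs" endpoint (a `workflow_runs` array whose items carry `status` and `conclusion`). The function names `release_in_progress` and `can_user_trigger_build` are hypothetical, the sample payload is made up, and the real check would presumably live in Drupal PHP rather than Python.

```python
from typing import Any

# Statuses GitHub reports for a run that has not finished yet.
ACTIVE_STATUSES = {"queued", "in_progress", "requested", "waiting", "pending"}


def release_in_progress(payload: dict[str, Any]) -> bool:
    """Return True if any run in a "list workflow runs" response is unfinished."""
    runs = payload.get("workflow_runs", [])
    return any(run.get("status") in ACTIVE_STATUSES for run in runs)


def can_user_trigger_build(payload: dict[str, Any], continuous_release_active: bool) -> bool:
    """Hypothetical button logic: allow a manual release only when nothing is
    running and the continuous release is not already covering it."""
    return not continuous_release_active and not release_in_progress(payload)


# Made-up sample payload, trimmed to the fields used here.
sample = {
    "total_count": 2,
    "workflow_runs": [
        {"id": 2, "status": "in_progress", "conclusion": None},
        {"id": 1, "status": "completed", "conclusion": "success"},
    ],
}
print(release_in_progress(sample))
```

Treating GitHub's response as the only input keeps the CMS free of its own build state, which is the point of using the workflow as the source of truth.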
I'm not even sure using the `va_gov_github` module will be useful since I see it wrapping `Github\Client` with a bunch of abstraction but no extra features that I can tell...at least it will keep the complexity lower to keep that part out for now. I will use the form on "/admin/content/deploy/simple" to test getting GH Workflows info using `Github\Client`.
And I'm guessing this is the next-build workflow that should be referenced in any API calls: https://github.com/department-of-veterans-affairs/next-build/blob/main/.github/workflows/content-release.yml
I was thinking through how to have the `content-release` workflow run on demand as well as on a schedule using as much of the GitHub API as possible, and the `workflow_dispatch` + enable/disable workflow endpoint seems to cover the use case.
Workflows can be enabled and disabled via an endpoint: https://docs.github.com/en/rest/actions/workflows?apiVersion=2022-11-28#disable-a-workflow This can be used to turn off continuous building by having a workflow that runs during the scheduled business hours enabled or disabled with an API call.
When set to enabled, this workflow would call the `content-release` workflow whenever the content-release workflow completes. You can do different things if the workflow completes or fails: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#running-a-workflow-based-on-the-conclusion-of-another-workflow
```yaml
on:
  workflow_run:
    workflows: [Content Release]
    types: [completed]
jobs:
  on-success:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      # Might need to use cURL but weird if you can't use the gh CLI in a Workflow...
      - run: gh workflow run content-release.yml
  on-failure:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    steps:
      - run: ./log-stuff.sh
```
Another workflow could accept a `workflow_dispatch` event and then call the `content-release` workflow. This would be the on-demand after-hours deployment request.
All API calls to GitHub could originate in the CMS either from user input or from a background script. However, I only think it makes sense to have on-demand and disable/enable workflow calls coming from the Drupal CMS. The continuous building shouldn't require any communication from the CMS, at least I don't see why it would be necessary.
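For illustration, the three calls the CMS would need (dispatch, enable, disable) map onto GitHub's documented workflow endpoints. The sketch below only builds the requests and does not send them. The helper name `workflow_request`, the repo `example-org/example-repo`, and the workflow file name `continuous-release.yml` are assumptions for the example, and the Drupal code would use PHP rather than Python.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def workflow_request(repo: str, workflow: str, action: str, token: str,
                     ref: str = "main") -> urllib.request.Request:
    """Build (but do not send) a request for one workflow action.

    action is "dispatch", "enable", or "disable".
    """
    base = f"{API_ROOT}/repos/{repo}/actions/workflows/{workflow}"
    if action == "dispatch":
        # workflow_dispatch needs the git ref to run against.
        url, method, body = f"{base}/dispatches", "POST", json.dumps({"ref": ref}).encode()
    elif action in ("enable", "disable"):
        url, method, body = f"{base}/{action}", "PUT", None
    else:
        raise ValueError(f"unknown action: {action}")
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return urllib.request.Request(url, data=body, headers=headers, method=method)


# Hypothetical repo and workflow names; "TOKEN" stands in for a real credential.
req = workflow_request("example-org/example-repo", "continuous-release.yml", "disable", "TOKEN")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; all three endpoints respond with `204 No Content` on success, so there is no body to parse.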
Since some of the "workflows calling workflows" code needs to have things on a main/default branch, I think I will use my personal repos to test this out as I've never attempted much more than very basic GH Workflows. Also, I'm not really testing the workflows, I'm simply trying to test how to call, check status, disable/enable, and re-use workflows. I'll still make a branch for the Drupal code, but it will start by making API calls to test repos I control so as to not disturb VA devs.
I updated the content status release form with details from the GH workflow. This is only targeting the production content release details, but the QA/testing/Tugboat content release uses a log to show what the status is. I think the QA/testing/Tugboat release code should be updated to use commit flags and thus remove all needs for the state machine and content release queues.
The status block shows:
If the time is outside of business hours, then the content release request form is disabled. Otherwise, someone can check the acknowledgment checkbox and submit the form to make a request to run the GH Workflow.
That is the basic outline of the UI changes I'm proposing and have committed to code. I will now look into implementing the GH Workflow code I mentioned previously that will trigger the content release workflow.
I'm testing this out on a personal repo: https://github.com/alexfinnarn/moz/actions with two workflows: one to run something and the other to watch for when it completes and re-run the workflow. This should work, but I'm running into a token error.
```
Run gh workflow run content_release.yml
  gh workflow run content_release.yml
  shell: /usr/bin/bash -e {0}
  env:
    GH_TOKEN: ***
could not create workflow dispatch event: HTTP 403 Resource not accessible by integration
```
Angry threads about this:
I think this can be taken care of with a PAT, but it would make more sense for the CLI to just work. Some kind of concern from GH about recursive workflows or something...
Total BS. You should be able to continuously run within the workflow if there are restrictions with calling workflows from other workflows. I guess I will try using the GH token over REST endpoint now, but that could get me the same error as with the CLI command.
As I'm learning about GH Workflows and how to make them more dynamic and re-usable, here is an example of taking current code and making it more dynamic.
The current content-build content release workflow has a line about some debug boolean variable: https://github.com/department-of-veterans-affairs/content-build/blob/main/.github/workflows/content-release.yml#L22 To update this variable, you would have to edit via GH admin UI or via an API call.
However, you can also use `workflow_call` and input variables to allow for the dispatch call to provide debug variable information like this:
```yaml
on:
  workflow_call:
    inputs:
      debug:
        required: true
        type: string
env:
  ACTIONS_RUNNER_DEBUG: ${{ inputs.debug }}
```
I bet passing in variables as inputs to workflows could help in several places.
Based on the answer from https://stackoverflow.com/a/75250838 I was able to go to my settings and change the permissions to read and write instead of only read:
And this did allow for the flow: Content Release -> Continuous Release -> Content Release
However, it stopped after triggering the content release once. On https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_run there is a note:
You can't use workflow_run to chain together more than three levels of workflows. For example, if you attempt to trigger five workflows (named B to F) to run sequentially after an initial workflow A has run (that is: A → B → C → D → E → F), workflows E and F will not be run.
I only count two chains in my example, but solely using GH Workflows to trigger other workflows in a loop might be impossible.
So at this point, it's probably smartest to use a `schedule` that checks every five minutes since that is the shortest interval of time allowed. Check if it is during business hours and if a content release workflow is running. If not, then call the content release workflow.
I figured out how to get a workflow to run continuously solely on GH via Workflows. I will post the complete workflow file since the workflow will never live in this CMS repo and it would need to be added to next-build. I tested this on a personal repo.
```yaml
name: Continuous Release
on:
  workflow_dispatch:
  schedule:
    # Run every five minutes.
    - cron: '*/5 * * * *'
jobs:
  trigger_release:
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
        with:
          persist-credentials: false
      - name: During business hours?
        # Check to see if it is a weekday between 8am and 8pm in "America/New_York" timezone.
        # Times taken from RunsDuringBusinessHours.php::isCurrentlyDuringBusinessHours()
        run: |
          export TZ="America/New_York"
          echo "Current time: $(date)"
          if [ $(date +%u) -lt 6 ] && [ $(date +%H) -ge 8 ] && [ $(date +%H) -lt 20 ]; then
            echo "It is during business hours."
            echo "BUSINESS_HOURS=true" >> $GITHUB_ENV
          else
            echo "It is not during business hours."
            echo "BUSINESS_HOURS=false" >> $GITHUB_ENV
          fi
      - name: Content Release running?
        run: |
          RUNNING_WORKFLOWS=$(gh run list --workflow "Content Release" --json status --jq '.[] | select(.status == "in_progress")')
          if [ -n "$RUNNING_WORKFLOWS" ]; then
            echo "Content Release is already running."
            echo "RELEASE_WORKFLOW_RUNNING=true" >> $GITHUB_ENV
          else
            echo "Content Release is not running."
            echo "RELEASE_WORKFLOW_RUNNING=false" >> $GITHUB_ENV
          fi
      - name: Run Content Release
        if: env.BUSINESS_HOURS == 'true' && env.RELEASE_WORKFLOW_RUNNING == 'false'
        run: |
          gh workflow run content_release.yml
```
The `content_release.yml` workflow is within the same repo so it can easily be run with the GH token. One check looks to see if the time is during business hours, and the other check determines if the content build is already running. If it is within business hours and there is no current build running, the content release/build workflow gets kicked off.
The GH runners don't operate exactly every five minutes but seemed to run within 10 minutes. So, this isn't entirely continuous per se, but it does keep the content release going during business hours without needing the CMS to function. Granted, the content build can't succeed without the CMS so it is a moot point, but at least the sample code provides an option to do things this way.
I will now look into doing the same thing via a script like the `queue_runner.sh` that runs continuously on Tugboat. Theoretically, a script could run the same code as in the workflow file I pasted and then make an API call to dispatch a content build workflow run.
The benefit is checking more often for when the content build is not running to trigger it. This code can go in this repo in the branch I am adding sample code to.
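A minimal sketch of the decision such a script would make on each pass, mirroring the two checks in the workflow file (weekday between 8am and 8pm Eastern, and no release already running). The function names are hypothetical, and `release_running` is assumed to come from a separate GitHub API call that is not shown here.

```python
from datetime import datetime


def is_business_hours(now_eastern: datetime) -> bool:
    """Monday to Friday, 08:00 up to but not including 20:00.

    Expects a datetime already expressed in America/New_York time.
    """
    return now_eastern.isoweekday() < 6 and 8 <= now_eastern.hour < 20


def should_dispatch(now_eastern: datetime, release_running: bool) -> bool:
    """Dispatch only during business hours when no release is running."""
    return is_business_hours(now_eastern) and not release_running


# Thursday at noon with nothing running: a release should be dispatched.
print(should_dispatch(datetime(2024, 4, 25, 12, 0), release_running=False))
```

A caller would pass something like `datetime.now(ZoneInfo("America/New_York"))` from the standard `zoneinfo` module and loop with a short sleep, which is what lets a script poll more often than the five-minute cron floor.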
Description
When certain content types are saved in the CMS, they trigger a Content Build content release, regardless of the time of day or day of the week. This is to allow certain timely content to reach va.gov quickly outside of the normal content release schedule.
Next Build should also follow this pattern so that timely content updates are posted quickly.
Details
This functionality is handled by the CMS Drupal module va_gov_content_release: https://github.com/department-of-veterans-affairs/va.gov-cms/tree/main/docroot/modules/custom/va_gov_content_release
This will need to be augmented to handle Next Build content release in addition to Content Build.
Requirements
We want the CMS to manage triggering of Next Build content releases, so that the mechanism matches Content Build.

```[tasklist]
### Acceptance criteria
- [ ] https://github.com/department-of-veterans-affairs/va.gov-cms/issues/17022
- [ ] How can we verify the requirements have been met factually/quantifiably?
```

## Background & implementation details

The current Content Build implementation of this is managed by these custom Drupal modules:
The rough outline of what happens is:

- `ready`
- `requested` and an API request is made to Github to start the workflow.
- `in progress`.
- `ready`, which triggers the process again.

The implementation of this circular release management should be replicated or extended so that Next Build Content Releases are managed independently of Content Build Content Releases.
Proposed resolution by @alexfinnarn:
Background
While reviewing the current communication between the CMS and content-build GH workflow, I determined that the calls back to Drupal from GitHub are unnecessary. Also, the queue and state machine in Drupal are not needed. Coupled with the work done in https://github.com/department-of-veterans-affairs/va.gov-cms/issues/17209, this means that the queue and state machine code can be removed entirely, simplifying the codebase and making it easier to maintain.
Instead of a back and forth between the GH workflow building content and Drupal, the communication can be one-way from Drupal to GH. The continuous build can be triggered from Drupal or GH, and the "out-of-band releases" can be triggered from a Drupal form.
Pros:
Cons:
How will this work?
Continuous Build
Script code: https://github.com/department-of-veterans-affairs/va.gov-cms/pull/17991/files#diff-25983c2c72210bfdda46b5543864aa3bf89c459a52f2c93d98240e70a0c1d93f
`RunsDuringBusinessHours::isCurrentlyDuringBusinessHours()`.

Note: The process can live entirely on GH in a workflow as an alternative to Drupal making API calls. See https://github.com/department-of-veterans-affairs/va.gov-cms/issues/16851#issuecomment-2077804382 for how that works.
From docs:
It might be useful to add a specific GH workflow for these errors rather than solely relying on the CMS to trigger things. In my tests, the continuous GH content build workflow kicks off within 10 minutes of the last one completing.
OOB release
Form code: https://github.com/department-of-veterans-affairs/va.gov-cms/pull/17991/files#diff-3d3bee748cdbebe5f5a0a8256fb4dcdfa5e486d77d64bf85c6cd7256e4649e68
`/admin/content/deploy/check-status` (or a better sounding route) to check the status of builds.

As far as I know, this covers the current functionality in code and in the docs. The content release workflow should also have the ability to be `workflow_dispatch` called within GH for any dev, but that is already in the next-build workflow.

Notes about current docs

That is from the `cms-content-release.md` readme, but "Automatically when some types of content are edited" never really happens. There's always an item in the content-release queue from the continuous build making any queue item for an individual node pointless...plus, it looks like the...Ugg, I'm not following this code around anymore...you spin me round, round, baby, wrong round...

The docs are outdated for the `EntityEventSubscriber` function, and I stopped after finding `ContentReleaseTriggerTrait` in the `va_gov_content_types` module. With several modules dedicated to the content release, why put code in that module? So there could be some code that does trigger an OOB release, but good luck tracing through the code to figure it out.

Remaining Questions?
- `cms-content-release.md` readme be updated? It seems out of date mentioning a class that has been refactored and moved to a different module.
- `content-release.yml` workflow and the other workflow runs checks and calls the release workflow. GH allows for checks every five minutes but realistically this takes 5 - 10 minutes for the runners to kick off. API calls or manual intervention can enable/disable the workflows.
- `$settings['va_gov_frontend_build_type']` and `$settings['github_actions_deploy_env']`.