Enhance GitHub Actions Deployment Pipeline with Manual Approval - Githubissues

department-of-veterans-affairs / notification-api

Notification API

MIT License

16 stars 9 forks source link

Enhance GitHub Actions Deployment Pipeline with Manual Approval #1668

Closed ldraney closed 7 months ago

ldraney commented 8 months ago

User Story - Business Need

[x] Ticket is understood, and QA has been contacted (if the ticket has a QA label).

User Story(ies)

As a developer/team member involved in the deployment processes I want to have a deployment pipeline in GitHub Actions that includes a manual intervention step for performance environment deployments So that we can ensure deployments are approved by designated team members, maintaining control and quality before the release.

Additional Info and Resources

Implement manual intervention in our GitHub Actions deployment pipeline, requiring specific approval before proceeding with deployment to the performance environment. Update the GitHub environment configuration and deployment protection rules accordingly.

See this video See Using Environments for Deployment.
See Portal's pipline

Engineering Checklist

[x] Update GitHub Actions workflow to include a manual intervention step.
[x] Configure GitHub environment and protection rules are updated to reflect new approval process.
[x] Document the updated deployment process and manual intervention steps here.
[x] Demo new process to the team to ensure there's understanding

Acceptance Criteria

[x] Scenario 1: Ticket is ready for QA

GIVEN a developer's PR is approved and ready for QA WHEN they merge their code to master, that SHA will be used for a potential deploy and merge to the release branch THEN Github will pause the workflow with an approval request to QA
[x] Scenario 2: QA Approves request because they are ready to test in Perf

GIVEN QA approves a deploy to perf THEN After successful deploy and testing THEN QA comes back to workflow to approve a merge of that master's SHA into the release branch
[x] Scenario 3: QA cancels request because Dev is not ready/found an issue

GIVEN QA reviews an approval request in Github WHEN QA rejects the request THEN the request would be cleared from Github and dev would need a new PR/hotfix to update Master. Upon merge this reinitiates the above flows.

QA Considerations

Given that this directly impacts the QA workflow, QA should be involved in testing these workflows.

Out of Scope

Additional functionality around notifying approver like Slack
2nd QA approval of approving code and moving to Staging
- Identifying what to do with cherry pick if QA rejects

cris-oddball commented 7 months ago

@ldraney specifically, where in the workflow does this require QA approval?

mjones-oddball commented 7 months ago

Hey team! Please add your planning poker estimate with Zenhub @EvanParish @k-macmillan @kalbfled @ldraney @MackHalliday @mchlwellman

ldraney commented 7 months ago

I have a good draft POC that I'll be showing to QA soon:

ldraney commented 7 months ago

Instructions for Kyle do adjust our repo will likely be:

ldraney commented 7 months ago

I updated the AC on this ticket to reflect team discussions:

AC change

Workflow says, “Hey QA, merge to master, do you want to deploy to perf?”
QA tests the latest master (cherrypicking doesn't seem necessary, IMO)
It waits for her approval with “how did that testing go?”
Approval means the commit goes to release branch

Value of AC change

Cris now has valuable rejections
Nothing in release branch that has not passed QAl

ldraney commented 7 months ago

I've been thinking about scenario 2 of the AC and had a good convo with Kyle about it. Merging should be sufficient for now, and as needs arise we can adjust as is necessary; I was worried about when Cris rejects merging to the release branch; but she can just cancel the sequential runs, then Cris waits until master is in a state to merge to release the branch.

So, I'll work on getting scenario 2 setup, and I'll try to have something ready to talk about with Cris tomorrow morning.

ldraney commented 7 months ago

Just passed this to Cris to see her thoughts; She says it sounds great, so I'm going to begin the implementation in my POC:

Hey Cris,

After some thinking and latest convo with Kyle, I wanted to pass you my outline. I’ll be setting up the POC here next, but I wanted to let you know where I’m at before I begin that work. Hopefully you don’t see any red flags, but if you do, LMK; and happy to huddle.

We are going to utilize merges for now and avoid potential complexities with cherry picking. Merging should be more than sufficient moving forward for the time being:

Next steps Add to the deploy-to-perf.yml It makes a branch and takes current master to that branch to test in perf It waits for Cris to come back to the pipeline and approve after she is done testing If she approves, the workflow then automatically merges to the release branch If she rejects Workflow cancels and that test branch is deleted If there are other workflows with branches ready, she can test those if she wants, and can likely merge unless there is a dependency, but that shouldn’t happen often and we should work out a solution if that does become a more frequent issue Know what branch protections for the release branch are needed

ldraney commented 7 months ago

Cris earlier pointed out something I will likely need to address: Her current workflows allow the tag to push to perf; implementing environment protections could disrupt her flows, requiring permissions for those workflows as well. I'm not sure what the workaround is for this yet; but I don't think this is a blocker, but will require adjustments down the line, so I'm going to move forward with the AC of this ticket and then get there when we get there

ldraney commented 7 months ago

In standup we agreed that this should be a non-breaking change. So, we may call this environment something like "test-perf" until this is out of our early implementation phase.
That shouldn't slow down this ticket!

ldraney commented 7 months ago

Just developing along! I hope to have my POC ready and my PR open tomorrow morning.

ldraney commented 7 months ago

I got this much working fairly well:

But I discovered that having double-permissions in a single workflow doesn't always work. There is an occasional error message that says There was a problem approving one of the gates., and other times the workflow just approves BOTH steps.

An easy workaround would be having ANOTHER environment protection environment name like "perf-post-test", which i may just do. Yeah, I'll do that for now.

ldraney commented 7 months ago

I’ve made good progress but making the temporary branch and then merging to release has some nuances within the Action Runner that I’m addressing.

Other TODO Add workflow call to dev_deploy and make sure it works for build job and other jobs Confirm with Kyle that before build step is right Talk to kyle about making the release branch

Notes Q: How does Cris know that she has to approve something? A: I suggest the PR channel’s announcement that something has merged to master should be the indicator. We could make it standard to tag QA in those posts; Also, github already emails on a merge to the default branch, so QA will be able to watch for those Github emails/slack and act appropriately.

ldraney commented 7 months ago

Most successful run (but the release branch wasn't updated so I gotta figure out why):

ldraney commented 7 months ago

I may move this ticket out of in-progress until I figure out why our VeText failed monitor didn't send a notification to the appropriate Octo slack channel; But I'm seeing if there is an easy fix for it first.

ldraney commented 7 months ago

I've added the "Off Track" label to the ticket because during its progress, the acceptance criteria required updating, which is fine because this ticket required a bit of conversation to ensure we get this initial setup right.

Also, I discovered this is a more creative solution than anticipated, considering how the jobs/steps and environments all work together in the workflow for our purpose; along these lines, bugs failing Git commands within the runner space have proved to be a time-consuming aspects that required additional exploration.

ldraney commented 7 months ago

I'm moving this back to TODO because I need to fix my admin access to AWS

ldraney commented 7 months ago

Taking this back up. Please allow me some time to digest the moving parts and POC repos; though I remember the latest blocker is getting these git commands to work within the runner. I remember maybe using something in the QA repo to overcome this using the github-script action, but hopefully thats unnecessary cause bash is better for stuff like this; but the Octokit authentication is nice, and Javascript is readable and fast enough for this situation; but hopefully I don't have to mess with that at all.

ldraney commented 7 months ago

Here's my POC so far

To compare how complexity increased, here's the current state of my PR; will try to update soon.

ldraney commented 7 months ago

I'm unable to merge to the release branch with the latest error being:

Run git merge -X ff82a6bc2ffa3d4bbade27b3eaa357f7ba364eff --no-ff -m "Merge approved changes from main ${GITHUB_SHA} to release"
Already up to date.
remote: Permission to dialectic-devops/demo-api.git denied to github-actions[bot].
fatal: unable to access 'https://github.com/dialectic-devops/demo-api/': The requested URL returned error: 403
Error: Process completed with exit code 128.

And what is strange is I'm unable to give the actions permissions to write:

I'm reading the linked documentation to see if I can figure out the solution

ldraney commented 7 months ago

This documentation suggests that adding this to the relevant job will work:

    permissions:
      contents: write

ldraney commented 7 months ago

Successful POC with the different environment names.
The first environment perf-deploy is so QA doesn't have work disrupted, such as on a tag push to perf workflow. The second environment perf-merge is due to trouble using the same environment twice in the same workflow.

ldraney commented 7 months ago

Successful POC in the notification-api repo

First I'll ask Kyle to set up the perf-deploy and perf-merge environments to our repo with branch protections for QA and myself for testing. I'll also need a full run through with a deploy to perf to ensure its working as expected. Then I'll be able to have a POC for QA to look at before opening the PR. Before I open the PR I'll make the adjustments so that trigger is for master not my branch, and merge is to release rather than 1668-pipeline-approval-release.

ldraney commented 7 months ago

Successful run with environment protections

About to try a full run deploy to perf after talking with Cris

ldraney commented 7 months ago

Just noting here the reason we needed a second new environment name is because by approving the first one it would automatically approve each check! Essentially an environment is needed per approval step in our workflow.

ldraney commented 7 months ago

Working workflow! Things looking good.

I'm reaching out to QA before opening my PR.

ldraney commented 7 months ago

Rephrased the AC to better reflect what is actually happening

cris-oddball commented 7 months ago

@ldraney Please do remember to update the documentation before closing this ticket.

ldraney commented 7 months ago

Successful workflow! merged to master

I need to update documentation and then wallah!

ldraney commented 7 months ago

Here's my first iteration of the documentation.

I've also added it to the glossary