Closed k-macmillan closed 1 year ago
Hey team! Please add your planning poker estimate with Zenhub @cris-oddball @EvanParish @ianperera @jakehova @k-macmillan @kalbfled @ldraney
I commented out run-twistlock from dev_deploy.yml I created a new action called "Twistlock Cron Job" I asked Cris if she still wants Twistlock to run when a tag is created.
there are a lot of branches, what if we just ran twistlock against all open and draft prs?
Kyle, Cris, and I decided to run a scan on all images from latest commits on all open and draft PRs.
I can do that, but my only blocker is how to have twistlock run those new images in twistlock:
twistcli images scan --project vanotify \
--address https://twistlock.devops.va.gov \
--user vanotify-ci-user "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG" \
STATUS_CODE=$?
I've written most of the twistlock cronjob action
But figuring out how to build the ci/Dockerfile for each PR branch is my main difficulty right now.
I believe I'll write a python script that takes in env inputs, then runs twistlock for every open and draft PR, and then the action will simply call this script from the master branch.
That means I should be able to build a script and run successfully locally.
Please check out my PR, you will see I have been running actions with my new script. Right now it is with dispatch and not a cronjob.
What's actually on our Twistlock Ec2 instance? Can I just use the twistlock cli from the action, or is there a specific configuration I need to be aware of? Because currently I don't need to use ECR, so there's not reason to be using the EC2.
Talking with Kyle about running twistlock in parrallel with dev_deploy.yml, I don't see the need to make this a cronjob as that could still turn out to be a blocker in the long run.
@ldraney please do @ us when you need info. The Twistlock EC2 instance contains the client code that runs the scan. I suggest that if we want to change how we run this, we open a new ticket. We know the code runs as it stands but changing it to run in GitHub actions is an unknown. I wrote up everything I know about how this works in this document.
Also, I don't see your PR that you mentioned in an earlier comment?
Yeah I want to trigger a new action on PR, separate from our others. My only thought is Cris suggested a server start/stop, so if I include something in that for this triggered action, it's NBD if the action takes longer.
So, pushing to ecr with the right tag and adding and aws command to start/stop the ec2 instance, and this ticket will be moving towards closed.
There are a few more things that Lucas will think through with the team, and he's optimistic. :)
The most difficult aspect of this ticket has been architecting, just reducing the available options and handling the various concepts/tools involved in accomplishing this task; but I think I've got a solution.
I need to build images with the correct tag, push them to ECR, and then call the ./.github/actions/run-commands-on-ec2 action to run the Twistlock scan inside an AWS EC2 instance, as it is designed to handle running commands on a remote instance.
I currently wrote a python script that builds a docker image per each PR. I still need to make sure it gets the tags right. Once I do that, I need to modify the Python script to include a call to the run-commands-on-ec2 action as a step in the workflow. This would involve setting the appropriate environment variables for the ECR_REGISTRY, IMAGE_REPOSITORY, and IMAGE_TAG, as well as the SSM parameter path for the EC2 instance ID.
If we go this route, then the cronjob would simply call my python script, and then my script would build a docker image and then run the twistlock action (which uses the necessary run-commands-on-ec2 action).
for reference, the current twistlock is being called in this manner from the twistlock.yml:
- name: Perform twistlock scan
env:
ECR_REGISTRY: ${{ steps.login-ecr-vaec.outputs.registry }}
IMAGE_REPOSITORY: "notification_api"
IMAGE_TAG: "${{ inputs.ref }}_${{ inputs.environment }}"
uses: ./.github/actions/run-commands-on-ec2
with:
instance-id-ssm-parameter-path: /utility/twistlock/instance
commands: >-
"aws --region us-gov-west-1 ecr get-login-password | docker login --username AWS --password-stdin '"$ECR_REGISTRY"' 2>/dev/null",
"docker pull '"$ECR_REGISTRY"'/'"$IMAGE_REPOSITORY"':'"$IMAGE_TAG"' 1>/dev/null",
"docker logout '"$ECR_REGISTRY"'",
"export TWISTLOCK_PASSWORD=$(aws --region us-gov-west-1 ssm get-parameter --name /utility/twistlock/vanotify-ci-user-password --with-decryption | jq '.Parameter.Value' -r)",
"twistcli images scan --project vanotify --address https://twistlock.devops.va.gov --user vanotify-ci-user '"$ECR_REGISTRY"'/'"$IMAGE_REPOSITORY"':'"$IMAGE_TAG"'",
"STATUS_CODE=$?",
"docker image prune -a -f 1>/dev/null",
"exit $STATUS_CODE"
I've pushed my new script, but for some reason it won't run; the error is:
could not find any workflows named Build Image Run Twistlock
AND
HTTP 404: Not Found (https://api.github.com/repos/department-of-veterans-affairs/notification-api/actions/workflows/build-image-run-twistlock.yml)
which I've faced before, but for some reason my normal method isn't working:
gh workflow run -r 974-twistlock-cronjob -F ref=974-twistlock-cronjob build-image-run-twistlock.yml
OR
gh workflow run -r 974-twistlock-cronjob -F ref=974-twistlock-cronjob 'Build Image Run Twistlock'
I spent way too much time architecting a solution to what is really a simple problem.
We need Twistlock to stop being a blocker and instead just scan for vulnerabilities when it needs to.
So I changed up the pipeline and want this ticket to be closed, and we need to talk about scanning additional images as a separate future ticket.
Again, the goal is to scan every deployed image with twistlock, and if twistlock breaks, it shouldn't slow down development -- instead, I should just be notified, or we should have an action to restart the ec2 server if Twistlock breaks. I've spent too much time on this for this sprint anyway, so I'm closing this ticket, since the original goal was accomplished.
@ldraney I don't think you can close this ticket until Twistlock is running on a schedule. Or did I miss that it is running on a schedule?
@cris-oddball
We may put it on a schedule still.
After talking with Kyle, the goal is to remove twistlock as a blocker and instead (as a proposed solution) run twistlock on all images that were created by teh dev_deploy workflow the day before. I proposed a better solution (IMO) that runs twistlock parallel to dev deploy; that way images are just tested outside the dev_deploy workflow, but we don't need to query AWS ECR for the latest images. It keeps thing simpler and removes a potential inconvenience of a cronjob.
name: Twistlock Scan
on:
workflow_run:
workflows: ["build.yml"]
jobs:
twistlock-scan:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
- name: Run Twistlock Scan
run: <command to run Twistlock scan>
@cris-oddball
Yep, I'm playing with it and Cronjob with an aws command that queries the latest images is the probably the easiest way, so I'll just do that.
this:
- name: Login to VAEC ECR
id: login-ecr-vaec
uses: aws-actions/amazon-ecr-login@v1
- name: Perform twistlock scan
env:
ECR_REGISTRY: ${{ steps.login-ecr-vaec.outputs.registry }}
TWISTLOCK_PASSWORD: ${{ secrets.TWISTLOCK_PASSWORD }}
IMAGE_REPOSITORY: "notification_api"
uses: ./.github/actions/run-commands-on-ec2
with:
instance-id-ssm-parameter-path: /utility/twistlock/instance
commands: >-
ls
#"aws --region us-gov-west-1 ecr get-login-password | docker login --username AWS --password-stdin '"$ECR_REGISTRY"' 2>/dev/null",
#"export TWISTLOCK_PASSWORD=$(aws --region us-gov-west-1 ssm get-parameter --name /utility/twistlock/vanotify-ci-user-password --with-decryption | jq '.Parameter.Value' -r)",
#for IMAGE_TAG in ${{ steps.get-image-tags.outputs.images }}; do
#docker pull "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG" 1>/dev/null
#twistcli images scan --project vanotify --address https://twistlock.devops.va.gov --user vanotify-ci-user "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG"
#STATUS_CODE=$?
#if [ $STATUS_CODE -ne 0 ]; then
#exit $STATUS_CODE
#fi
#docker image prune -a -f 1>/dev/null
#done
#"docker logout '"$ECR_REGISTRY"'",
Is failing with this:
Error parsing parameter '--parameters': Invalid JSON: Expecting value: line 3 column 7 (char 41)
JSON received: {"commands":[
"#!/bin/bash",
ls #"aws --region us-gov-west-1 ecr get-login*** docker login --username AWS --password-stdin ***.dkr.ecr.us-gov-west-1.amazonaws.com 2>/dev/null",
#"export TWISTLOCK_PASSWORD=$(aws --region us-gov-west-1 ssm get-parameter --name /utility/twistlock/vanotify-ci-user*** | jq .Parameter.Value -r)", #for IMAGE_TAG in v1.5.6-1076-db-column_perf; do
#docker pull "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG" 1>/dev/null
#twistcli images scan --project vanotify --address https://twistlock.devops.va.gov --user vanotify-ci-user "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG"
#STATUS_CODE=$?
#if [ $STATUS_CODE -ne 0 ]; then
#exit $STATUS_CODE
#fi
#docker image prune -a -f 1>/dev/null
#done #"docker logout ***.dkr.ecr.us-gov-west-1.amazonaws.com",
]}
I'm getting a 252 error every time... the error is coming from the ec2.. I'll probably need to test these commands from my CLI
@ldraney that's a parsing error, the JSON is invalid. Not necessarily EC2 or twistlock, but all JSON has to be parsed when used and that validation is failing.
/home/runner/work/_temp/82cad616-abb8-4726-8959-f0565194fedd.sh: line 9: untagged: No such file or directory Error: Process completed with exit code 1.
I'm getting that error when I'm trying to get rid of the literal '
I'm still just figuring out how to mess with a bash script that's forced to be in json fromat, e.g. adding a loop to this for our purposes:
"aws --region us-gov-west-1 ecr get-login-password | docker login --username AWS --password-stdin '"$ECR_REGISTRY"' 2>/dev/null",
"docker pull '"$ECR_REGISTRY"'/'"$IMAGE_REPOSITORY"':'"$IMAGE_TAG"' 1>/dev/null",
"docker logout '"$ECR_REGISTRY"'",
"export TWISTLOCK_PASSWORD=$(aws --region us-gov-west-1 ssm get-parameter --name /utility/twistlock/vanotify-ci-user-password --with-decryption | jq '.Parameter.Value' -r)",
"twistcli images scan --project vanotify --address https://twistlock.devops.va.gov --user vanotify-ci-user '"$ECR_REGISTRY"'/'"$IMAGE_REPOSITORY"':'"$IMAGE_TAG"'",
"STATUS_CODE=$?",
"docker image prune -a -f 1>/dev/null",
"exit $STATUS_CODE"
I think the following script is close to accomplishing our purposes:
"aws --region us-gov-west-1 ecr get-login-password |
docker login --username AWS --password-stdin '$ECR_REGISTRY' 2>/dev/null",
"for IMAGE_TAG in ${{ steps.get-image-tags.outputs.images }}; do
if [[ $IMAGE_TAG == *_dev* ]]; then
docker pull $ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG 1>/dev/null || exit;
twistcli images scan --project vanotify --address https://twistlock.devops.va.gov --user vanotify-ci-user $ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG;
STATUS_CODE=$?;
if [ $STATUS_CODE -ne 0 ]; then
exit $STATUS_CODE;
fi;
docker image prune -a -f 1>/dev/null;
fi
done;
docker logout '$ECR_REGISTRY';"
I think our for loop has to be in one line, makes it a pain to work with.
I'm going to try and make it a bash script
oh lol, you know why my script is probably failing? Because no images have been pushed in the last 24 hours.
So, the bash script needs to be adjust somehow to not fail if there are no tags that are returned.
Or, to adjust the action so the "Perform twistlock scan' only runs if the 'Query ECR for images' step doesn't return an empty variable. Maybe this can be done by having a conditional on the 'Perform twistlock scan' that only runs if "images" is not empty.
Lucas working on GitHub Actions nuances, trying to run various images successfully, a couple of details to still work out.
Kyle's direction is to scope this to the last container.
https://github.com/orgs/community/discussions/24952
This may be why my variables aren't working in the actions
Finally figured it out based on the above documentation:
echo 'IMAGES<<EOF' >> $GITHUB_ENV
echo $IMAGES >> $GITHUB_ENV
echo 'EOF' >> $GITHUB_ENV
Why it was so hard to have a multi-line variable in Github Actions is beyond me.
"aws --region us-gov-west-1 ecr get-login-password | docker login --username AWS --password-stdin '$ECR_REGISTRY' 2>/dev/null; for IMAGE_TAG in $IMAGES; do docker pull \"$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG\" 1>/dev/null; twistcli images scan --project vanotify --address https://twistlock.devops.va.gov --user vanotify-ci-user \"$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG\"; STATUS_CODE=$?; if [ $STATUS_CODE -ne 0 ]; then exit $STATUS_CODE; else echo \"{ \\\"status\\\": \\\"success\\\", \\\"image\\\": \\\"$IMAGE_TAG\\\" }\"; fi; docker image prune -a -f 1>/dev/null; done; docker logout '$ECR_REGISTRY';"
That is my attempt at getting this to echo successfully twistlocking an image, but for some reason it won't echo success, which means something must be going wrong...
Here is what the script would look like if it wasn't crammed into a single line:
#!/bin/bash
# Get ECR login token and login to ECR
aws --region us-gov-west-1 ecr get-login-password | \
docker login --username AWS --password-stdin "$ECR_REGISTRY" 2>/dev/null
# Iterate over each image tag and perform Twistlock scan
for IMAGE_TAG in $IMAGES; do
# Pull the image from ECR
docker pull "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG" 1>/dev/null
# Scan the image using Twistlock
twistcli images scan --project vanotify --address https://twistlock.devops.va.gov \
--user vanotify-ci-user "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG"
# Check the exit status of the Twistlock command
STATUS_CODE=$?
if [ $STATUS_CODE -ne 0 ]; then
# If the Twistlock command failed, exit the script with the same status code
exit $STATUS_CODE
else
# If the Twistlock command succeeded, echo a success message in JSON format
echo "{ \"status\": \"success\", \"image\": \"$IMAGE_TAG\" }"
fi
# Remove all unused Docker images from the host machine
done
# Logout from ECR
docker logout "$ECR_REGISTRY"
docker image prune -a -f 1>/dev/null
I'm now going to go through piece by piece to see what is producign "success" and to see why it won't run my loop.
@k-macmillan Could you please see above and share your thoughts with Lucas?
@ldraney Please remember to talk with Kyle about 1) deciding which option to proceed with based on current findings and 2) a potential refactoring/tech debt ticket for future consideration.
I gave Lucas some guidance on this in a huddle. He's going to use that to make changes.
I think I finally found a maintainable solution:
In essence, build.yml will now call Twistlock with gh cli command, and Twistlock is only callable by workflow_dispatch; thus, everytime dev_deploy is run, build.yml will call twistlock as a separate workflow, distinct from deployment. This is a solution on multiple fronts:
Value Statement
As VA Notify I want to consistently check for vulnerabilities So that vulnerabilities do not go unchecked
Perform nightly Twistlock scanning of master as well as the 5? most recently commit branches to ensure in-progress work is being checked when able. The idea here is that very few dependencies are added and approved in a day. At most, vulnerabilities will be detected in less than 24 hours.
In addition to nightly scans, post scan failures to the DSVA automated alerts channel (we don't really need to see successes).
Checklist
Additional Info/Resources
Github page describing cron actions.
Fetch all remote branches:
Get commit hashes of 5 most recent commits:
git for-each-ref --sort=-committerdate refs/heads/ | head -n 5 | awk '{print $1}'