department-of-veterans-affairs / notification-api

Notification API
MIT License
16 stars 9 forks source link

Twistlock Moved to Scheduled Task #974

Closed k-macmillan closed 1 year ago

k-macmillan commented 1 year ago

Value Statement

As VA Notify I want to consistently check for vulnerabilities So that vulnerabilities do not go unchecked

Perform nightly Twistlock scanning of master as well as the 5? most recently commit branches to ensure in-progress work is being checked when able. The idea here is that very few dependencies are added and approved in a day. At most, vulnerabilities will be detected in less than 24 hours.

In addition to nightly scans, post scan failures to the DSVA automated alerts channel (we don't really need to see successes).

Checklist

Additional Info/Resources

Github page describing cron actions.

Fetch all remote branches:

for i in `git branch -a | grep remote | grep -v HEAD | grep -v master`; do git branch --track ${i#remotes/origin/} $i; done

Get commit hashes of 5 most recent commits: git for-each-ref --sort=-committerdate refs/heads/ | head -n 5 | awk '{print $1}'

mjones-oddball commented 1 year ago

Hey team! Please add your planning poker estimate with Zenhub @cris-oddball @EvanParish @ianperera @jakehova @k-macmillan @kalbfled @ldraney

ldraney commented 1 year ago

I commented out run-twistlock from dev_deploy.yml I created a new action called "Twistlock Cron Job" I asked Cris if she still wants Twistlock to run when a tag is created.

ldraney commented 1 year ago
ldraney commented 1 year ago

there are a lot of branches, what if we just ran twistlock against all open and draft prs?

ldraney commented 1 year ago

Kyle, Cris, and I decided to run a scan on all images from latest commits on all open and draft PRs.
I can do that, but my only blocker is how to have twistlock run those new images in twistlock:

twistcli images scan --project vanotify \
--address https://twistlock.devops.va.gov \
--user vanotify-ci-user "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG" \
          STATUS_CODE=$?
ldraney commented 1 year ago

I've written most of the twistlock cronjob action

But figuring out how to build the ci/Dockerfile for each PR branch is my main difficulty right now.

I believe I'll write a python script that takes in env inputs, then runs twistlock for every open and draft PR, and then the action will simply call this script from the master branch.

That means I should be able to build a script and run successfully locally.

ldraney commented 1 year ago

Please check out my PR, you will see I have been running actions with my new script. Right now it is with dispatch and not a cronjob.

ldraney commented 1 year ago

What's actually on our Twistlock Ec2 instance? Can I just use the twistlock cli from the action, or is there a specific configuration I need to be aware of? Because currently I don't need to use ECR, so there's not reason to be using the EC2.

ldraney commented 1 year ago

Talking with Kyle about running twistlock in parrallel with dev_deploy.yml, I don't see the need to make this a cronjob as that could still turn out to be a blocker in the long run.

cris-oddball commented 1 year ago

@ldraney please do @ us when you need info. The Twistlock EC2 instance contains the client code that runs the scan. I suggest that if we want to change how we run this, we open a new ticket. We know the code runs as it stands but changing it to run in GitHub actions is an unknown. I wrote up everything I know about how this works in this document.

Also, I don't see your PR that you mentioned in an earlier comment?

ldraney commented 1 year ago

Yeah I want to trigger a new action on PR, separate from our others. My only thought is Cris suggested a server start/stop, so if I include something in that for this triggered action, it's NBD if the action takes longer.

So, pushing to ecr with the right tag and adding and aws command to start/stop the ec2 instance, and this ticket will be moving towards closed.

tabinda-syed commented 1 year ago

There are a few more things that Lucas will think through with the team, and he's optimistic. :)

ldraney commented 1 year ago

The most difficult aspect of this ticket has been architecting, just reducing the available options and handling the various concepts/tools involved in accomplishing this task; but I think I've got a solution.

I need to build images with the correct tag, push them to ECR, and then call the ./.github/actions/run-commands-on-ec2 action to run the Twistlock scan inside an AWS EC2 instance, as it is designed to handle running commands on a remote instance.

I currently wrote a python script that builds a docker image per each PR. I still need to make sure it gets the tags right. Once I do that, I need to modify the Python script to include a call to the run-commands-on-ec2 action as a step in the workflow. This would involve setting the appropriate environment variables for the ECR_REGISTRY, IMAGE_REPOSITORY, and IMAGE_TAG, as well as the SSM parameter path for the EC2 instance ID.

If we go this route, then the cronjob would simply call my python script, and then my script would build a docker image and then run the twistlock action (which uses the necessary run-commands-on-ec2 action).

for reference, the current twistlock is being called in this manner from the twistlock.yml:

      - name: Perform twistlock scan
        env:
          ECR_REGISTRY: ${{ steps.login-ecr-vaec.outputs.registry }}
          IMAGE_REPOSITORY: "notification_api"
          IMAGE_TAG: "${{ inputs.ref }}_${{ inputs.environment }}"
        uses: ./.github/actions/run-commands-on-ec2
        with:
          instance-id-ssm-parameter-path: /utility/twistlock/instance
          commands: >-
            "aws --region us-gov-west-1 ecr get-login-password | docker login --username AWS --password-stdin '"$ECR_REGISTRY"' 2>/dev/null",
            "docker pull '"$ECR_REGISTRY"'/'"$IMAGE_REPOSITORY"':'"$IMAGE_TAG"' 1>/dev/null",
            "docker logout '"$ECR_REGISTRY"'",
            "export TWISTLOCK_PASSWORD=$(aws --region us-gov-west-1 ssm get-parameter --name /utility/twistlock/vanotify-ci-user-password --with-decryption | jq '.Parameter.Value' -r)",
            "twistcli images scan --project vanotify --address https://twistlock.devops.va.gov --user vanotify-ci-user '"$ECR_REGISTRY"'/'"$IMAGE_REPOSITORY"':'"$IMAGE_TAG"'",
            "STATUS_CODE=$?",
            "docker image prune -a -f 1>/dev/null",
            "exit $STATUS_CODE"
ldraney commented 1 year ago

I've pushed my new script, but for some reason it won't run; the error is:

could not find any workflows named Build Image Run Twistlock

AND

HTTP 404: Not Found (https://api.github.com/repos/department-of-veterans-affairs/notification-api/actions/workflows/build-image-run-twistlock.yml)

which I've faced before, but for some reason my normal method isn't working:

gh workflow run -r 974-twistlock-cronjob -F ref=974-twistlock-cronjob build-image-run-twistlock.yml

OR

gh workflow run -r 974-twistlock-cronjob -F ref=974-twistlock-cronjob 'Build Image Run Twistlock'
ldraney commented 1 year ago

I spent way too much time architecting a solution to what is really a simple problem.

We need Twistlock to stop being a blocker and instead just scan for vulnerabilities when it needs to.

So I changed up the pipeline and want this ticket to be closed, and we need to talk about scanning additional images as a separate future ticket.

ldraney commented 1 year ago

Again, the goal is to scan every deployed image with twistlock, and if twistlock breaks, it shouldn't slow down development -- instead, I should just be notified, or we should have an action to restart the ec2 server if Twistlock breaks. I've spent too much time on this for this sprint anyway, so I'm closing this ticket, since the original goal was accomplished.

image.png

cris-oddball commented 1 year ago

@ldraney I don't think you can close this ticket until Twistlock is running on a schedule. Or did I miss that it is running on a schedule?

ldraney commented 1 year ago

@cris-oddball
We may put it on a schedule still. After talking with Kyle, the goal is to remove twistlock as a blocker and instead (as a proposed solution) run twistlock on all images that were created by teh dev_deploy workflow the day before. I proposed a better solution (IMO) that runs twistlock parallel to dev deploy; that way images are just tested outside the dev_deploy workflow, but we don't need to query AWS ECR for the latest images. It keeps thing simpler and removes a potential inconvenience of a cronjob.

name: Twistlock Scan
on:
  workflow_run:
    workflows: ["build.yml"]
jobs:
  twistlock-scan:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Run Twistlock Scan
        run: <command to run Twistlock scan>
ldraney commented 1 year ago

@cris-oddball

Yep, I'm playing with it and Cronjob with an aws command that queries the latest images is the probably the easiest way, so I'll just do that.

ldraney commented 1 year ago

this:

      - name: Login to VAEC ECR
        id: login-ecr-vaec
        uses: aws-actions/amazon-ecr-login@v1

      - name: Perform twistlock scan
        env:
          ECR_REGISTRY: ${{ steps.login-ecr-vaec.outputs.registry }}
          TWISTLOCK_PASSWORD: ${{ secrets.TWISTLOCK_PASSWORD }}
          IMAGE_REPOSITORY: "notification_api"
        uses: ./.github/actions/run-commands-on-ec2
        with: 
          instance-id-ssm-parameter-path: /utility/twistlock/instance
          commands: >-
            ls
            #"aws --region us-gov-west-1 ecr get-login-password | docker login --username AWS --password-stdin '"$ECR_REGISTRY"' 2>/dev/null",

            #"export TWISTLOCK_PASSWORD=$(aws --region us-gov-west-1 ssm get-parameter --name /utility/twistlock/vanotify-ci-user-password --with-decryption | jq '.Parameter.Value' -r)",
            #for IMAGE_TAG in ${{ steps.get-image-tags.outputs.images }}; do
              #docker pull "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG" 1>/dev/null
              #twistcli images scan --project vanotify --address https://twistlock.devops.va.gov --user vanotify-ci-user "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG"
              #STATUS_CODE=$?
              #if [ $STATUS_CODE -ne 0 ]; then
                #exit $STATUS_CODE
              #fi
              #docker image prune -a -f 1>/dev/null
            #done
            #"docker logout '"$ECR_REGISTRY"'",

Is failing with this:

Error parsing parameter '--parameters': Invalid JSON: Expecting value: line 3 column 7 (char 41)
JSON received: {"commands":[
      "#!/bin/bash",
      ls #"aws --region us-gov-west-1 ecr get-login*** docker login --username AWS --password-stdin ***.dkr.ecr.us-gov-west-1.amazonaws.com 2>/dev/null",
#"export TWISTLOCK_PASSWORD=$(aws --region us-gov-west-1 ssm get-parameter --name /utility/twistlock/vanotify-ci-user*** | jq .Parameter.Value -r)", #for IMAGE_TAG in v1.5.6-1076-db-column_perf; do
  #docker pull "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG" 1>/dev/null
  #twistcli images scan --project vanotify --address https://twistlock.devops.va.gov --user vanotify-ci-user "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG"
  #STATUS_CODE=$?
  #if [ $STATUS_CODE -ne 0 ]; then
    #exit $STATUS_CODE
  #fi
  #docker image prune -a -f 1>/dev/null
#done #"docker logout ***.dkr.ecr.us-gov-west-1.amazonaws.com",
    ]}
ldraney commented 1 year ago

I'm getting a 252 error every time... the error is coming from the ec2.. I'll probably need to test these commands from my CLI

cris-oddball commented 1 year ago

@ldraney that's a parsing error, the JSON is invalid. Not necessarily EC2 or twistlock, but all JSON has to be parsed when used and that validation is failing.

ldraney commented 1 year ago

/home/runner/work/_temp/82cad616-abb8-4726-8959-f0565194fedd.sh: line 9: untagged: No such file or directory Error: Process completed with exit code 1.

I'm getting that error when I'm trying to get rid of the literal '' tag from running through Twistlock, so that twistlock won't fail for an image that wasn't pushed by our pipeline. (I don't even know where the images are coming from.)

ldraney commented 1 year ago

I'm still just figuring out how to mess with a bash script that's forced to be in json fromat, e.g. adding a loop to this for our purposes:

            "aws --region us-gov-west-1 ecr get-login-password | docker login --username AWS --password-stdin '"$ECR_REGISTRY"' 2>/dev/null",
            "docker pull '"$ECR_REGISTRY"'/'"$IMAGE_REPOSITORY"':'"$IMAGE_TAG"' 1>/dev/null",
            "docker logout '"$ECR_REGISTRY"'",
            "export TWISTLOCK_PASSWORD=$(aws --region us-gov-west-1 ssm get-parameter --name /utility/twistlock/vanotify-ci-user-password --with-decryption | jq '.Parameter.Value' -r)",
            "twistcli images scan --project vanotify --address https://twistlock.devops.va.gov --user vanotify-ci-user '"$ECR_REGISTRY"'/'"$IMAGE_REPOSITORY"':'"$IMAGE_TAG"'",
            "STATUS_CODE=$?",
            "docker image prune -a -f 1>/dev/null",
            "exit $STATUS_CODE"

I think the following script is close to accomplishing our purposes:

"aws --region us-gov-west-1 ecr get-login-password |
  docker login --username AWS --password-stdin '$ECR_REGISTRY' 2>/dev/null",
"for IMAGE_TAG in ${{ steps.get-image-tags.outputs.images }}; do
  if [[ $IMAGE_TAG == *_dev* ]]; then
    docker pull $ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG 1>/dev/null || exit;
    twistcli images scan --project vanotify --address https://twistlock.devops.va.gov --user vanotify-ci-user $ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG;
    STATUS_CODE=$?;
    if [ $STATUS_CODE -ne 0 ]; then
      exit $STATUS_CODE;
    fi;
    docker image prune -a -f 1>/dev/null;
  fi
done;
docker logout '$ECR_REGISTRY';"
ldraney commented 1 year ago

I think our for loop has to be in one line, makes it a pain to work with.

ldraney commented 1 year ago

I'm going to try and make it a bash script

ldraney commented 1 year ago

oh lol, you know why my script is probably failing? Because no images have been pushed in the last 24 hours.

So, the bash script needs to be adjust somehow to not fail if there are no tags that are returned.

Or, to adjust the action so the "Perform twistlock scan' only runs if the 'Query ECR for images' step doesn't return an empty variable. Maybe this can be done by having a conditional on the 'Perform twistlock scan' that only runs if "images" is not empty.

tabinda-syed commented 1 year ago

Lucas working on GitHub Actions nuances, trying to run various images successfully, a couple of details to still work out.

Kyle's direction is to scope this to the last container.

ldraney commented 1 year ago

https://github.com/orgs/community/discussions/24952

This may be why my variables aren't working in the actions

ldraney commented 1 year ago

And this one https://medium.com/@ibraheemabukaff/github-actions-exporting-multi-line-one-line-value-environment-variable-5bb86d01e866

ldraney commented 1 year ago

Finally figured it out based on the above documentation:

          echo 'IMAGES<<EOF' >> $GITHUB_ENV
          echo $IMAGES >> $GITHUB_ENV
          echo 'EOF' >> $GITHUB_ENV

Why it was so hard to have a multi-line variable in Github Actions is beyond me.

ldraney commented 1 year ago
            "aws --region us-gov-west-1 ecr get-login-password | docker login --username AWS --password-stdin '$ECR_REGISTRY' 2>/dev/null; for IMAGE_TAG in $IMAGES; do docker pull \"$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG\" 1>/dev/null; twistcli images scan --project vanotify --address https://twistlock.devops.va.gov --user vanotify-ci-user \"$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG\"; STATUS_CODE=$?; if [ $STATUS_CODE -ne 0 ]; then exit $STATUS_CODE; else echo \"{ \\\"status\\\": \\\"success\\\", \\\"image\\\": \\\"$IMAGE_TAG\\\" }\"; fi; docker image prune -a -f 1>/dev/null; done; docker logout '$ECR_REGISTRY';"

That is my attempt at getting this to echo successfully twistlocking an image, but for some reason it won't echo success, which means something must be going wrong...

Here is what the script would look like if it wasn't crammed into a single line:

#!/bin/bash

# Get ECR login token and login to ECR
aws --region us-gov-west-1 ecr get-login-password | \
  docker login --username AWS --password-stdin "$ECR_REGISTRY" 2>/dev/null

# Iterate over each image tag and perform Twistlock scan
for IMAGE_TAG in $IMAGES; do
  # Pull the image from ECR
  docker pull "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG" 1>/dev/null

  # Scan the image using Twistlock
  twistcli images scan --project vanotify --address https://twistlock.devops.va.gov \
    --user vanotify-ci-user "$ECR_REGISTRY/$IMAGE_REPOSITORY:$IMAGE_TAG"

  # Check the exit status of the Twistlock command
  STATUS_CODE=$?
  if [ $STATUS_CODE -ne 0 ]; then
    # If the Twistlock command failed, exit the script with the same status code
    exit $STATUS_CODE
  else
    # If the Twistlock command succeeded, echo a success message in JSON format
    echo "{ \"status\": \"success\", \"image\": \"$IMAGE_TAG\" }"
  fi

  # Remove all unused Docker images from the host machine
done

# Logout from ECR
docker logout "$ECR_REGISTRY"
docker image prune -a -f 1>/dev/null
ldraney commented 1 year ago

I'm now going to go through piece by piece to see what is producign "success" and to see why it won't run my loop.

tabinda-syed commented 1 year ago

@k-macmillan Could you please see above and share your thoughts with Lucas?

@ldraney Please remember to talk with Kyle about 1) deciding which option to proceed with based on current findings and 2) a potential refactoring/tech debt ticket for future consideration.

k-macmillan commented 1 year ago

I gave Lucas some guidance on this in a huddle. He's going to use that to make changes.

ldraney commented 1 year ago

I think I finally found a maintainable solution: image.png

In essence, build.yml will now call Twistlock with gh cli command, and Twistlock is only callable by workflow_dispatch; thus, everytime dev_deploy is run, build.yml will call twistlock as a separate workflow, distinct from deployment. This is a solution on multiple fronts:

  1. If twistlock fails we can simply rerun the failed twistlock action
  2. All our deployed images will still be run under twistlock
  3. Easy to maintain