department-of-veterans-affairs / notification-api

Notification API
MIT License

BUG: Twistlock Authentication Failure #964

Closed mjones-oddball closed 1 year ago

mjones-oddball commented 1 year ago
Hello Twistlock tenants,

The DOTS team has upgraded Twistlock to the latest version, 22.06, and has migrated all of your data.

As stated in previous communications, some data would not migrate, and you were advised to export it for your records. We have exported everyone’s old data just in case and can assist you if you need it for archiving purposes. Alternatively, the old environment is still active at twistlock-dte.dots-ftl.com (a temporary DNS name without a valid certificate, so ignore the certificate warnings in Chrome), so you can still run the export yourselves or confirm that your new environment has all the expected data. We will delete the old environment on 12/11/22.

Finally, the new defender NLB address has also been updated and you will need to update your deployed defenders to use the new address.

New NLB address: DOTS-PROD-Defender-ELB-8be52ee5f6b5930e.elb.us-gov-west-1.amazonaws.com

After the above upgrade, we are locked out of our account. We cannot log in through LDAP, nor can our GitHub Actions run the Twistlock scans on deployments to AWS, so we're locked out of testing in Dev.

k-macmillan commented 1 year ago

Reached out to Jared Piimauna. My Twistlock password needed to be reset. That allowed me to log in, but the GitHub Actions are still failing. Reached out to Jared again with more details, including direct links to our failing actions and the code that runs them.

We seem to have all our credentials correct (all logins work), but we don't seem to be authorized to run the scan.

k-macmillan commented 1 year ago

Jared got back to me with a password update for our github action. Updated in parameter store. Didn't fix it. Stopped and started the instance, didn't fix it. Reached out to Jared again and he responded with another suggestion to change --project VaNotify to --project vanotify. I will have to create a branch and then use that branch to test the action change.

k-macmillan commented 1 year ago

Note: Added new checklist item to revert the PR that was just linked.

kalbfled commented 1 year ago

Here's what I think I know at this point . . .

There are 3 VMs/containers/instances involved:

  1. The Github Actions job runner for twistlock.yml. This is Ubuntu.
  2. The Twistlock instance visible in the AWS EC2 console. This is not something we build. I think it is given to us by another team.
  3. The container in ECR that we want to deploy.

When we trigger a deployment action, the Notify API is packaged as a container and deployed to ECR. Subsequently, the job runner issues a twistcli command that triggers the Twistlock EC2 instance to scan the new build container in ECR. This works because the Twistlock instance has the credentials necessary for this access.

I'm told that the problem is that we need to upgrade the version of twistcli. It is not obvious to me how this CLI is getting installed on the job runner in the first place. I would not expect it to be part of the Ubuntu VM Github provides. If somebody can answer this question, we should be able to move forward.

jessecanderson commented 1 year ago

The Twistlock EC2 instance is where twistcli should be installed, not anything in GitHub. The GitHub runner just pushes the commands to the instance with uses: ./.github/actions/run-commands-on-ec2 and then waits for the output from that instance. If you need to upgrade twistcli, it should be done on that EC2 instance in AWS. I would assume you can console into it the same way you do with the other machines, but I don't know for sure.

cris-oddball commented 1 year ago

Based on Jesse's comment, is the EC2 instance configured to use the correct IAM? We just had this problem with Locust and had to change the Locust file to use the correct IAM. Right now, the Twistlock instance fails 1 of 2 checks, and the console throws a permissions error when trying to connect to it.

kalbfled commented 1 year ago

@jessecanderson Thank you for that information. The "-on-ec2" part of that went over my head until then.

cris-oddball commented 1 year ago

@kalbfled @k-macmillan a few other data points:

The best way to update twistcli is as follows: @ldraney

Once updated, there are still changes needed:

The output from the failed action indicates an authorization error.

----------ERROR-------
Status: 401 Unauthorized
GET https://twistlock.devops.va.gov/api/v1/version?project=VaNotify failed.
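
To reproduce the failing call outside the action, something like the following might help. This is only a rough sketch: the function names are made up for illustration, and it assumes curl is available and that a ~/.netrc entry holds the CI user's credentials for the console.

```shell
#!/bin/sh
# Console host taken from the error output above.
CONSOLE_URL="https://twistlock.devops.va.gov"

# Hypothetical helper: build the version-check URL for a given project name.
version_url() {
  printf '%s/api/v1/version?project=%s\n' "$CONSOLE_URL" "$1"
}

# Hypothetical helper: print only the HTTP status code returned by the
# version endpoint. -n makes curl read credentials from ~/.netrc.
check_auth() {
  curl -s -o /dev/null -w '%{http_code}\n' -n "$(version_url "$1")"
}
```

Running check_auth VaNotify and check_auth vanotify side by side would show whether the case of the project name has any effect on the 401.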

cris-oddball commented 1 year ago

@kalbfled @k-macmillan @ldraney alternate method to get the twistlock cli on the server without using an S3 bucket:

See docs here for the download client method: https://prisma.pan.dev/api/cloud/cwpp/22-06/util#operation/get-util-arm64-twistcli

In case the above call requires authentication, see: https://prisma.pan.dev/api/cloud/cwpp/22-06/authenticate#operation/post-authenticate
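
A rough sketch of what that two-step flow could look like, not tested against our console. Assumptions: the authenticate endpoint accepts a JSON body with username and password and returns JSON containing a token field, the download path is /api/v1/util/twistcli (the arm64 variant linked above uses a different path), and curl and jq are installed. The function names are hypothetical.

```shell
#!/bin/sh
CONSOLE_URL="https://twistlock.devops.va.gov"

# Hypothetical helper: URL of the twistcli download endpoint for a project.
download_url() {
  printf '%s/api/v1/util/twistcli?project=%s\n' "$CONSOLE_URL" "$1"
}

# Exchange a username ($1) and password ($2) for an API token.
# Assumes the response body is JSON with a "token" field.
get_token() {
  jq -n --arg u "$1" --arg p "$2" '{username: $u, password: $p}' |
    curl -fsS -X POST -H 'Content-Type: application/json' -d @- \
      "$CONSOLE_URL/api/v1/authenticate" |
    jq -r '.token'
}

# Download twistcli for project $2 to path $3 using token $1,
# then mark it executable. -f makes curl fail on HTTP errors such as 401.
fetch_twistcli() {
  curl -fsSL -H "Authorization: Bearer $1" -o "$3" "$(download_url "$2")" &&
    chmod +x "$3"
}
```

Usage would be along the lines of token=$(get_token "$user" "$pass") followed by fetch_twistcli "$token" vanotify /usr/bin/twistcli, where the user, password, and project values are whatever the console actually expects.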

k-macmillan commented 1 year ago

A call with Jared or someone else on his team would definitely be beneficial at this point.

cris-oddball commented 1 year ago

From Filip Fafara in the dsva va-notify-team channel last December (the upgrade itself doesn't seem to have happened until late 2022):

Got notification from the DOTS team that they are going to upgrade Twistlock during off-hours on Dec 11th. We will need to upgrade our twistcli. I think the best way to approach this is to just rebuild our Twistlock image. It should download the correct version of the CLI from the Twistlock console as part of cloud-init.

Edit to remove question about the ECR image since that image is of our app and not relevant.

cris-oddball commented 1 year ago

Found it! That URL is set in this twistlock-init.cfg

and twistlock-init.cfg is invoked from twistlock.tf

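runcmd:
  # Start the CloudWatch agent now and on every boot
  - systemctl enable amazon-cloudwatch-agent
  - systemctl start amazon-cloudwatch-agent
  # Install Docker and let ssm-user run it
  - rpm --rebuilddb
  - yum -t -y install docker
  - usermod -a -G docker ssm-user
  - systemctl enable docker
  - systemctl start docker
  # Replace the placeholder in /root/.netrc with the CI user's password from Parameter Store
  - [ sh, -c, "sed -i 's/put_password_here/'$(aws --region us-gov-west-1 ssm get-parameter --name /utility/twistlock/vanotify-ci-user-password --with-decryption | jq '.Parameter.Value' -r)'/g' /root/.netrc" ]
  # Download twistcli from the Twistlock console (-n reads credentials from .netrc, -k skips certificate verification)
  - [ curl, -k, -L, -n, -o, /usr/bin/twistcli, "https://twistlock.devops.va.gov/api/v1/util/twistcli?project=VaNotify" ]
  - chmod +x /usr/bin/twistcli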
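
One thing to watch in the sed line above: the password is spliced into the sed expression unquoted, so a password containing /, & or whitespace would break or alter the substitution. A possible alternative, sketched here under assumptions, is to write the .netrc entry directly. The function names and the login value are hypothetical (the real login name is not shown in this thread), and the sketch assumes the password itself contains no whitespace.

```shell
#!/bin/sh
# Print one .netrc entry from host ($1), login ($2), and password ($3).
netrc_entry() {
  printf 'machine %s login %s password %s\n' "$1" "$2" "$3"
}

# Fetch the CI password from Parameter Store, using the same parameter
# name and region as the config above.
fetch_password() {
  aws --region us-gov-west-1 ssm get-parameter \
    --name /utility/twistlock/vanotify-ci-user-password \
    --with-decryption --query 'Parameter.Value' --output text
}

# Write the entry for host $1 and login $2 to file $3.
# The subshell keeps the restrictive umask from leaking to the caller.
write_netrc() {
  (
    umask 077
    netrc_entry "$1" "$2" "$(fetch_password)" > "$3"
  )
}
```

Because printf receives the password as a plain argument, characters such as / and & pass through unchanged.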

cris-oddball commented 1 year ago

This issue is tested and resolved with these two PRs that are awaiting approval, merge and deploy.

Infra PR

Also ignored all found vulnerabilities until Jan 15. Will need tickets to handle those.