Security considerations

plocket commented 2 years ago

Related to our architecture discussions in #128

This Miro board, in part, maps out some of this and discusses options: https://miro.com/app/board/uXjVOd8LBOc=/?invite_link_id=786535072712. This issue is for a more security-focused and detailed discussion.

Using the da API will help with some of these, but not all.

One of the main conflicts for us is complexity for users vs. security for users. The more we control, the less work users have to do to keep up to date with our updates. The more we control, though, the less assurance there is against malicious code. It's important to keep in mind, though, that many of our intended users wouldn't understand the code they have control of anyway.

da server security

Can we ensure that the da testing account does NOT have admin permissions? As per the docs, if it only has developer permissions, it shouldn't be able to look at the server's config.yml file, etc. We currently assume this right now, but don't test for it. That'll be easier once we can use the API.

Are there other things that a developer can do through the API that we need to be concerned about?

GitHub account/commit manipulation

Making commits from docassemble

If the da testing account isn't integrated with github, it can't push new things to the github repo of the package. Unfortunately, some folks have private packages and will therefore need to use the SSH method, which does require integration with a github account.

We're still not sure how to handle this in a way that can reassure our users. Once we move to the API, it'll be more clear. They will no longer give us access to a username and password and we won't be able to do absolutely anything that a user can do. I've been reassured that the API can't make pushes to github and there are no plans to allow that in the future. At the moment, though, we can't currently provide anything security against malicious code.

Any push that was made from da would at least have a commit in github history, but that's not a great reassurance. Who checks their commit history that regularly or that closely?

Making commits from the GitHub workflow or composite action

Even now we wouldn't be able to change their package's code without making some kind of commit, which people would be able to see. Again, that's not a huge help, but good to keep in mind.
Do those kinds of processes need a GitHub token, or other permissions, even when running in a workflow on a user's own repository? How about when in a composite action? I think so, but we need to make sure of that. If that security measure is in place, the developer could limit our influence. See https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#restricting-permissions-for-tokens.
The user's workflow currently contains all of the processes and they can check that it's not changing any of those things (though we're planning to change that). That only works if the user knows enough about workflows, etc., to be able to tell what's going on, and that puts too much burden on the user.
We're planning to move all the action stuff into a composite action and the user will have less visibility into those. Again, we'll need to make sure a GitHub PAT, or some permission, is necessary and, thus, users will be safe from us.

How do we ensure that our setup interview can't create a secret with a GitHub token? See the section on the setup interview.

Alternatively, a solution might be setting up workflow permissions explicitly in the developer's repo's workflow. I can't find that documentation right now, but it lets you specify what a workflow. We don't need permissions that allow pushing or anything like that, so we could limit things quite effectively.

All that said, we can offer folks the ability to have their tests on a totally separate repo than their interviews. That would prevent our code from manipulating their interviews' code after the setup process. We already use an environment variable for the repo address in our setup execution code, we would just have to make sure security-conscious people can put that variable explicitly in their workflows. We'd also need to make sure to use that in our composite action, along with a fallback for people who continue to use the current method. See the section about tests being edited, though.

Manipulation during the setup interview

How do we ensure that our setup interview can't create a secret with a GitHub token? The user would have to give us that token during the course of the setup interview. They do give us one PAT during that interview, but we instruct them to delete it right afterwards. I'm not sure what we can do to make sure they delete the PAT afterwards.

I'm not sure how reassuring we can be here beyond that. We require repo, workflow, and, at times admin level permissions, right? Let's check what we can do with those. I think we don't ask for things that can manipulate account details, so that might be useful to know.

As far as pushes go, I'm not sure how we can make sure that we can only make a PR and cannot actually push directly to the repository.

Once we find out all the kinds of permissions the setup interview gets and the associated abilities, we can tell people exactly what those permissions can manipulate in a section about security. If the scope of those permissions is limited enough, we can tell them all the things to look at to make sure that nothing was changed without their consent. We can link them to GitHub documentation that outlines the permissions they've given us. They can do things like check on the commits that have been made to the repo after this to make sure our code hasn't done anything unexpected. If there are too many things to check, though, it's a bit too much of a burden to put on the user. If it comes to that, we'll have to re-think how we handle this problem.

Other github changes

I don't believe there are other things we can change in a person's github account, right? Managing contributors, changing secrets, etc? I think the workflow would have to specifically pass a token to the composite action, but we should make sure of that.

Manipulation when interacting with interviews

These are really tricky ones and I'm not sure how much we can really do about these. Ideas quite welcome.

[Regarding the below: With a little more consideration, this is a vulnerability of any organization on the internet. A bad actor could target an organization and write malicious puppeteer code even without the help of our library. Maybe this isn't an ALKiln problem, but a broader problem with the internet in general. That's not great, but not something we can fix.]

Puppeteer manipulation

[Regarding the below: With a little more consideration, this is a vulnerability of any organization on the internet. A bad actor could target an organization and write malicious puppeteer code even without the help of our library. Maybe this isn't an ALKiln problem, but a broader problem with the internet in general. That's not great, but not something we can fix.]

If our puppeteer code is manipulated and the manipulator knew what they were looking for, they could change the javascript to do whatever javascript can do on a website. Press buttons, but also send AJAX requests, etc.

I'm not at all sure what we can do about that. Can we do more than putting practices into place that will help us detect such a manipulation? From our users' perspective, I'm even less sure what we could do.

Editing tests

[Regarding the below: With a little more consideration, this is a vulnerability of any organization on the internet. A bad actor could target an organization and write malicious puppeteer code even without the help of our library. Maybe this isn't an ALKiln problem, but a broader problem with the internet in general. That's not great, but not something we can fix.]

Even if a user's tests are isolated in a test-only repository, that still leaves the possibility that malicious code could edit people's tests. The tests interact with the interviews.

We know that test files can, at the very least, be created from scratch. At least, a person can definitely do that in their own workflow. That's the way our language/translation tests are created on the fly. We still don't know about composite actions, but I suspect that's the case there too.

If there are things that should not be done on the interviews, like submitting to courts, it's a potentially serious exploit.

One solution is for users to always test on development servers. If they use environment variables there, they can ensure that interviews on development servers don't take actions that are insecure. A naive developer wouldn't know how to do this, though, and it would be hard to help them understand how to do that. Not only that, testing production code is something that some people are specifically looking for.

plocket commented 2 years ago

Also see https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions for tips. Are these practical to include with our documentation?

plocket commented 2 years ago

Puppeteer manipulation and editing tests

With a little more consideration, this is a vulnerability of any organization on the internet. A bad actor could target an organization and write malicious puppeteer code even without the help of our library. Maybe this isn't an ALKiln problem, but a broader problem with the internet in general. That's not great, but not something we can fix.

[Added this to the original post where it might be relevant]

michaelhofrichter commented 2 years ago

So my real concern is how do I know that the remote code is going to install and test my package instead of installing something malicious to take a screen shot of. For example, while you need to be an admin to modify the config file, you don't need to be an admin to read it. Just run:

question: | 
  Leak my data
subquestion: | 
  bucket: ${ get_config("s3").get('bucket') }

  access key: ${ get_config("s3").get('access key id') }

  region: ${ get_config("s3").get('region') }

  secret access key: ${ get_config("s3").get('secret access key') }
mandatory: True

and that should display all of your configuration details to then screenshot or email elsewhere. If the user has the ability to pull into the playground, it also has the ability to write something new in the playground.

As an end user (and I recognize I'm probably more sophisticated than most), I'd much prefer code that I have control over that is leveraging my credentials. If that means, I have to install ALKiln on my server to be able to run it, that's fine with me. Maybe have that as one of the questions in the setup interview. If I have to configure puppet or something else to be able to reproduce these checks, again, I'm happy to do that if you've got a set of instructions on how to set it up. I don't want my actions dependent on code that isn't something that I can evaluate first. If It is automatically being updated without my ability to review the most recent commits (or be notified of new commits other than following your repo in Github), it seems like a huge security vulnerability.

I think the current setup would present security risks that would not be allowed if we were doing more than experimenting with this model. Again, happy to discuss this further.

plocket commented 2 years ago

To restate the concern, our package could use the API to pull in a malicious docassemble package that could have an interview that prints config data to the screen. The malicious code could add a test that goes to that interview and sees that information. [That's the specific scenario described, though there could be other attacks of a similar nature that we need to be aware of.]

We'll have to discuss that with the other devs and see strategies people come up with. We don't just want it solved for people who can install the package and maintain it themselves. We also want it solved for people who don't have the technical skills to do that.

plocket commented 2 years ago

Conclusions/thoughts from our discussion:

Controlling/monitoring the Kiln code that's being used

Having the code on one's server or in one's repo is possible, but there are easier ways to have control over what code is being run. In the user's package.json file, under "dependencies", there's a line requiring the version of kiln (called docassemble-cucumber in there for now):

"docassemble-cucumber": "^3.0.0",

That's a very broad request, pulling in kiln even after it's changed its code. If a developer makes it more specific, they can ensure they freeze the version of kiln, getting only the code that they trust:

"docassemble-cucumber": "3.0.4",

When they want, they can compare a new version of kiln to an old one and see what's changed, then update when they feel comfortable with it.

More general thoughts

There are some things we can do, but there's also always inherent risk in using third-party anything. In our case, we have access to a user account, and that's not always the case for every third-party tool, but every system has vulnerabilities. We discussed some of the measures we can work on to improve what we can:

People should definitely be made aware that they are exposing themselves to risk when using this.
We should strongly recommend to people that they
- Use a dedicated server for these tests
- Periodically clear out the test server
- Never share API keys between their production server and other servers.
The setup interview should give people the option to have complete fine-grained control over the version of Kiln they're using - setting the package.json so that it controls the version down to the patch version, as described above. Also, the option to enter a custom version. ~[We might discuss this more - npm doesn't ensure that the code of old published versions stays the same and it can be changed. See additional thoughts later about having people store the code they want themselves.]~ (see later disproof of this)
On our part, we can ensure that no one can push to our main branch without a PR and a review. We might be able to whitelist some people based on their repo permissions so that, for example, we can bump versions after a PR without a review.
We should add documentation about how to
- turn off a github workflow for a repo: https://docs.github.com/en/actions/managing-workflow-runs/disabling-and-enabling-a-workflow
- how to turn off github actions for a repo: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/managing-github-actions-settings-for-a-repository#managing-github-actions-permissions-for-your-repository
- how to turn off github actions for an organization: https://docs.github.com/en/organizations/managing-organization-settings/disabling-or-limiting-github-actions-for-your-organization#managing-github-actions-permissions-for-your-organization
- See #434

Note: This doesn't cover everything outlined in this issue so far - not everything got discussed.

Additional thoughts:

We should strongly recommend to people that they
- Never give the tests access to an admin account, only a developer account.
- Make sure none of the GitHub secrets they send to us are GitHub Personal Access Tokens. Is this necessary? Will it just be confusing?
~We might split this repo into two repos, putting the API interaction on a separate repo.~ All parts of the process need to use the API to wait properly if the server's restarting.
We might write instructions on how people can fork our repo and use the code themselves so they don't have to rely on ours (once we figure out the process ourselves). Maybe look into git submodules: https://git-scm.com/book/en/v2/Git-Tools-Submodules

plocket commented 2 years ago

[NPM apparently keeps this stuff fairly under control]

~Warning: From what I've just heard, versions are not actually guaranteed to stay the same code. Even npm doesn't ensure that the code of published versions stays the same.~

~[Is PyPi the same?][Some people say yes]~

~[There's a project working on this problem right now: https://github.com/JarvusInnovations/hologit. It's trying to make sure that the version pulled and all its content is captured in each build and you'd be alerted if the content changed between builds, even if the version is the same.]~

Edit: We opened #437 to explore how close we can get to giving orgs full control of their code, even outside of npm. A current recommendation we can guide people through, though, is how to publish their own version of this repo on npm.

~Also, the API key will only be used in setup and takedown. We may add another repo (I know) to the current ones that just stores those. It would make it easier for people to check for changes to the code that uses the API key.~

~That might affect how we write our action - I'm not sure how to split the composite action into three parts like that without actually turning it into 3 repos.~

[We'll need the API key during the actual test runs too - to make sure the server isn't in the middle of restarting when we load the next page - in which case, there's no point in splitting repos.]

Edit: Tools like https://jfrog.com/artifactory/ sanitizes things like npm packages when they come in. I'm not sure if it also just confirms that published versions haven't been changed.

plocket commented 2 years ago

Action permissions

Here's a section on permissions for GitHub workflows: https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#permissions. That's what our action will be running in. We don't even use GITHUB_TOKEN. We dont' use any other personal access tokens either, though I'm not sure how we can guarantee that. We can only write instructions for people to make sure none of their secrets are PATs.

BryceStevenWilley commented 2 years ago

Some more thoughts and discussions from our deep-dive conversation:

As Michael mentioned in https://github.com/SuffolkLITLab/ALKiln/issues/425#issuecomment-992103065, With Docassemble, once you allow any package to be installed on your server, playground or not, any code / secrets can be seen from that package. This is why there isn't a huge difference between giving a user developer privileges and giving them admin privileges. A skilled malicious user could turn a developer account into an admin account.

Therefore, the only "truly" secure way to have ALKlin have access to a DA server is to follow the advice in https://github.com/SuffolkLITLab/ALKiln/issues/425#issuecomment-992732983 and:

Use a dedicated server for these tests. This should be completely separate from any production server
Never share API keys between their production server and other servers
Periodically clear out the test server / start a new docker container from scratch
- occasionally check this server to make sure it's not running resource stealing code (blockchain miners, etc.)

Since https://github.com/SuffolkLITLab/ALKiln/issues/425#issuecomment-994013593 though, we have found a decent amount of things that can be done to be sure that the versions of ALKlin you've vetted stay the same, and no hidden changes are pulled into your testing workflows.

There are a few places that you are able to pin versions of packages: with NPM packages, only a patch version is published for a package, no other package with that package name and version is ever going to be published (https://docs.npmjs.com/policies/unpublish). It can be 'unpublished' or taken away, but it won't ever be replaced with malicious code.

This also applies to github actions: if you use a third-party action (i.e. a workflow from this ALKlin repository) in your own workflow in a git repo, you can pin it to the exact SHA commit of the action that you want to use, and another commit with that exact SHA commit won't be able to be published to the repo. Again, someone could force-push and remove the SHA commit that you are pinned to, but they can't replace the code at that SHA commit with something malicious. (https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions)

plocket commented 2 years ago

There was also something in the discussion that I didn't catch about general security practices, like checking for bots somehow.

BryceStevenWilley commented 2 years ago

The part about bots was:

occasionally check this server to make sure it's not running resource stealing code (blockchain miners, etc.)

There's nothing really specific to do there, just making sure that the VM you are running on (AWS, GCP, etc.) isn't using a large amount of resources, as most DA containers should close to idle most of the time.

plocket commented 2 years ago

To document:

[x] Disabling tests
[x] Warn this is inherently not secure.
[x] Use a separate server (dedicated server) for the test completely separate from production server so that no sensitive information can be revealed to potential malicious actors.
[x] Never share API keys or passwords between servers.
[x] Periodically clear out the test server / start a new docker container from scratch.
[x] (Is this a separate item, or just if the server admin doesn't want to restart the server from scratch?) occasionally check this server to make sure it's not running resource stealing code (blockchain miners, etc.)
[x] Use an ALKiln commit sha in the workflow file to pin the version of ALKiln being used, then review ALKiln code changes before moving to the next commit sha. (no need to pin to npm because in future everything is being run through the workflow) (https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions)
[x] ~Make sure none of the GitHub secrets they send to us are GitHub Personal Access Tokens. Is this necessary/useful? Will it just be confusing? [PAT docs: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token]~ I've tried writing the explanation and it's too confusing and alarming. This isn't something we ask for, this isn't something we use, and this shouldn't come up at all, so I'm rejecting this one. Let me know if you want to discuss further.
[x] ~Tools like https://jfrog.com/artifactory/ sanitizes things like npm packages when they come in. I'm not sure if it also just confirms that published versions haven't been changed.~ Think this checks for malicious software/code, not just for changes. Seems a bit different and not sure it would be useful here.

For us:

[x] Restrict default and release branches to only accept PRs, not direct pushes, and require reviews for PRs.
[ ] Freeze the versions of our dependencies in our package.json and in the package.json we create for the developer.

plocket commented 2 years ago

Discussions carry on in #518: Copying a package-lock.json into the user's test run and being strict about enforcing it (npm flags).

plocket commented 2 years ago

Discussions will carry on in #518 or other discussion topics. The steps this issue settled on were taken care of by https://github.com/SuffolkLITLab/docassemble-AssemblyLine-documentation/pull/226.

SuffolkLITLab / ALKiln