actions / cache

Cache dependencies and build outputs in GitHub Actions
MIT License

Add details on authentication / permissions used #584

Closed briansmith closed 2 years ago

briansmith commented 3 years ago

See https://github.com/actions/upload-artifact/issues/197. I have an analogous question: I can create cache entries but I've (tried to) set the GitHub Token to read-only permissions. So, I'm puzzled as to why my jobs even succeed at writing to the cache. Is the cache action using an undocumented mechanism for authentication as https://github.com/actions/upload-artifact/issues/197 claims upload-artifact is? How can we control which jobs are allowed to read from and especially write to the cache?

briansmith commented 3 years ago

/cc @letmaik

dhadka commented 3 years ago

@briansmith Great question.

Is the cache action using an undocumented mechanism for authentication as actions/upload-artifact#197 claims upload-artifact is?

Yes, it's using the same mechanism as the artifact actions (see here).

I can create cache entries but I've (tried to) set the GitHub Token to read-only permissions. So, I'm puzzled as to why my jobs even succeed at writing to the cache

I'm not an expert in all of the details, but from what I understand, GITHUB_TOKEN is essentially used to authenticate with the GitHub API. So you can restrict the token to read-only permissions for your repos, issues, or other GitHub resources / products. But in this case, Artifacts and Cache are part of Actions itself, so those permissions don't apply.

There isn't a way to control which jobs are allowed to read/write to the cache, except by controlling where the cache is used in the workflow. This also means it's important to be aware of when / how your workflows are run. Two good resources on this topic are:

  1. https://docs.github.com/en/actions/learn-github-actions/security-hardening-for-github-actions
  2. https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

The cache has an additional safeguard called "scopes". You can think of the scope as the git ref, such as a branch, tag, PR merge, etc. We then grant "read+write" or "read-only" permissions to each scope. For example, a workflow triggered on a branch will have "read+write" permissions to that branch scope and "read-only" permission to the default branch (e.g., main). This prevents a malicious user from being able to inject cache content on one branch that is subsequently used by a different, more critical branch (e.g., a release branch) without first merging those changes. For reference, the official docs for this are:

  1. https://docs.github.com/en/actions/guides/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache
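The scope rule described above can be sketched as two checks: which scopes a run may restore from, and which scope it may save into. This is a simplified model, not GitHub's implementation. It assumes scopes are plain git ref strings, and the function names readable_scopes and can_write are hypothetical. The extra fallback to a pull request's base branch is taken from the linked docs rather than from the comment itself.

```python
from typing import Optional


def readable_scopes(ref: str, default_branch: str,
                    base_ref: Optional[str] = None) -> list[str]:
    """Scopes a run on `ref` may restore from, in fallback order (simplified)."""
    scopes = [ref]  # the run's own scope (read+write)
    # Assumption per the docs: pull request runs may also read the PR's base branch.
    if base_ref is not None and base_ref not in scopes:
        scopes.append(base_ref)
    # Every run may fall back to the default branch (read-only).
    if default_branch not in scopes:
        scopes.append(default_branch)
    return scopes


def can_write(ref: str, scope: str) -> bool:
    """A run may only save new cache entries into its own scope."""
    return ref == scope


# A feature-branch run can read main's caches but cannot plant entries in a release branch.
print(readable_scopes("refs/heads/feature", "refs/heads/main"))
print(can_write("refs/heads/feature", "refs/heads/release"))
```

The asymmetry is the point: reads may fall back to more trusted scopes, but writes never leave the run's own scope, so a feature branch cannot inject content that a release branch will later restore.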

Anyway, that's about the extent of my knowledge on this topic. I'm happy to put you in touch with someone on the Actions security team if you have any additional security concerns.

briansmith commented 3 years ago

Thanks for the great response.

My goal is to be able to run an untrusted action with no permissions (or maybe just contents: read for a public repo, where it doesn't matter) so that even if the action is compromised, it cannot do any harm. I want this because it is too difficult for me to review an action's code confidently enough to be sure it can't be compromised. The difficulty I see with the current mechanism is that there's no way to prevent an action from writing to the cache (or artifacts), so there's no way to run an untrusted action safely. I could avoid using the cache (and artifacts) directly, but even then it is hard to be sure that the actions I depend on don't use the cache or artifacts in some way. So I think it would be a very good idea to make access to this token controllable through the same permissions: mechanism that controls the GitHub token's capabilities. Then I could reset the default to "no permission" at the top level and grant permission only to the jobs that I want to use the cache/artifacts.
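The proposal can be sketched as a permission that each job inherits from the workflow's top-level default unless the job overrides it. GitHub Actions has no such cache permission; the Access levels, the effective_access and permitted helpers, and the job names below are all hypothetical and only illustrate the requested behaviour.

```python
from enum import IntEnum
from typing import Optional


class Access(IntEnum):
    """Hypothetical cache permission levels; higher values include lower ones."""
    NONE = 0
    READ = 1
    WRITE = 2  # write implies read


def effective_access(top_level: Access, job_level: Optional[Access]) -> Access:
    """A job-level setting overrides the workflow default; otherwise inherit it."""
    return top_level if job_level is None else job_level


def permitted(access: Access, write: bool) -> bool:
    """True if `access` is enough for a read (write=False) or a write (write=True)."""
    return access >= (Access.WRITE if write else Access.READ)


# Hypothetical workflow: default "no permission", and only the build job opts in.
TOP_LEVEL = Access.NONE
JOBS: dict[str, Optional[Access]] = {"build": Access.WRITE, "third-party-scan": None}

for name, override in JOBS.items():
    access = effective_access(TOP_LEVEL, override)
    print(name, "read:", permitted(access, write=False), "write:", permitted(access, write=True))
```

With the default set to NONE, any job that does not opt in (here the untrusted scanner) gets neither read nor write access, which is the "deny by default" property the comment asks for.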

letmaik commented 3 years ago

I believe caches are currently immutable (though I haven't found anything in the docs confirming that), which means malicious actions cannot really do any harm as long as (a) the cache is populated before those actions run, and (b) restore_keys: is not used. Having said that, there is discussion about allowing caches to be updated, which would open up a hole.
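Why conditions (a) and (b) matter can be sketched with a toy in-memory cache. This is a hypothetical model, not the cache service: keys are write-once, an exact key match wins, and otherwise each restore key is tried in order as a prefix, returning the most recently created match. The class name, keys, and values are made up for illustration.

```python
import itertools
from typing import Optional


class ImmutableCache:
    """Hypothetical write-once cache with restore_keys-style prefix fallback."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[int, str]] = {}  # key -> (creation order, value)
        self._clock = itertools.count()  # stand-in for creation timestamps

    def save(self, key: str, value: str) -> bool:
        """Store value under key; refuse if the key already exists (immutability)."""
        if key in self._entries:
            return False
        self._entries[key] = (next(self._clock), value)
        return True

    def restore(self, key: str, restore_keys: tuple[str, ...] = ()) -> Optional[str]:
        """Exact key first; otherwise the newest entry matching each prefix in order."""
        if key in self._entries:
            return self._entries[key][1]
        for prefix in restore_keys:
            matches = [entry for k, entry in self._entries.items() if k.startswith(prefix)]
            if matches:
                return max(matches)[1]  # highest creation order = most recent
        return None


cache = ImmutableCache()
cache.save("deps-abc123", "trusted build")   # populated first: condition (a)
cache.save("deps-abc123", "poisoned")        # rejected: the key is write-once
cache.save("deps-evil", "poisoned")          # accepted: it is a different key
exact = cache.restore("deps-abc123")                      # exact hit is safe
fallback = cache.restore("deps-def456", ("deps-",))       # prefix fallback: condition (b)
print(exact, "|", fallback)
```

The exact lookup returns the trusted entry, but the prefix fallback picks the attacker's newer entry, which is why restore_keys: weakens the immutability argument.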

I want to get to a world where there is proper job isolation such that I can run third-party unaudited actions like code scanners without having a way to influence the rest of the workflow (e.g. artifacts) or persistent state (cache).

rcowsill commented 3 years ago

I believe currently caches are immutable (though I haven't found anything in the docs confirming that) which means malicious actions cannot really do any harm as long as (a) the cache is populated before those actions run, and (b) restore_keys: is not used.

Can (a) be relied upon? Saved caches are erased after 7 days of inactivity or when the repo's cache quota is exhausted.

letmaik commented 3 years ago

I believe currently caches are immutable (though I haven't found anything in the docs confirming that) which means malicious actions cannot really do any harm as long as (a) the cache is populated before those actions run, and (b) restore_keys: is not used.

Can (a) be relied upon? Saved caches are erased after 7 days of inactivity or when the repo's cache quota is exhausted.

It gets subtle. I think a better solution would be to allow certain jobs to be marked as read-only for the cache. That wouldn't work if you do want to use the cache for "untrusted" jobs, in which case a solution may be to introduce cache islands, together with per-island, per-job permissions. Then I could say that trusted job A has read-write access to island "build", while untrusted job B can only read from "build" but has read-write access to "untrusted".
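The "cache islands" idea can be expressed as a per-job, per-island permission table. This is entirely hypothetical (no such feature exists in GitHub Actions); the GRANTS table and the allowed helper are illustrative names, and the grants mirror the job A / job B example above.

```python
# Hypothetical grants: job -> island -> mode ("r" = read-only, "rw" = read-write).
GRANTS: dict[str, dict[str, str]] = {
    "A": {"build": "rw"},                    # trusted job
    "B": {"build": "r", "untrusted": "rw"},  # untrusted job
}


def allowed(job: str, island: str, op: str) -> bool:
    """Return True if `job` may perform `op` ("read" or "write") on `island`."""
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation: {op}")
    mode = GRANTS.get(job, {}).get(island, "")  # anything not granted is denied
    return ("r" in mode) if op == "read" else ("w" in mode)


print(allowed("B", "build", "read"), allowed("B", "build", "write"), allowed("B", "untrusted", "write"))
```

Anything not listed is denied, so adding a new island or job never silently grants access.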

dhadka commented 3 years ago

I believe currently caches are immutable (though I haven't found anything in the docs confirming that) which means malicious actions cannot really do any harm as long as (a) the cache is populated before those actions run, and (b) restore_keys: is not used.

Caches are immutable, with the caveat that they can be evicted and then overwritten. So if security is of the utmost importance, I wouldn't rely on immutability. It also wouldn't help if a malicious action could create the cache before the legitimate one does.
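The eviction caveat can be sketched by adding the 7-day inactivity rule mentioned earlier in the thread to a write-once cache. This is a hypothetical model: time is passed in explicitly as a number of days, only inactivity-based eviction is modelled (quota-based eviction is omitted), and the class and key names are made up.

```python
from dataclasses import dataclass
from typing import Optional

EVICT_AFTER_DAYS = 7  # entries not accessed for more than 7 days are removed


@dataclass
class Entry:
    value: str
    last_access: float  # in days, supplied by the caller


class EvictingCache:
    """Hypothetical write-once cache that forgets entries after 7 idle days."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def _evict(self, today: float) -> None:
        # Drop entries that nobody has saved or restored for more than 7 days.
        stale = [k for k, e in self._entries.items() if today - e.last_access > EVICT_AFTER_DAYS]
        for k in stale:
            del self._entries[k]

    def save(self, key: str, value: str, today: float) -> bool:
        self._evict(today)
        if key in self._entries:
            return False  # immutable while the entry still exists
        self._entries[key] = Entry(value, today)
        return True

    def restore(self, key: str, today: float) -> Optional[str]:
        self._evict(today)
        entry = self._entries.get(key)
        if entry is None:
            return None
        entry.last_access = today  # a cache hit resets the inactivity timer
        return entry.value


c = EvictingCache()
first = c.save("deps-abc123", "trusted build", today=0)
blocked = c.save("deps-abc123", "poisoned", today=3)   # still present: rejected
hit = c.restore("deps-abc123", today=5)                 # refreshes last_access to day 5
reused = c.save("deps-abc123", "poisoned", today=13)   # idle 8 days: evicted, key free again
print(first, blocked, hit, reused, c.restore("deps-abc123", today=13))
```

Immutability only holds while the entry exists; once it ages out, the same key can be claimed by whichever job saves next.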

dhadka commented 3 years ago

I pinged our security team to also get :eyes: on this.

briansmith commented 3 years ago

malicious actions cannot really do any harm as long as (a) the cache is populated before those actions run

It would be hard to rely on that since jobs run in parallel. The job that is intended to populate the cache entry isn't guaranteed to be run before any other job.

briansmith commented 3 years ago

This is closely related to #500 that @pllim filed. He notes:

GitHub Actions offers both pull_request and pull_request_target. In the latter's documentation, there is a very exciting warning about "cache poisoning."

The warning says:

Warning: The pull_request_target event is granted a read/write repository token and can access secrets, even when it is triggered from a fork. Although the workflow runs in the context of the base of the pull request, you should make sure that you do not check out, build, or run untrusted code from the pull request with this event. Additionally, any caches share the same scope as the base branch, and to help prevent cache poisoning, you should not save the cache if there is a possibility that the cache contents were altered. For more information, see "Keeping your GitHub Actions and workflows secure: Preventing pwn requests" on the GitHub Security Lab website.
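The difference the warning points at can be sketched as the scope a cache save lands in for each pull request event. A pull_request run uses the PR's merge ref (refs/pull/<number>/merge), while a pull_request_target run uses the base branch context, so its saves share the base branch scope. The save_scope helper is hypothetical and models only these two events.

```python
def save_scope(event: str, pr_number: int, base_branch: str) -> str:
    """Scope a cache save lands in for the given PR event (simplified model)."""
    if event == "pull_request":
        return f"refs/pull/{pr_number}/merge"  # isolated to this pull request
    if event == "pull_request_target":
        return f"refs/heads/{base_branch}"  # shared with every run on the base branch
    raise ValueError(f"event not modelled: {event}")


print(save_scope("pull_request", 42, "main"))
print(save_scope("pull_request_target", 42, "main"))
```

Because a pull_request_target save is written where base-branch runs will later restore it, anything a fork's code touched before the save can poison the base branch's cache.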

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.

github-actions[bot] commented 2 years ago

This issue was closed because it has been inactive for 5 days since being marked as stale.