actions / checkout

Action for checking out a repo
https://github.com/features/actions
MIT License
5.75k stars 1.7k forks source link

Checking out a merge commit in `pull_request_target` workflows #518

Open mpdude opened 3 years ago

mpdude commented 3 years ago

I am coming from https://github.com/dependabot/dependabot-core/issues/3253, where there is a lot of confusion of how to safely run actions with secrets when untrusted code from external PRs comes into play.

The bottom line is that there may be situations where you – after you understood the risks – might want to use the pull_request_target event because it has access to secrets; but combine that with a checkout of the PR.

One suggested way of doing this is with

- name: Checkout
        uses: actions/checkout@v2
        with:
          ref: ${{ github.event.pull_request.head.sha }}

which will check out the PR head commit.

This is, however, not 100% what you'd get for a simple uses: actions/checkout@v2 on a pull_request event, because that would check out a merge commit.

I wonder whether it would be possible for this action here to also support checking out such a merge commit on pull_request_target events?

I don't know if creating such a merge commit involves advanced Git trickery to get-it-right™️ , so I thought this was the best place to ask.

mpdude commented 3 years ago

Here's another reason why it would be helpful if this could be supported directly in the action:

I'd like to use actions/checkout with persist-credentials: false to leak as little sensitive information as possible. But with this, I cannot easily run any git commands myself in subsequent steps to create a merge commit (for example, it seems I'd need an extra git fetch to get the PR head commit first).

If the merge commit could be created by the action, it could happen before the cleanup of persisted credentials.

trim21 commented 3 years ago
      - name: Checkout code
        uses: actions/checkout@v2
        with:
          ref: "refs/pull/${{ github.event.number }}/merge"

this should work as you expected I think.

updated 2023-08-02

When I previsouly wrote this reply, github doesn't support disabling or limiting GitHub Actions for outside collaborators, so it's not a problem to simply use refs/pull/${{ github.event.number }}/merge.

But if your workflow require approval for outside collaborators, you should use this one

      - name: Checkout code
        uses: actions/checkout@v3
        with:
          ref: "${{ github.event.pull_request.merge_commit_sha }}"
geekflyer commented 2 years ago
      - name: Checkout code
        uses: actions/checkout@v2
        with:
          ref: "refs/pull/${{ github.event.number }}/merge"

this should work as you expected I think.

I wonder if this is safe fully safe though? With this approach, if a malicious PR author pushes unsafe code between PR approval and the checkout action being run, your CI system might checkout a different version of the code than what the maintainer approved.

ref: ${{ github.event.pull_request.head.sha }} on the other hand is safe since it references a fixed head.sha that corresponds to the time the pull_request_target has triggered.

trim21 commented 2 years ago

I wonder if this is safe fully safe though? With this approach, if a malicious PR author pushes unsafe code between PR approval and the checkout action being run, your CI system might checkout a different version of the code than what the maintainer approved.

ref: ${{ github.event.pull_request.head.sha }} on the other hand is safe since it references a fixed head.sha that corresponds to the time the pull_request_target has triggered.

It's not safe.

That's why you shouldn't execute any test code or lint plugin from PR. They can read env and maybe able to read your repo secret.

you should only execute code like lint (with out loading plugin from repo) or formatter.

geekflyer commented 2 years ago

I wonder if this is safe fully safe though? With this approach, if a malicious PR author pushes unsafe code between PR approval and the checkout action being run, your CI system might checkout a different version of the code than what the maintainer approved. ref: ${{ github.event.pull_request.head.sha }} on the other hand is safe since it references a fixed head.sha that corresponds to the time the pull_request_target has triggered.

It's not safe.

That's why you shouldn't execute any test code or lint plugin from PR. They can read env and maybe able to read your repo secret.

you should only execute code like lint (with out loading plugin from repo) or formatter.

I mean ref: ${{ github.event.pull_request.head.sha }} is safe IF the maintainer has fully reviewed the the PR code and determined that the code is safe (let's say for a README change). However ref: "refs/pull/${{ github.event.number }}/merge" is not safe, because even if the original PR was just changing some README, a malicious actor can push more code changes in a specific time window that are then being tested against.

trim21 commented 2 years ago

Yes, but github.event.pull_request.head.sha is not what we want in this situation. We want to know that the repo will look like after the pr is merged.

geekflyer commented 2 years ago

Yes, but github.event.pull_request.head.sha is not what we want in this situation. We want to know that the repo will look like after the pr is merged.

yeah, I mean there is probably other ways to achieve the merge and test against the merged state. But using refs/pull/${{ github.event.number }}/merge" alone is not exactly the safest way since it references a ref that can be updated by the outside contributor.

dvdhrm commented 1 year ago

Workaround:

Checkout refs/pull/${{ github.event.number }}/merge and then verify HEAD^ equals github.event.pull_request.head.sha afterwards. If they don't match, cancel your workflow (a newer one is already scheduled, given that the head-sha changed).

Not ideal, but safe and works.

antoniovazquezblanco commented 1 year ago

Gist of the previous sugestion:

- name: Checkout
  uses: actions/checkout@v3
  with:
    ref: "refs/pull/${{ github.event.number }}/merge"

# Important security check: https://github.com/actions/checkout/issues/518
- name: Sanity check
  run: |
    [[ "$(git rev-parse 'HEAD^')" == "${{ github.event.pull_request.head.sha }}" ]]
fxmarty commented 1 year ago

@antoniovazquezblanco I believe the correct syntax should be:

- name: Checkout
  uses: actions/checkout@v3
  with:
    ref: "refs/pull/${{ github.event.number }}/merge"
    fetch-depth: 2

# Important security check: https://github.com/actions/checkout/issues/518
- name: Sanity check
  run: |
    [[ "$(git rev-parse 'HEAD^2')" == "${{ github.event.pull_request.head.sha }}" ]]

because the branch that is merged into is the first parent. Isn't it?

trim21 commented 1 year ago

Gist of the previous sugestion:

- name: Checkout
  uses: actions/checkout@v3
  with:
    ref: "refs/pull/${{ github.event.number }}/merge"

# Important security check: https://github.com/actions/checkout/issues/518
- name: Sanity check
  run: |
    [[ "$(git rev-parse 'HEAD^')" == "${{ github.event.pull_request.head.sha }}" ]]

this is not necessary, and what's more, useless. someone can still push commit after github actions finish this step. Are you plan to add this step before every workflow step?

Then just use concurrency group, don't invent it again

dvdhrm commented 1 year ago

[...]

this is not necessary, and what's more, useless. someone can still push commit after github actions finish this step.

This is not about preventing someone to push data to the branch, or somehow guarantee the data in the branch is safe. This is about verifying that the data that was checked out matches the data of the pull-request-target event. That is, if somehow the repo was modified after the event was triggered, but before the event handler ran, then we want to avoid using the data, because it does not necessarily match the data that was approved by a maintainer.

In other words: If a specific PR SHA is approved, then only that SHA should be operated on by the action. There are several options: check out just that SHA exactly, then create the merge yourself. Or check out the proposed-merge and verify that it has the same SHA as the event.

trim21 commented 1 year ago

[...]

this is not necessary, and what's more, useless. someone can still push commit after github actions finish this step.

This is not about preventing someone to push data to the branch, or somehow guarantee the data in the branch is safe. This is about verifying that the data that was checked out matches the data of the pull-request-target event. That is, if somehow the repo was modified after the event was triggered, but before the event handler ran, then we want to avoid using the data, because it does not necessarily match the data that was approved by a maintainer.

In other words: If a specific PR SHA is approved, then only that SHA should be operated on by the action. There are several options: check out just that SHA exactly, then create the merge yourself. Or check out the proposed-merge and verify that it has the same SHA as the event.

Then just use concurrency group, don't invent it again

dvdhrm commented 1 year ago

Then just use concurrency group, don't invent it again

  1. The concurrency group only triggers IF the change to the branch triggered an update to the event.
  2. The concurrency group does not trigger if the event was dropped by github for any reason (e.g., rate-limiting).
  3. The concurrency group might trigger too late, when the workflow already ran and used the outdated data. There is no guarantee that the concurrency group is triggered before data becomes available on the branch.
trim21 commented 1 year ago

Then just use concurrency group, don't invent it again

  1. The concurrency group only triggers IF the change to the branch triggered an update to the event.
  2. The concurrency group does not trigger if the event was dropped by github for any reason (e.g., rate-limiting).
  3. The concurrency group might trigger too late, when the workflow already ran and used the outdated data. There is no guarantee that the concurrency group is triggered before data becomes available on the branch.

The concurrency group doesn't change when a workflow is triggered.

It only delay the workflow run or cancal a unfinished previous job, base on your config.

In this case, you should add concurrency group config like this:

on:
  pull_request_target:

jobs:
  my-lint:
    concurrency:
      group: ${{ github.event.pull_request.number }}
      cancel-in-progress: true

The workflow will still run triggered like it before, but it will canal previous workflow immediately.

dvdhrm commented 1 year ago

Then just use concurrency group, don't invent it again

  1. The concurrency group only triggers IF the change to the branch triggered an update to the event.
  2. The concurrency group does not trigger if the event was dropped by github for any reason (e.g., rate-limiting).
  3. The concurrency group might trigger too late, when the workflow already ran and used the outdated data. There is no guarantee that the concurrency group is triggered before data becomes available on the branch.

The concurrency group doesn't change when a workflow is triggered.

It only delay the workflow run or cancal a unfinished previous job, base on your config.

  1. A PR on a branch is approved.
  2. A workflow event is triggered.
  3. The workflow starts running.
  4. Before the workflow checks out the repository:
    1. The branch is force-pushed with different data.
    2. Github updates the branch and will provide the new data to git requests.
    3. Github is about to handle workflow events for this push, but before it does the original workflow continues
  5. The workflow clones the git repo and will get the force-pushed update of the data, rather than the data that was approved.
  6. The workflow should cancel, because it does not deal with the data that was approved. This is were the mentioned github-actions check comes in.
  7. The workflow runs to completion.
  8. Github continues handling the force-push:
    1. A new workflow-event is triggered, canceling any ongoing workflows that have the same concurrency group. However, all such workflows have already finished.
    2. The new workflow will run with the new, un-approved data.

This is a race-condition that is not solved by using a concurrency-group. The github state-machine is NOT immediate. There is no guarantee that github workflows will trigger before branch updates are made available, and this is highly unlikely to be the case in distributed systems.

trim21 commented 1 year ago

This is a race-condition that is not solved by using a concurrency-group. The github state-machine is NOT immediate. There is no guarantee that github workflows will trigger before branch updates are made available, and this is highly unlikely to be the case in distributed systems.

OK, I see you point, I'm wrong about this.

Better option is to use ref: "${{ github.event.pull_request.merge_commit_sha }}", this is guaranteed to be (github generated) merged commit sha based on approved head commit sha

trim21 commented 1 year ago

@antoniovazquezblanco I believe the correct syntax should be:

- name: Checkout
  uses: actions/checkout@v3
  with:
    ref: "refs/pull/${{ github.event.number }}/merge"
    fetch-depth: 2

# Important security check: https://github.com/actions/checkout/issues/518
- name: Sanity check
  run: |
    [[ "$(git rev-parse 'HEAD^2')" == "${{ github.event.pull_request.head.sha }}" ]]

because the branch that is merged into is the first parent. Isn't it?

just use

      - name: Checkout code
        uses: actions/checkout@v3
        with:
          ref: "${{ github.event.pull_request.merge_commit_sha }}"
mering commented 1 year ago

What happens when this commit is not available (e.g. in case of a merge conflict)? The pull_request_target event will still be triggered (in contrast to the pull_request event), so do we need an additional if to check for that?

AlekSi commented 12 months ago

So ref: "${{ github.event.pull_request.merge_commit_sha }}" is the answer, and only the README.md (and possibly https://securitylab.github.com/research/github-actions-preventing-pwn-requests/) should be updated?

Edit: I see that github.event.pull_request.merge_commit_sha is null for pull_request_target event: https://github.com/FerretDB/FerretDB/actions/runs/6309162567/job/17128566077?pr=3433 This documentation page suggests that merge_commit_sha can be null while GitHub checks the mergeability of the PR in the background. Unfortunately, it is too late for GitHub Actions – the job is already started with github.event.pull_request.merge_commit_sha being null.

AlekSi commented 12 months ago

What happens when this commit is not available (e.g. in case of a merge conflict)? The pull_request_target event will still be triggered (in contrast to the pull_request event), so do we need an additional if to check for that?

https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request says this:

Conversely, workflows with the pull_request_target event will run even if the pull request has a merge conflict.

faximan commented 11 months ago

Edit: I see that github.event.pull_request.merge_commit_sha is null for pull_request_target event: https://github.com/FerretDB/FerretDB/actions/runs/6309162567/job/17128566077?pr=3433 This documentation page suggests that merge_commit_sha can be null while GitHub checks the mergeability of the PR in the background. Unfortunately, it is too late for GitHub Actions – the job is already started with github.event.pull_request.merge_commit_sha being null.

I think I am observing this too. Using

ref: "${{ github.event.pull_request.merge_commit_sha }}"

seems to consistently check out a merge commit that is based on the second latest commit in the PR. When it is triggered on the first commit in the PR, it simply checks out main.

My best guess is a race and that merge_commit_sha returns the last computed merge commit until the mergeability of the current commit has been computed. Does this make sense?

nguyentvan7 commented 10 months ago

Edit: I see that github.event.pull_request.merge_commit_sha is null for pull_request_target event: https://github.com/FerretDB/FerretDB/actions/runs/6309162567/job/17128566077?pr=3433 This documentation page suggests that merge_commit_sha can be null while GitHub checks the mergeability of the PR in the background. Unfortunately, it is too late for GitHub Actions – the job is already started with github.event.pull_request.merge_commit_sha being null.

I think I am observing this too. Using

ref: "${{ github.event.pull_request.merge_commit_sha }}"

seems to consistently check out a merge commit that is based on the second latest commit in the PR. When it is triggered on the first commit in the PR, it simply checks out main.

My best guess is a race and that merge_commit_sha returns the last computed merge commit until the mergeability of the current commit has been computed. Does this make sense?

I am hitting this too. Is there anyway to delay the job until the mergeability request is finished? I'm not sure if a delay within the workflow on a separate job before referencing the SHA is enough

lujop commented 6 months ago

I'm having precisely the same problem, that when the PR is created and I checkout the github.event.pull_request.merge_commit_sha the base branch is retrieved. Then if I add another commit it works correctly.

fxmarty commented 6 months ago

cc @regisss

nguyentvan7 commented 6 months ago

@lujop @faximan and others who hit the same issue

I worked around this by having a pull_request event action that creates an artifact. Then have another workflow_run action that triggers whenever this pr action completes. It will download the artifact, then do all the processing in this action.

This doesn't help if you want the processing to show up on the PR, but it worked for us, as we just needed the secret access to trigger a fire-and-forget deployment action. And the part we really cared to check was the artifact creation to make sure it succeeds.

https://github.com/frzyc/genshin-optimizer/commit/af59815665a6e187d9ffbb6e325f09dd211f2edc

lujop commented 6 months ago

In our case I end up using github.event.pull_request.head.sha as we have the restriction to Require branches to be up to date before merging. And in the end the update will be done and the validation run again.

alexander-schranz commented 6 months ago

I'm still stuck on this one. Specially as my CI task runs not only for pull_request_target but als for push.branches. Somebody successfully using pull_request_target still?

infinisil commented 5 months ago

This is a bit messy, but I wrote an inline script to check whether the PR is mergeable, and it seems to work decently: https://github.com/NixOS/nixpkgs/blob/c8db8bd9656ee3d373ce9445063c25c47f442118/.github/workflows/check-by-name.yml#L31-L86