actions / checkout

Action for checking out a repo
https://github.com/features/actions
MIT License
5.84k stars 1.72k forks source link

on pull request, checkout@v2 built some ref other than the one that was claimed #237

Closed jepler closed 4 years ago

jepler commented 4 years ago

Recently, I had a PR https://github.com/adafruit/circuitpython/pull/2831 which checked out green, so a fellow developer merged it. It broke the build.

In subsequent investigation, I noticed that while the linked Actions run https://github.com/adafruit/circuitpython/runs/633447824?check_suite_focus=true shows that it was building the same ref as the one that was merged, down in the logs you find something different.

Under the step "Run actions/checkout@v2", "Checking out the ref",

HEAD is now at e7eef6f0c Merge 6dfe9ab606352b36f8d295b250ac7118c459618b into 2d7cf4b792afa1b6755b9575b84cae4b65ae7ab9

6dfe9ab represents another revision of the code that was apparently replaced by force-push with the later value 75be728:

@jepler force-pushed the jepler:rgbmatrix-stm branch from 6dfe9ab to e5be728 7 days ago

ericsciple commented 4 years ago

Thanks! I'll loop in some folks.

Indeed something looks wrong.

From the mpy-cross-mac check run, the step "Dump GitHub context" reveals some key info:

{
  [...]
  "ref": "refs/pull/2831/merge",
  "sha": "e7eef6f0ca41e82a3082df79b7abbd6a369433a6",
  "repository": "adafruit/circuitpython",
  "event_name": "pull_request",
  [...]
  "event": {
    "action": "synchronize",
    "after": "e5be728c1c8a3a455b7095c9dab9e758c5016191",
    "before": "6dfe9ab606352b36f8d295b250ac7118c459618b",
    "number": 2831,
    [...]

Yet appears to contradict the git log -1 output at the end of the step "Run actions/checkout@v2":

/usr/local/bin/git log -1
commit e7eef6f0ca41e82a3082df79b7abbd6a369433a6
Merge: 2d7cf4b79 6dfe9ab60
Author: Jeff Epler <jepler@gmail.com>
Date:   Wed Apr 29 19:23:50 2020 +0000

    Merge 6dfe9ab606352b36f8d295b250ac7118c459618b into 2d7cf4b792afa1b6755b9575b84cae4b65ae7ab9

It looks like ${{ github.sha }} is the old merge commit.

[edit]: The commit message should read: Merge e5be728c1c8a3a455b7095c9dab9e758c5016191 into 2d7cf4b792afa1b6755b9575b84cae4b65ae7ab9 (since e5be7 is the "after" sha)

ericsciple commented 4 years ago

@andymckay have you heard of any similar occurrences?

The workflow payload contained the wrong merge commit.

iheanyi commented 4 years ago

👋🏾 Will be taking a look into this today and tomorrow.

ericsciple commented 4 years ago

Thanks @iheanyi for digging into this! We dug into this a bit and are unable to reproduce. This may be due to replication lag (received previous merge commit). Going forward we'll add more telemetry to the checkout step. So if this happens again we will be able to detect from telemetry.

Bankst commented 4 years ago

Hi, I just ran in to this same issue here: https://github.com/PhotonVision/photonvision/pull/18/commits/f2305bb501ec200d022a1bac4bac8ee316f68e54# The SHA for the commit is f2305bb501ec200d022a1bac4bac8ee316f68e54 but the workflow ran with commit 2e69b3e24b82faefc3702fc946834c61b31d8c4

https://github.com/PhotonVision/photonvision/pull/18/checks?check_run_id=844516246 Hopefully this can somehow help, having another known occurence

kenorb commented 4 years ago

Same happens when using with ad-m/github-push-action, related:

kenorb commented 4 years ago

My configuration is like:

  Push:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          persist-credentials: false
          # Otherwise, you will failed to push refs to dest repo.
          fetch-depth: 0
          ref: refs/heads/${{ github.head_ref }} # Or: ${{ github.event.pull_request.head.ref }}
      - uses: actions/download-artifact@v2
         name: MyArtifact
      - name: Commit files
        run: |
          git config --local core.autocrlf false
          git config --local user.email $git_email
          git config --local user.name "${{ github.actor }}"
          git add . && git add --renormalize .
          git pull origin "${{ github.ref }}" --autostash --rebase -Xours
          git commit -am "${{ github.workflow }}"
        env:
          git_email: "${{ github.actor }}@users.noreply.github.com"
      - name: Push changes (not PR)
        if: github.event_name != 'pull_request'
        uses: ad-m/github-push-action@master
        with:
          branch: ${{ github.ref }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
      - name: Push changes (PR)
        if: github.event_name == 'pull_request'
        uses: ad-m/github-push-action@master
        with:
          branch: refs/heads/${{ github.head_ref }}
          github_token: ${{ secrets.GITHUB_TOKEN }}

Despite I'm checking out the right HEAD, I see some extra merge commits:

image

kenorb commented 4 years ago

Here are more details based on the build 866142019.

I'm pushing my changes using ad-m/github-push-action@master as per:

 Push changes (PR) >
    directory: .
Run ad-m/github-push-action@master
  with:
    branch: refs/heads/v1.079-dev
    github_token: ***
    directory: .
Started: bash /home/runner/work/_actions/ad-m/github-push-action/master/start.sh
Push to branch refs/heads/v1.079-dev
To https://github.com/EA31337/EA31337.git
   7173d15..7efcedf  HEAD -> v1.079-dev

So my HEAD changed from 7173d15 to 7efcedf (which is clearly visible above), however the checkout action (I assume) have added extra merge commit BETWEEN them, so my list of 3 last commits (instead of 2) looks like:

  1. "My triggering commit" SHA:7173d15
  2. "Merge 7173d15 into e24c723" SHA:8b7e516 (unwanted merge commit appearing out of blue)
  3. "Pushed action's commit" SHA:7efcedf

Where e24c723 is origin/master merge commit, whereas I've requested to checkout PR HEAD (ref: refs/heads/${{ github.head_ref }}).

marian-vignau commented 4 years ago

It happens to me, too.

It runs the action on the project's wrong commit

Go to https://github.com/PyAr/CDPedia/runs/880106832

An you will see that the 5d8abdc commit number isn't in the log:

/usr/bin/git log -1 commit 5e52271eb4cfdbe501fb96f715efaf10b88d176c Author: marian-vignau mavignau@gmail.com Date: Fri Jul 17 00:38:32 2020 +0000

Merge 2b611cc85e9500c0d94f1e01eaf0bcaf057d7119 into 7d813b430b1b41968577387df75c7ba98814ea65

And it happens again in https://github.com/PyAr/CDPedia/runs/871872757 The logs doesn't mention the commit number ed80113. In the log

/usr/bin/git log -1 commit a4f842e8cd3b925513004b2adf64b626c3faa6cc Author: marian-vignau mavignau@gmail.com Date: Wed Jul 15 00:56:28 2020 +0000

Merge 6ae6c1de90ef2bea523713ab8ac494a4354bbbe1 into f4899180fc5a4b2308cf5e1e092e3f58955db519

Thank you!

jnewland commented 4 years ago

I've experienced this twice this week, once just a few minutes ago. I pushed a commit to a pull request that edited the workflow that runs on pull_request.synchronize and observed the Action run for that SHA fail due to code that was removed in the commit.

jnewland commented 4 years ago

After taking a closer look at the logs from actions/checkout@v2 on the most recent offending run, I see that the merge commit referenced refers to the SHA one step back in my (entirely linear) commit history. This commit that had been pushed 3 minutes prior to the commit I expected to be checked out. This seems to lend credence to the replication lag theory mentioned above. I'd be happy to share additional details privately if that'd help in any way.

This is a pretty gnarly issue. Is there any indication that switching to a push event would reduce the chance of this occurring?

despairblue commented 4 years ago

We have had this issue with push events too. We're considering using the old v1 instead.

On Wed, Jul 22, 2020, 19:58 Jesse Newland notifications@github.com wrote:

After taking a closer look at the logs from actions/checkout@v2 on the most recent offending run, I see that the merge commit referenced refers to the SHA one step back in my (entirely linear) commit history. This commit that had been pushed 3 minutes prior to the commit I expected to be checked out. This seems to lend credence to the replication lag theory mentioned above. I'd be happy to share additional details privately if that'd help in any way.

This is a pretty gnarly issue. Is there any indication that switching to a push event would reduce the chance of this occurring?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/actions/checkout/issues/237#issuecomment-662599568, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHCO6NFIOHF75CLXY6PMADR44SF7ANCNFSM4M3TKTCA .

zoispag commented 4 years ago

A colleague identified that something was wrong with this today too.

It happened twice in the last 11 builds and we noticed merely by accident, because the test were supposed to pass and failed again.

For most repos of my organization, switching to @v1 is ok. But not for our main repo, because v1 takes about 2 minutes to finish the checkout, making v2 the only viable option.

jnewland commented 4 years ago

The telemetry that GitHub is collecting on this issue (added in https://github.com/actions/checkout/pull/253) appears to look for a situation where the SHA included in the after property of the pull_request.synchronize workflow event is not referenced in the merge commit being checked out. I added a small guard step to my workflow to fail execution in this case:

      # Workaround https://github.com/actions/checkout/issues/237
    - name: Verify checkout is up to date with the PR
      run: |
        git show | grep ${{ github.event.after }}

On a hunch, I was able to trigger the "bad behavior" and validate at least part of this failsafe functionality by instructing the checkout action to fetch the merge_commit_sha from the workflow event payload:

    - name: Checkout
       uses: actions/checkout@v2
       with:
         # DANGER!!!! this might check out a stale commit
         ref: ${{ github.event.pull_request.merge_commit_sha }}

      # Workaround https://github.com/actions/checkout/issues/237
    - name: Verify checkout is up to date with the PR
      run: |
        git show | grep ${{ github.event.after }}

Sure enough, that workflow failed, having checked out a merge commit that was a few minutes old instead of the most recent one. https://docs.github.com/en/rest/reference/pulls#get-a-pull-request describes how this merge commit is created: one can understand a situation in which that process hasn't completed yet by the time that actions' receives a payload is delivered: this page in GitHub's API docs describes workarounds for a similar situation and specifically calls out that these refs are updated with a background job and should not be relied to produce up to date content. Given this experiment, my hunch is now that backlogs in async work queues are contributing to this situation more than replication delay.

To be clear, I can't find any documentation for the after property that I'm using in this guard, so I can't be entirely sure that it's also not susceptible to whatever lag is contributing to this situation. I'll be sure to update this issue again if I experience any failures with this guard in place (update: it works). I'd also be interested to know how the telemetry GitHub added is looking from their end if anyone would be able to share!

andymckay commented 4 years ago

Just to let you all know that we do know about this issue and we have now been able to reproduce it.

Unfortunately its not a simple fix to change in our architecture, but we recognise its an issue that causes problems and we are working on it. I can't share a time line when it will be fixed yet, but we will update this issue when we know more.

vatz88 commented 4 years ago

I was wondering if it's long fix, can starter-workflows use v1 by default? Since lot people use these templates, some may never figure out why the builds are failing. And this issue is quite reproducible.

jcornaz commented 4 years ago

The workflow payload contained the wrong merge commit.

@ericsciple It looks like the event payload is correct. Or at least other third party actions running in the same workflow instance get the commit hash correctly:

image

jcornaz commented 4 years ago

I see many people are rolling back to v1, is that helpful? Is v1 exempt of the bug?

@andymckay, @iheanyi, @ericsciple, this critical issue has been reported for a while already. Do we have any news?

FabianKoestring commented 4 years ago

I see many people are rolling back to v1, is that helpful? Is v1 exempt of the bug?

@andymckay, @iheanyi, @ericsciple, this critical issue has been reported for a while already. Do we have any news?

V1 is working? We have exactly the same problem with V2.

andymckay commented 4 years ago

We rolled out a fix for this last night and we've seen the number of cases reported through the telemetry added by @ericsciple to almost zero since then. One occurred this morning and we'll see if we can track that one down. However we are hoping this is now fixed.

Please let us know if you see this re-occuring.

Update: a change rolled out for the one that occurred this morning as well.

hashtagchris commented 4 years ago

So far it looks like this issue is fixed for all new workflow runs created after 2020-07-31 13:13:00.

Older workflow runs affected by this bug will use the wrong merge commit on re-run as well. To queue a new run, you can typically close and re-open a PR or push more changes to it.

iheanyi commented 4 years ago

Thanks @hashtagchris! I'm going to close this out now, really appreciate all the work folks put into fixing this!

MoSattler commented 4 years ago

It still does not work for me.

name: CI

on:
  push:
    branches: [master]
  pull_request:
    branches: [master]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v1
        with:
          node-version: 12.18.3

      - name: install dependencies
        run: npm ci

      - name: test
        run: npm run test:coverage

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v1
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          fail_ci_if_error: true

I had to change the checkout to v1 for codecov to report properly

yurishkuro commented 3 years ago

@andymckay it would be good to update this issue with the exact version number of checkout@v2 where the issue was fixed.

Mubelotix commented 2 years ago

Please let us know if you see this re-occuring.

@andymckay This is re-occuring. My whole CI is completely broken and corrupted. Help

Edit: I was wrong it was another issue

andymckay commented 2 years ago

@Mubelotix I would recommend you contact customer support here: https://support.github.com/

Or you can open a new issue on this repository with as much information as you've got so the team can take a look.