Closed jepler closed 4 years ago
Thanks! I'll loop in some folks.
Indeed something looks wrong.
From the mpy-cross-mac
check run, the step "Dump GitHub context" reveals some key info:
{
[...]
"ref": "refs/pull/2831/merge",
"sha": "e7eef6f0ca41e82a3082df79b7abbd6a369433a6",
"repository": "adafruit/circuitpython",
"event_name": "pull_request",
[...]
"event": {
"action": "synchronize",
"after": "e5be728c1c8a3a455b7095c9dab9e758c5016191",
"before": "6dfe9ab606352b36f8d295b250ac7118c459618b",
"number": 2831,
[...]
Yet appears to contradict the git log -1
output at the end of the step "Run actions/checkout@v2":
/usr/local/bin/git log -1
commit e7eef6f0ca41e82a3082df79b7abbd6a369433a6
Merge: 2d7cf4b79 6dfe9ab60
Author: Jeff Epler <jepler@gmail.com>
Date: Wed Apr 29 19:23:50 2020 +0000
Merge 6dfe9ab606352b36f8d295b250ac7118c459618b into 2d7cf4b792afa1b6755b9575b84cae4b65ae7ab9
It looks like ${{ github.sha }}
is the old merge commit.
[edit]: The commit message should read: Merge e5be728c1c8a3a455b7095c9dab9e758c5016191 into 2d7cf4b792afa1b6755b9575b84cae4b65ae7ab9
(since e5be7 is the "after" sha)
@andymckay have you heard of any similar occurrences?
The workflow payload contained the wrong merge commit.
👋🏾 Will be taking a look into this today and tomorrow.
Thanks @iheanyi for digging into this! We dug into this a bit and are unable to reproduce. This may be due to replication lag (received previous merge commit). Going forward we'll add more telemetry to the checkout step. So if this happens again we will be able to detect from telemetry.
Hi, I just ran in to this same issue here: https://github.com/PhotonVision/photonvision/pull/18/commits/f2305bb501ec200d022a1bac4bac8ee316f68e54# The SHA for the commit is f2305bb501ec200d022a1bac4bac8ee316f68e54 but the workflow ran with commit 2e69b3e24b82faefc3702fc946834c61b31d8c4
https://github.com/PhotonVision/photonvision/pull/18/checks?check_run_id=844516246 Hopefully this can somehow help, having another known occurence
Same happens when using with ad-m/github-push-action
, related:
My configuration is like:
Push:
if: github.event_name == 'pull_request'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
persist-credentials: false
# Otherwise, you will failed to push refs to dest repo.
fetch-depth: 0
ref: refs/heads/${{ github.head_ref }} # Or: ${{ github.event.pull_request.head.ref }}
- uses: actions/download-artifact@v2
name: MyArtifact
- name: Commit files
run: |
git config --local core.autocrlf false
git config --local user.email $git_email
git config --local user.name "${{ github.actor }}"
git add . && git add --renormalize .
git pull origin "${{ github.ref }}" --autostash --rebase -Xours
git commit -am "${{ github.workflow }}"
env:
git_email: "${{ github.actor }}@users.noreply.github.com"
- name: Push changes (not PR)
if: github.event_name != 'pull_request'
uses: ad-m/github-push-action@master
with:
branch: ${{ github.ref }}
github_token: ${{ secrets.GITHUB_TOKEN }}
- name: Push changes (PR)
if: github.event_name == 'pull_request'
uses: ad-m/github-push-action@master
with:
branch: refs/heads/${{ github.head_ref }}
github_token: ${{ secrets.GITHUB_TOKEN }}
Despite I'm checking out the right HEAD, I see some extra merge commits:
Here are more details based on the build 866142019.
I'm pushing my changes using ad-m/github-push-action@master
as per:
Push changes (PR) >
directory: .
Run ad-m/github-push-action@master
with:
branch: refs/heads/v1.079-dev
github_token: ***
directory: .
Started: bash /home/runner/work/_actions/ad-m/github-push-action/master/start.sh
Push to branch refs/heads/v1.079-dev
To https://github.com/EA31337/EA31337.git
7173d15..7efcedf HEAD -> v1.079-dev
So my HEAD changed from 7173d15
to 7efcedf
(which is clearly visible above), however the checkout action (I assume) have added extra merge commit BETWEEN them, so my list of 3 last commits (instead of 2) looks like:
7173d15
7173d15
into e24c723
" SHA:8b7e516
(unwanted merge commit appearing out of blue)7efcedf
Where e24c723
is origin/master
merge commit, whereas I've requested to checkout PR HEAD (ref: refs/heads/${{ github.head_ref }}
).
It happens to me, too.
It runs the action on the project's wrong commit
Go to https://github.com/PyAr/CDPedia/runs/880106832
An you will see that the 5d8abdc commit number isn't in the log:
/usr/bin/git log -1 commit 5e52271eb4cfdbe501fb96f715efaf10b88d176c Author: marian-vignau mavignau@gmail.com Date: Fri Jul 17 00:38:32 2020 +0000
Merge 2b611cc85e9500c0d94f1e01eaf0bcaf057d7119 into 7d813b430b1b41968577387df75c7ba98814ea65
And it happens again in https://github.com/PyAr/CDPedia/runs/871872757 The logs doesn't mention the commit number ed80113. In the log
/usr/bin/git log -1 commit a4f842e8cd3b925513004b2adf64b626c3faa6cc Author: marian-vignau mavignau@gmail.com Date: Wed Jul 15 00:56:28 2020 +0000
Merge 6ae6c1de90ef2bea523713ab8ac494a4354bbbe1 into f4899180fc5a4b2308cf5e1e092e3f58955db519
Thank you!
I've experienced this twice this week, once just a few minutes ago. I pushed a commit to a pull request that edited the workflow that runs on pull_request.synchronize
and observed the Action run for that SHA fail due to code that was removed in the commit.
After taking a closer look at the logs from actions/checkout@v2
on the most recent offending run, I see that the merge commit referenced refers to the SHA one step back in my (entirely linear) commit history. This commit that had been pushed 3 minutes prior to the commit I expected to be checked out. This seems to lend credence to the replication lag theory mentioned above. I'd be happy to share additional details privately if that'd help in any way.
This is a pretty gnarly issue. Is there any indication that switching to a push event would reduce the chance of this occurring?
We have had this issue with push events too. We're considering using the old v1 instead.
On Wed, Jul 22, 2020, 19:58 Jesse Newland notifications@github.com wrote:
After taking a closer look at the logs from actions/checkout@v2 on the most recent offending run, I see that the merge commit referenced refers to the SHA one step back in my (entirely linear) commit history. This commit that had been pushed 3 minutes prior to the commit I expected to be checked out. This seems to lend credence to the replication lag theory mentioned above. I'd be happy to share additional details privately if that'd help in any way.
This is a pretty gnarly issue. Is there any indication that switching to a push event would reduce the chance of this occurring?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/actions/checkout/issues/237#issuecomment-662599568, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHCO6NFIOHF75CLXY6PMADR44SF7ANCNFSM4M3TKTCA .
A colleague identified that something was wrong with this today too.
It happened twice in the last 11 builds and we noticed merely by accident, because the test were supposed to pass and failed again.
7814315
: 7814315549ad5c6f2f1ffb4bda8bc26531855122 into 831087b15fa3e7382284fe38d0b6dc292de107d61fef0cf
: 7814315549ad5c6f2f1ffb4bda8bc26531855122 into 831087b15fa3e7382284fe38d0b6dc292de107d6cc253c7
: cc253c78758bfe6d8d82919d66bd4ad8ccdfe323 into 831087b15fa3e7382284fe38d0b6dc292de107d6c415104
: c41510437859b59462fc45d7bef156d6e15b7d7e into 831087b15fa3e7382284fe38d0b6dc292de107d60c33c32
: c41510437859b59462fc45d7bef156d6e15b7d7e into 831087b15fa3e7382284fe38d0b6dc292de107d6a268766
: a268766c8d65874507c5ff73320905743182ac07 into 831087b15fa3e7382284fe38d0b6dc292de107d6c055bc1
: c055bc14b36d36c11c58fafa97d9d43cdc9addbc into 831087b15fa3e7382284fe38d0b6dc292de107d678209ff
: 78209ff122d842f646c04c0a8f1cd10e37016734 into 831087b15fa3e7382284fe38d0b6dc292de107d69612f8b
: 9612f8b55cfc7244e3e97b33100b3f8be3e24ad8 into 831087b15fa3e7382284fe38d0b6dc292de107d6ab239e2
: ab239e25aa2d37b133fc355c64ab4db830973ca7 into 831087b15fa3e7382284fe38d0b6dc292de107d62daa98d
: 2daa98d3a0a0bb4d1d7e673e9bc26346bc2e550c into 831087b15fa3e7382284fe38d0b6dc292de107d6For most repos of my organization, switching to @v1
is ok. But not for our main repo, because v1
takes about 2 minutes to finish the checkout, making v2
the only viable option.
The telemetry that GitHub is collecting on this issue (added in https://github.com/actions/checkout/pull/253) appears to look for a situation where the SHA included in the after
property of the pull_request.synchronize
workflow event is not referenced in the merge commit being checked out. I added a small guard step to my workflow to fail execution in this case:
# Workaround https://github.com/actions/checkout/issues/237
- name: Verify checkout is up to date with the PR
run: |
git show | grep ${{ github.event.after }}
On a hunch, I was able to trigger the "bad behavior" and validate at least part of this failsafe functionality by instructing the checkout action to fetch the merge_commit_sha
from the workflow event payload:
- name: Checkout
uses: actions/checkout@v2
with:
# DANGER!!!! this might check out a stale commit
ref: ${{ github.event.pull_request.merge_commit_sha }}
# Workaround https://github.com/actions/checkout/issues/237
- name: Verify checkout is up to date with the PR
run: |
git show | grep ${{ github.event.after }}
Sure enough, that workflow failed, having checked out a merge commit that was a few minutes old instead of the most recent one. https://docs.github.com/en/rest/reference/pulls#get-a-pull-request describes how this merge commit is created: one can understand a situation in which that process hasn't completed yet by the time that actions' receives a payload is delivered: this page in GitHub's API docs describes workarounds for a similar situation and specifically calls out that these refs are updated with a background job and should not be relied to produce up to date content. Given this experiment, my hunch is now that backlogs in async work queues are contributing to this situation more than replication delay.
To be clear, I can't find any documentation for the after
property that I'm using in this guard, so I can't be entirely sure that it's also not susceptible to whatever lag is contributing to this situation. I'll be sure to update this issue again if I experience any failures with this guard in place (update: it works). I'd also be interested to know how the telemetry GitHub added is looking from their end if anyone would be able to share!
Just to let you all know that we do know about this issue and we have now been able to reproduce it.
Unfortunately its not a simple fix to change in our architecture, but we recognise its an issue that causes problems and we are working on it. I can't share a time line when it will be fixed yet, but we will update this issue when we know more.
I was wondering if it's long fix, can starter-workflows use v1
by default? Since lot people use these templates, some may never figure out why the builds are failing. And this issue is quite reproducible.
The workflow payload contained the wrong merge commit.
@ericsciple It looks like the event payload is correct. Or at least other third party actions running in the same workflow instance get the commit hash correctly:
I see many people are rolling back to v1
, is that helpful? Is v1
exempt of the bug?
@andymckay, @iheanyi, @ericsciple, this critical issue has been reported for a while already. Do we have any news?
I see many people are rolling back to
v1
, is that helpful? Isv1
exempt of the bug?@andymckay, @iheanyi, @ericsciple, this critical issue has been reported for a while already. Do we have any news?
V1 is working? We have exactly the same problem with V2.
We rolled out a fix for this last night and we've seen the number of cases reported through the telemetry added by @ericsciple to almost zero since then. One occurred this morning and we'll see if we can track that one down. However we are hoping this is now fixed.
Please let us know if you see this re-occuring.
Update: a change rolled out for the one that occurred this morning as well.
So far it looks like this issue is fixed for all new workflow runs created after 2020-07-31 13:13:00.
Older workflow runs affected by this bug will use the wrong merge commit on re-run as well. To queue a new run, you can typically close and re-open a PR or push more changes to it.
Thanks @hashtagchris! I'm going to close this out now, really appreciate all the work folks put into fixing this!
It still does not work for me.
name: CI
on:
push:
branches: [master]
pull_request:
branches: [master]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: 12.18.3
- name: install dependencies
run: npm ci
- name: test
run: npm run test:coverage
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v1
with:
token: ${{ secrets.CODECOV_TOKEN }}
fail_ci_if_error: true
I had to change the checkout to v1 for codecov to report properly
@andymckay it would be good to update this issue with the exact version number of checkout@v2 where the issue was fixed.
Please let us know if you see this re-occuring.
@andymckay This is re-occuring. My whole CI is completely broken and corrupted. Help
Edit: I was wrong it was another issue
@Mubelotix I would recommend you contact customer support here: https://support.github.com/
Or you can open a new issue on this repository with as much information as you've got so the team can take a look.
Recently, I had a PR https://github.com/adafruit/circuitpython/pull/2831 which checked out green, so a fellow developer merged it. It broke the build.
In subsequent investigation, I noticed that while the linked Actions run https://github.com/adafruit/circuitpython/runs/633447824?check_suite_focus=true shows that it was building the same ref as the one that was merged, down in the logs you find something different.
Under the step "Run actions/checkout@v2", "Checking out the ref",
6dfe9ab represents another revision of the code that was apparently replaced by force-push with the later value 75be728: