Open Atemu opened 2 days ago
Agreed this is a problem. I recently introduced the get-merge-commit.sh, which fixes this problem and is already used for some workflows. It still needs to get applied to them all though (or rather, the ones using pull_request_target and refs/pull/.../merge)
Issue description
I've regularly observed PRs where a bunch of GHA checks fail with:
Fetching the PR should basically never fail unless GH is having a moment and in that case we shouldn't error out but rather retry (ideally using exponential back-off).
Our current fetch step actually does retry, but it only attempts it twice and only waits about 12±2 seconds each time, it seems.
Timely CI completion is a lot less critical than not sending the PR author a bunch of confusing CI failure notifications IMHO, so we should retry for a lot longer than that.
I think a timeout of 30min would be appropriate because, after that long, an error that is more critical than a temporary hiccup is likely to have occurred and it's okay to fail loudly. It's also not an unreasonable amount of time on Nixpkgs PR lifecycle time scales IMHO; you wouldn't expect anything noteworthy to have happened 30min after opening a PR. (You would expect the basic checks to have completed under normal circumstances, of course, but the actually interesting interactions (ofBorg, reviews) are likely to take much longer.)
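To make the proposal a bit more concrete, here is a rough sketch of what retrying with exponential back-off under an overall deadline could look like in shell. This is not the existing fetch step: the function name `retry_with_backoff`, the 15s initial delay, the 5min cap on a single wait, and the `PR_NUMBER` variable in the usage comment are all assumptions made for illustration. Only the 30min overall timeout comes from the suggestion above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of any existing workflow): retry a command
# with exponential back-off until it succeeds or an overall deadline passes.
#
# Usage: retry_with_backoff <deadline_s> <initial_delay_s> <max_delay_s> <command...>
retry_with_backoff() {
  local deadline=$1 delay=$2 max_delay=$3
  shift 3
  local start=$SECONDS attempt=1 elapsed

  while true; do
    # Run the command; stop retrying as soon as it succeeds.
    if "$@"; then
      return 0
    fi

    elapsed=$(( SECONDS - start ))
    # Fail loudly if the next wait would take us past the deadline.
    if (( elapsed + delay > deadline )); then
      echo "giving up after $attempt attempt(s), ${elapsed}s elapsed" >&2
      return 1
    fi

    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"

    attempt=$(( attempt + 1 ))
    # Double the wait each time, but never wait longer than max_delay.
    delay=$(( delay * 2 ))
    if (( delay > max_delay )); then
      delay=$max_delay
    fi
  done
}

# Example (assumed values; PR_NUMBER is a placeholder variable):
#   retry_with_backoff 1800 15 300 git fetch origin "refs/pull/$PR_NUMBER/merge"
```

Capping the individual wait keeps the job from sleeping through most of the 30min window on a single attempt, while the overall deadline still guarantees that a persistent failure eventually surfaces as an error.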
cc @infinisil