Open mhofman opened 2 years ago
To work around this issue #8253 kept the concurrent cancellation config introduced during our stint with GitHub merge queues. However this is also inefficient in some cases, e.g. when a PR label is changed while the integration tests are still running.
@mhofman i've tried replicating this issue by removing the concurrency
key we have in integration.yml
This results in the use of check_and_cancel
job in pre-check-integration.yml
. but it seems to working as expected.
Here is the first job which completes: https://github.com/Agoric/agoric-sdk/actions/runs/10303452336
And here is the job that runs after it: https://github.com/Agoric/agoric-sdk/actions/runs/10303457883
It waits for the result as expected.
On a side note: Is our goal with this job to save time or resources? if our goal is to save resources spent on github actions, wouldnt it be more efficient to have the previous job cancel? since two jobs running side by side will use up more github action minutes
but it seems to working as expected.
Oh that's wonderful, it looks like it doesn't re-run the job! Maybe it was a fluke. We should keep an eye on this (do we have a graph showing the runtime of the integration checks we could monitor?)
On a side note: Is our goal with this job to save time or resources?
The goal was to save time at the expense of resources. Integration jobs are long, and let's say you're into 30 minutes of a 45 minute job, apply a label unrelated (e.g. a classification label), currently we would cancel the 30 min deep job and restart from scratch. The goal was to keep that going if the PR content itself didn't change. I believe this would also happen in a much more common case of updating a PR when entering the merge queue:
automerge
label
Describe the bug
Because integration tests are label triggered, it's possible for these actions to end up running concurrently on the exact same commit. To avoid duplicated work, we use a 3rd party action to detect duplicate jobs, and wait for the results of pending jobs on identical commits. If the previous check succeeds, the actual job should be skipped and outcome carried over.
For some reason the latter part seem to no longer work correctly. While we wait until a concurrent job completes, we no longer skip if successful, and end up re-running the job, creating a worse case scenario of checking the same thing multiple times sequentially.
To Reproduce
See jobs:
Expected behavior
Correctly skip job if concurrent check is successful.
Platform Environment