Summary
There are a variety of data races in the code that can be detected via Go's race detector, so we should enable it in a nightly test run. The race detector can substantially increase execution time and memory usage, so running it on every PR may not be feasible.
Follow-up to https://github.com/argoproj/argo-workflows/issues/13637, https://github.com/argoproj/argo-workflows/issues/10807#issuecomment-2090959022, and other discussions where contributors have talked about enabling the race detector somewhere in CI.
Use Cases
To detect certain data races ahead of time in CI. We also have some test flakes caused by race conditions, and the race detector should make those easier to root-cause and hopefully reduce flakes like that in general.
Running it nightly makes it easier to narrow down regressions. That doesn't mean we'd check it every night per se, but with a nightly run we could at least go back through each night's history to understand which PR caused a regression.
sidebar: race condition test suite
I've also mentioned (https://github.com/argoproj/argo-workflows/pull/13102#issuecomment-2156719671 etc.) potentially making a race condition suite where we intentionally have racy Workflows and assert correct behavior between them, but I don't yet have a framework for that in my head.
Implementation Details
We'd have to create a separate GHA workflow similar to `ci-build.yaml` but that runs on a schedule instead. It may very well be a good idea to consolidate some of the duplication between these two into "reusable workflows" that both use.
We can then have a parameter in that reusable workflow that enables the race detector via an environment variable passed to the `Makefile`, which sets the `-race` flag when that variable is set.
stretch goal - notifications
It would also be good to report a nightly failure in an issue, in Slack, or somewhere similar, but that's easier said than done as we don't currently have any implementations of those integrations that we could reuse.
Message from the maintainers:
Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.