argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Run Go tests nightly with race detector enabled #13661

Open agilgur5 opened 2 months ago

agilgur5 commented 2 months ago

Summary

There are a variety of data races in the code that Go's race detector can find, so we should enable it in a nightly test run. The detector substantially increases execution time and memory usage (Go's documentation cites roughly 2-20x and 5-10x respectively for a typical program), so running it on every PR may not be feasible.

Follow-up to https://github.com/argoproj/argo-workflows/issues/13637, https://github.com/argoproj/argo-workflows/issues/10807#issuecomment-2090959022, and other places where contributors have suggested enabling the race detector in CI.

Use Cases

To detect data races ahead of time in CI. We also have some test flakes caused by race conditions. The race detector should make those easier to root-cause and will hopefully reduce that kind of flake in general.

A nightly cadence makes it easier to narrow down regressions. That doesn't mean anyone would check the results every night. But with a nightly run we could at least go back through each night's history to find which PR introduced a regression.

sidebar: race condition test suite

I've also mentioned (https://github.com/argoproj/argo-workflows/pull/13102#issuecomment-2156719671, etc.) potentially creating a race condition test suite. In it, we would intentionally run racy Workflows and assert correct behavior between them. I don't yet have a framework for that in mind.

Implementation Details

We'd have to create a separate GitHub Actions (GHA) workflow, similar to ci-build.yaml, that runs on a schedule instead. It may well be a good idea to consolidate the duplication between the two into "reusable workflows" that both use.

That reusable workflow could then take a parameter that enables the race detector. The parameter would pass an environment variable to the Makefile, which adds the -race flag to the Go test command whenever that variable is set.

stretch goal - notifications

It would also be good to report a nightly failure somewhere visible, such as a GitHub issue or Slack. That's easier said than done, though, as we don't currently have any implementations of those integrations that we could reuse.


Message from the maintainers:

Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.