bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
22.99k stars 4.03k forks source link

Introduce a flag to retry test failures only if TEST_INFRASTRUCTURE_FAILURE_FILE is written #13588

Open tomrenn opened 3 years ago

tomrenn commented 3 years ago

Description of the problem / feature request:

Google's internal version of bazel, has a flag called --treat_test_infrastructure_failures_as_flaky, which retries tests up to 3 times (similar to Flaky=True). This only happens if the test output contains a file in the $TEST_INFRASTRUCTURE_FAILURE_FILE as defined in the bazel docs.

Feature requests: what underlying problem are you trying to solve with this feature?

The feature is trying to improve tests which may have flaky infrastructure, but are generally reliable otherwise. For example, we have some tests which relying on starting an emulator for testing. If this emulator fails to start, we'd like to treat that as an infrastructure failure and indicate the test should be retried.

This is an improvement over using Flaky=True on the test target, because there can be other test failures we would prefer not to be retried. --treat_test_infrastructure_failures_as_flaky would give tests the ability to have flaky retries only during certain [infrastructure] failures.

github-actions[bot] commented 1 year ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

github-actions[bot] commented 1 year ago

This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team (@bazelbuild/triage). Thanks!

joeljeske commented 1 year ago

This still seems valuable, and a great addition to the bazel test story. @bazelbuild/triage

EdSchouten commented 1 year ago

@tomrenn @meisterT Is adding support for this merely a matter of taking the internal implementation of --treat_test_infrastructure_failures_as_flaky and adding it to the public version of Bazel, or is there more to it? In case it's the former, would Google be interested in dropping the code for that?

meisterT commented 1 year ago

cc @lberki @haxorz

@tomrenn do you want to take this one yourself?

EdSchouten commented 1 year ago

Friendly ping!

tomrenn commented 1 year ago

Yes, this would be taking the internal implementation and making it available to Bazel as well. I think it should be a straightforward change but I won't have time to do this before 7.0.0 (in October). I can try to do this before the end of the year but no promises.

acecilia commented 2 weeks ago

@tomrenn Any updates about this? 🙏 (I guess only Google employees could do this, as the internal implementation is not public?)

The other option would be to implement it from scratch