Open serathius opened 5 months ago
The flaky detection is meant to detect tests that fail sometimes, not one-off failures. This test fails about 2% of the time. @serathius Do you think a 2% threshold is reasonable?
Hmm, not sure. The 16 % flakiness on the main branch doesn't seem great: https://github.com/etcd-io/etcd/actions/runs/8584643093/job/23525115058.
I think the 16 % flakiness on main branch includes all the workflows on a PR. I am seeing a lot of flakiness wrt arm64.
Raised a flake issue for TestMemberAdd e2e on arm64 and amd64. I have seen it fail a few times in GitHub Actions for arm64, and there are also instances in TestGrid for amd64 on prow.
Maybe we could improve visibility. What surprised me was the fact that the tool didn't mention any flakes. Could we maybe log flakes below 2%, with a note that the rate is too low to file an issue?
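One way to picture this suggestion is the rough sketch below. It is not the actual detection tool: the `Stats` record, the `classify` and `report` helpers, and the message wording are all hypothetical, and only the 2% threshold comes from the discussion above.

```python
from dataclasses import dataclass

# 2% threshold taken from the discussion; everything else is illustrative.
FLAKE_THRESHOLD = 0.02


@dataclass
class Stats:
    """Hypothetical per-test counters collected over recent CI runs."""
    name: str
    runs: int
    failures: int

    @property
    def failure_rate(self) -> float:
        # Avoid dividing by zero for tests that never ran.
        return self.failures / self.runs if self.runs else 0.0


def classify(stats: Stats, threshold: float = FLAKE_THRESHOLD) -> str:
    """Return 'pass', 'flaky' (at or above threshold) or 'below-threshold'."""
    if stats.failures == 0:
        return "pass"
    if stats.failure_rate >= threshold:
        return "flaky"
    return "below-threshold"


def report(all_stats: list[Stats], threshold: float = FLAKE_THRESHOLD) -> list[str]:
    """Build log lines for every test that failed at least once."""
    lines = []
    for s in all_stats:
        kind = classify(s, threshold)
        detail = f"{s.name}: {s.failure_rate:.1%} ({s.failures}/{s.runs})"
        if kind == "flaky":
            lines.append(f"FLAKY {detail}, file an issue")
        elif kind == "below-threshold":
            lines.append(f"NOTE {detail}, too low to file an issue")
    return lines
```

With this shape, a test that failed once in a hundred runs would still show up in the log as a NOTE line instead of being silently dropped.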
The reports are very nice.
My suggestions:
To go into more detail, let's define a measure of bad contributor experience due to CI, something like the time wasted on CI to merge a PR. I would call this TTM (time to merge), a reflection of how long it takes to test a PR and the flakiness of those tests. I would expect TTM to equal something like max(TSDi^(1+TSFi) for each i), where TSDi is the duration of test suite i and TSFi is the flakiness of test suite i. Because retries can be done at the test suite level, we need to count it per suite. If we set a target for TTM, different suites might have different acceptable flakiness, as it's easier and faster to retry a 1-minute test than a 30-minute one. Of course, this assumes that the time to notice a failure and retry is zero, which is a simplification. However, at a high level this is my mental model of the problem. If we include TTR (time to retry), then TTM = max(TSDi^TSFi + (TSDi+TTR)^TSFi for each i).
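To make the two formulas above easier to play with, here is a small sketch that evaluates them exactly as written in the comment. The function names, the `(duration, flakiness)` tuple layout, and the choice of minutes as the unit are assumptions for illustration; the formulas themselves are taken literally and not otherwise validated.

```python
def ttm(suites: list[tuple[float, float]]) -> float:
    """TTM = max(TSDi^(1+TSFi) for each i).

    Each entry is (TSD, TSF): suite duration (assumed minutes) and
    suite flakiness as a fraction between 0 and 1.
    """
    return max(tsd ** (1 + tsf) for tsd, tsf in suites)


def ttm_with_retry(suites: list[tuple[float, float]], ttr: float) -> float:
    """TTM = max(TSDi^TSFi + (TSDi+TTR)^TSFi for each i).

    ttr is the time to notice a failure and trigger a retry,
    in the same unit as the suite durations.
    """
    return max(tsd ** tsf + (tsd + ttr) ** tsf for tsd, tsf in suites)
```

Feeding in a short suite and a long suite with the same flakiness shows how the longer suite dominates the maximum in the first formula, which is the point about different suites tolerating different flakiness.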
What would you like to be added?
https://github.com/etcd-io/etcd/actions/runs/8584759086/job/23525383783
cc @siyuanfoundation
Why is this needed?
The last run, https://github.com/etcd-io/etcd/actions/runs/8584759086/job/23525383783, on April 7th didn't detect a flake that occurred on April 4th.