Open serathius opened 3 years ago
This sounds like a pretty interesting thing, and also like something that alleviates a lot of pain and improves developer experience!
I was able to get a basic bash script using GitHub GraphQL API - https://github.com/karuppiah7890/issues-info/blob/main/etcd-io/etcd/issue-13167/find-flaky-tests-data.sh . It gives data like this - https://github.com/karuppiah7890/issues-info/blob/main/etcd-io/etcd/issue-13167/commit-and-check-data.json
I'm able to get the number of successes and we can get failures too. Given total (for example 100) and any one of those (successes / failures), we get the other value too
Great! Would you be interested in sending a PR that adds it to the etcd scripts?
Sure @serathius ! I was also wondering if I should try out a golang script too, so anyone can run it with just "go run" or similar on any platform. No need to worry about OS, bash shell being available, other tools being available etc. What do you think?
Letting everyone run it is a good initiative, but on the other hand, long term we should just automate it. Most scripts are already written in bash, and I don't think there is any need to invest too much in this script. It should be simple enough (2-3 commands) that it can be replaced when needed.
I think it would make sense to revisit those improvements once we have established the whole process and automated it.
Makes sense @serathius ! 👍 I'll raise the PR and we can discuss more about the bash script as part of the PR
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
commenting to avoid closing of issue
I still see flakiness when unit tests are run.
I hacked together a tool for finding/tracking/fixing flakes the other day: https://github.com/endocrimes/etcd-test-analyzer
Because it parses all of the test results from every run in a given time period, it makes it relatively easy to modify to ask new questions in place, but definitely isn't a tool that is widely useful in its current form.
Status update: running ./scripts/measure-test-flakiness.sh gave me:
Commit status failure percentage is - 23 %
So on the last 100 merged commits we got 23 test failures. Excluding 7 coverage failures (not blocking merge) and 2 recent failures due to a post-merge bug (https://github.com/etcd-io/etcd/pull/14101), we get 14% flakiness.
Going down from 50% to 14% is a great result!! Thanks to everyone who helped.
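For anyone who wants to redo this adjustment on a later run, here is a minimal sketch of the arithmetic. It assumes the 23% reported by the script over 100 commits, and it keeps the denominator at 100 after exclusions, which is how the 14% figure above comes out. The function name is made up for illustration and is not part of the etcd scripts.

```python
def flakiness_percent(total_runs: int, failures: int, excluded: int = 0) -> float:
    """Percentage of runs that failed, ignoring failures we decided not to count.

    The excluded failures are only subtracted from the numerator; the total
    number of runs is left unchanged.
    """
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    if not 0 <= excluded <= failures <= total_runs:
        raise ValueError("expected 0 <= excluded <= failures <= total_runs")
    return 100.0 * (failures - excluded) / total_runs


# Numbers from the status update: 23 failures over 100 commits, with
# 7 coverage failures and 2 known post-merge failures excluded.
raw = flakiness_percent(100, 23)
adjusted = flakiness_percent(100, 23, excluded=7 + 2)
print(f"raw: {raw:.0f}%, adjusted: {adjusted:.0f}%")
```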
Looking into failures from the last 100 runs (excluding coverage and known issues), we get failures in:
TestDowngradeUpgradeClusterOf3 (example) - @serathius
BLACKHOLE_PEER_PORT_TX_RX_LEADER (example)
NO_FAIL_WITH_NO_STRESS_FOR_LIVENESS (example)
TestLeasingReconnectOwnerConsistency (example)
TestWatchCancelOnServer (example)
TestDropReadUnderNetworkPartition (example) (possible goroutine leak in previous test)
TestBalancerUnderNetworkPartitionTxn
TestAuthority (example)
TestLeasingReconnectNonOwnerGet (example)
TestMaxLearnerInCluster (example)
DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT (example)
As there are a lot of tests, it would be great to get some help. Please let me know if you are interested in tackling one of the tests listed.
Status: 28% flakiness https://github.com/etcd-io/etcd/actions/runs/3075242479/jobs/4968492878
Thanks for raising this issue. It is really annoying for any contributor to etcd when unrelated tests fail.
I can take TestDowngradeUpgradeClusterOf3, because I just ran into it in https://github.com/etcd-io/etcd/pull/14331. It's also a good opportunity to learn how downgrade works.
Track this in https://github.com/etcd-io/etcd/issues/14540
I noticed a recent increase in flakes (at least in my PRs). From https://github.com/etcd-io/etcd/actions/runs/4394774437/jobs/7696017126 we see 26% flakiness.
Loved the recent initiative by @chaochn47 to use the tools developed by @endocrimes in https://github.com/etcd-io/etcd/pull/15501.
It would be great to integrate them into https://github.com/etcd-io/etcd/actions/workflows/measure-test-flakiness.yaml. @chaochn47, would you be interested in this?
Yeah, I can help add to the existing workflow. ETA next Monday
Hi, I'd like to work on this!
Thanks @nitishfy for your interest. The issue was created some time ago, so not everything is up to date; however, the high-level goals remain relevant. We want to improve our visibility into test flakes so we can fix them more effectively.
As part of the original plan, we instrumented etcd e2e tests to export JUnit reports, and @endocrimes and @karuppiah7890 implemented some custom scripts to analyse them. This approach allowed us to start reporting and manually creating issues to fix flakes.
One thing we can do better is to avoid developing our own scripting. The etcd community is not very big, so we want to avoid spreading ourselves too thin by maintaining too many custom tools. With the introduction of SIG-etcd we now have an option to benefit from the whole ecosystem of tools built by the Kubernetes community. We should do that.
One example of such a tool is testgrid, a test result visualization tool that uses the same JUnit reports to create a grid showing which tests passed and which failed. It makes it really easy to track flakes. For example https://testgrid.k8s.io/sig-etcd-periodics#ci-etcd-e2e-amd64
I think we should work more on integrating with K8s tools. This first requires migrating etcd testing to Prow, the K8s CI tool. This work can be tracked in https://github.com/kubernetes/k8s.io/issues/6102.
In the meantime, we could ensure that all etcd tests generate a JUnit report that can be used later.
Looking only at GitHub workflows, in https://github.com/etcd-io/etcd/blob/main/.github/workflows/tests-template.yaml we set JUNIT_REPORT_DIR and export JUnit files (https://github.com/etcd-io/etcd/blob/11ff2644f2378e80a461d7dacfe3ad151c37f26e/.github/workflows/tests-template.yaml#L69-L73). We should look into adding this to more test scenarios.
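To illustrate what the exported reports make possible, here is a rough sketch that counts failures per test across several JUnit reports. It assumes the common JUnit XML layout (testcase elements with a failure or error child); I have not checked it against the files etcd actually produces, and the sample reports and test names below are made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def failed_tests(junit_xml: str) -> list[str]:
    """Return the names of test cases that contain a <failure> or <error> child."""
    root = ET.fromstring(junit_xml)
    failed = []
    # iter() walks the whole tree, so this works whether the root element
    # is <testsuites> or a single <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(case.get("name", "<unnamed>"))
    return failed


def count_failures(reports: list[str]) -> Counter:
    """Count, per test name, in how many reports the test failed."""
    counts = Counter()
    for report in reports:
        counts.update(failed_tests(report))
    return counts


# Two made-up reports, standing in for files found under JUNIT_REPORT_DIR.
SAMPLE_REPORTS = [
    """<testsuites><testsuite name="e2e">
         <testcase name="TestA"/>
         <testcase name="TestB"><failure message="timeout"/></testcase>
       </testsuite></testsuites>""",
    """<testsuite name="e2e">
         <testcase name="TestA"><failure message="flake"/></testcase>
         <testcase name="TestB"><failure message="timeout"/></testcase>
       </testsuite>""",
]

for name, n in count_failures(SAMPLE_REPORTS).most_common():
    print(f"{name}: failed in {n} of {len(SAMPLE_REPORTS)} reports")
```

In a real run the strings would be read from the report files instead of being inlined.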
If we look at test results for commits on the main branch since we migrated to GitHub Actions, we get:
Where failure/success is based on the green check vs. red cross under the commit message (commits without either were not tested, as they were one of multiple commits in a single PR).
Those are all test failures on the main branch, so after a PR passed tests and was approved. We can use those failures to calculate the chance of any PR failing to pass tests just due to test flaking.
A flakiness ratio of over 50% means that the average PR needs to be run 2 times, but sequences of failures may be much longer; 3-5 failures in a row is not uncommon. This can be frustrating, especially to new contributors, as there is no easy way to retrigger tests (you need to do an empty commit amend and push).
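The "2 times" figure can be checked with a small model. This is only a sketch under a simplifying assumption that is not stated in the issue: every run fails independently with the same probability p. Under that assumption the expected number of runs until the first pass is 1 / (1 - p), and the chance of starting with k failures in a row is p ** k.

```python
def expected_runs(p_fail: float) -> float:
    """Expected number of runs until the first passing run (geometric model)."""
    if not 0 <= p_fail < 1:
        raise ValueError("p_fail must be in [0, 1)")
    return 1.0 / (1.0 - p_fail)


def prob_failure_streak(p_fail: float, k: int) -> float:
    """Probability that the first k runs all fail, assuming independent runs."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return p_fail ** k


# 50% is the flakiness discussed above; 22% and 10% are the recent
# measurement and the proposed target from this issue.
for p in (0.50, 0.22, 0.10):
    print(
        f"p={p:.2f}: expected runs {expected_runs(p):.2f}, "
        f"P(3 failures in a row) {prob_failure_streak(p, 3):.3f}"
    )
```

With p = 0.5 the model gives 2 expected runs and a 12.5% chance of three failures in a row, which fits the observation that streaks of 3-5 failures are not rare.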
Proposal
The etcd community should settle on a test flakiness target, measure it, and establish a process to fix flaky tests.
For a start, I would propose targeting a 10% failure rate for the whole test suite. It should be reachable by fixing only a couple of tests, as the last runs gave us 22% (7 out of the last 32). Measuring flakiness could start from something simple, for example running a script once a week that checks the last 100 test results. If the measured flakiness is over our target, we should identify the most flaky tests, create issues for them, and encourage the community to fix them.
For the first couple of runs we could execute the scripts manually, but we should plan to automate them.
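As a rough illustration of the weekly check, here is a minimal sketch. It assumes the results have already been fetched as a list of pass/fail booleans and that the names of failed tests are known; fetching them is out of scope here. All function names and the test names in the sample data are hypothetical.

```python
from collections import Counter

TARGET_FAILURE_RATE = 0.10  # the 10% target proposed above
WINDOW = 100                # check the last 100 test results


def failure_rate(results: list[bool], window: int = WINDOW) -> float:
    """Fraction of failed runs among the most recent `window` results."""
    recent = results[-window:]
    if not recent:
        raise ValueError("no results to measure")
    return sum(1 for passed in recent if not passed) / len(recent)


def weekly_report(results: list[bool], failed_test_names: list[str],
                  target: float = TARGET_FAILURE_RATE) -> list[str]:
    """Build report lines; list the most flaky tests only when over target."""
    rate = failure_rate(results)
    lines = [f"failure rate over last {min(len(results), WINDOW)} runs: {rate:.0%}"]
    if rate > target:
        lines.append(f"over the {target:.0%} target; candidates for new issues:")
        for name, n in Counter(failed_test_names).most_common(3):
            lines.append(f"  {name} ({n} failures)")
    else:
        lines.append(f"within the {target:.0%} target")
    return lines


# Sample data matching the numbers above: 7 failures out of the last 32 runs.
sample_results = [False] * 7 + [True] * 25
sample_failed = ["TestExampleA"] * 4 + ["TestExampleB"] * 2 + ["TestExampleC"]
print("\n".join(weekly_report(sample_results, sample_failed)))
```

With fewer than 100 results available, the check simply uses what it has, so it can be started before a full window exists.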
TODO:
cc @hexfusion @Rajalakshmi-Girish