MrFlashAccount opened 1 week ago · Open
We should rather prefer https://github.com/enso-org/enso/pull/11602. Retries are not perfect, as they still make builds longer, and 5 may still be too few (Jaroslav reported getting the error 8 times in a row).
> We should rather prefer https://github.com/enso-org/enso/pull/11602. Retries are not perfect, as they still make builds longer, and 5 may still be too few (Jaroslav reported getting the error 8 times in a row).
Yeah, it's a draft and more like a playground to test ideas. Though retries are absolutely necessary.
> Though retries are absolutely necessary.
Why? I think fixing the tests is necessary, not retries :)
Retries are a common practice for tests on CI, as there are always unavoidable fluctuations due to concurrent access to resources and many other instabilities. Adjusting retries helps reduce the number of cases where tests fail due to false positives.
We do retries though, as you mentioned, even up to 8 times, manually. Why wouldn't we do that automatically?
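For context on what "doing that automatically" would mean in practice: in Playwright (which the GUI tests use), automatic retries are a one-line config setting. A minimal sketch with illustrative values, not the project's actual configuration:

```ts
// playwright.config.ts (illustrative values, not the actual Enso config)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry failing tests a couple of times on CI only; locally they fail fast,
  // so flakiness stays visible during development.
  retries: process.env.CI ? 2 : 0,
});
```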
> Retries are a common practice for tests on CI, as there are always unavoidable fluctuations due to concurrent access to resources and many other instabilities. Adjusting retries helps reduce the number of cases where tests fail due to false positives.
The same fluctuations may happen on users' machines, and they will blame us for that.
> We do retries though, as you mentioned, even up to 8 times, manually. Why wouldn't we do that automatically?
We should not. If tests require 8 retries, then they should be blocked and fixed.
In GUI Playwright tests, we had several cases of flaky tests. After a closer look, it was never due to CI instability, but to problems in the tests or the application itself.
Retries will make us blind to those issues. That's why in GUI we decided to repeat tests instead of retrying them (to increase the chance that any irregular failure would be caught before merging).
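For reference, "repeat instead of retry" maps onto Playwright's repeatEach option. A minimal sketch, assuming @playwright/test; the numbers are illustrative, not the project's actual settings:

```ts
// playwright.config.ts (illustrative, assuming @playwright/test)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // No automatic retries: an irregular failure fails the run and gets noticed.
  retries: 0,
  // Run every test several times, to raise the chance of catching a flaky test
  // before it is merged.
  repeatEach: 3,
});
```

With retries set to 0, any irregular failure fails the run, while repeatEach raises the odds that such a failure shows up before the change is merged.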
Yeah, I agree there are different cases and retries might hide potential problems.
But the GUI tests aren't a good example here, as they run in an isolated environment, compared to the real e2e tests.
If we want to detect flaky tests, let's set up a job that runs on cron once a day or week during off hours and posts reports about the flaky tests on Discord automatically. It's trivial to implement, and you get the best of both worlds.
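A sketch of what such an automated report could look like, assuming Playwright's JSON reporter (which labels tests that passed only after a retry as "flaky") and a Discord webhook passed in as a secret; the file name, environment variable, and script are assumptions, not existing Enso tooling:

```ts
// report-flaky.ts (hypothetical helper for a scheduled CI job)
import { readFileSync } from 'node:fs';

interface JsonSpec {
  title: string;
  tests: { status: string }[];
}
interface JsonSuite {
  title: string;
  specs?: JsonSpec[];
  suites?: JsonSuite[];
}

// Walk Playwright's JSON report and collect tests whose status is "flaky",
// i.e. tests that failed at least once and passed only on a retry.
function collectFlaky(suite: JsonSuite, path: string[] = []): string[] {
  const here = [...path, suite.title].filter(Boolean);
  const own = (suite.specs ?? [])
    .filter(spec => spec.tests.some(t => t.status === 'flaky'))
    .map(spec => [...here, spec.title].join(' > '));
  const nested = (suite.suites ?? []).flatMap(child => collectFlaky(child, here));
  return [...own, ...nested];
}

async function main(): Promise<void> {
  const report = JSON.parse(readFileSync('playwright-report.json', 'utf8')) as { suites: JsonSuite[] };
  const flaky = report.suites.flatMap(suite => collectFlaky(suite));
  if (flaky.length === 0) return;
  const webhook = process.env.DISCORD_WEBHOOK_URL;
  if (!webhook) throw new Error('DISCORD_WEBHOOK_URL is not set');
  // Discord webhooks accept a simple JSON payload with a "content" field.
  await fetch(webhook, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content: `Flaky tests detected:\n${flaky.join('\n')}` }),
  });
}

void main();
```

A workflow triggered on a schedule (cron) during off hours could run the tests with retries enabled and then invoke a script like this, so flakiness gets reported without blocking anyone's PR.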
> Retries will make us blind to those issues. That's why in GUI we decided to repeat tests instead of retrying them (to increase the chance that any irregular failure would be caught before merging).
I hope I proved to you by my recent CI changes that it was a mistake and gives nothing except anger from peers.
> I hope I proved to you by my recent CI changes that it was a mistake and gives nothing except anger from peers.
Not at all. The Project View tests are actually fairly stable. So actually, those proved long ago that I'm right. The anger comes from the fact that the dashboard tests are constantly failing.
And now, the package tests failed, for the first time. And I prefer to just block them, as I did, and this way unblock the peers and fix the tests properly.
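For reference, "blocking" a test in Playwright is typically done with test.fixme, which keeps the test in the codebase but excludes it from runs until it is fixed; the test title below is made up for illustration:

```ts
import { test } from '@playwright/test';

// A known-flaky test can be "blocked" with test.fixme: it stays in the codebase,
// is reported as skipped, and no longer blocks peers' PRs until it is fixed.
test.fixme('creates a new project from the dashboard', async ({ page }) => {
  // original test body left in place so it is easy to re-enable after the fix
  await page.goto('/');
});
```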
> But the GUI tests aren't a good example here, as they run in an isolated environment, compared to the real e2e tests.
I don't see any connection here. What's more, e2e tests should not be flaky - if they are flaky on CI, they will be flaky for users.
> The Project View tests are actually fairly stable. So actually, those proved long ago that I'm right.
This is not correct: if you weren't notified about these failures, it doesn't mean they're stable. Relying on other devs, who tend to just restart a task, instead of getting notified about flaky tests automatically, is very naive.
> I don't see any connection here.
More moving parts, more instability
> This is not correct: if you weren't notified about these failures, it doesn't mean they're stable. Relying on other devs, who tend to just restart a task, instead of getting notified about flaky tests automatically, is very naive.
That's why I regularly check the actions on develop and track failures there. Hence, I know that the failures are mostly in the dashboard tests. I even keep a record here: https://github.com/enso-org/enso/issues/11458
Of course, notifications in Discord would help. But the problem of people getting angry due to test instability is better solved by just not running the tests on unrelated changes.
In the case of E2E (package) tests, retries will avail nothing: seeing a flaky test, everyone must first make sure that their PR does not introduce flakiness.
> That's why I regularly check the actions on develop and track failures there. Hence, I know that the failures are mostly in the dashboard tests. I even keep a record here: https://github.com/enso-org/enso/issues/11458
Let's enable reruns on develop then. This is exactly what I'm pointing at: we shouldn't block the work of peers just to check if there are flaky tests. It's unproductive for us and gives nothing but anger. I'm proposing a different approach: instead of checking this in PRs, check on cron (e.g. over nightly builds) or, as you mentioned, over develop. Create issues for flaky tests, plan the work, and fix them.
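A minimal sketch of what retries scoped to develop could look like, using the GITHUB_REF_NAME variable that GitHub Actions sets; this is illustrative, not Enso's actual configuration:

```ts
// playwright.config.ts fragment (illustrative)
import { defineConfig } from '@playwright/test';

const onDevelop = process.env.GITHUB_REF_NAME === 'develop';

export default defineConfig({
  // PR runs keep zero retries, so newly introduced flakiness still blocks its author;
  // develop runs get a couple of retries, so unrelated work is not blocked.
  retries: onDevelop ? 2 : 0,
});
```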
Btw, here's an example of a job that failed due to project-view tests and passed only on the third retry: https://github.com/enso-org/enso/actions/runs/11942912488/job/33291219580
> Btw, here's an example of a job that failed due to project-view tests and passed only on the third retry: https://github.com/enso-org/enso/actions/runs/11942912488/job/33291219580
I checked that. This is actually a dashboard failure (look at line 1053 of the log: https://github.com/enso-org/enso/actions/runs/11942912488/job/33291219580#step:7:1053). The log entries before it are expected failed expectations (a stray console.log, which I will remove on some occasion), and they do not cause the test to fail - I checked that on a similar failure on develop.
> we shouldn't block the work of peers just to check if there are flaky tests. It's unproductive for us and gives nothing but anger.
So we've already fixed that, right? Because peers won't be affected by GUI tests if they didn't change the GUI.
If they did change it, they have to check anyway whether they introduce flaky tests in their PR. So it gives something.
The flaky tests should be fixed ASAP anyway; they may always obscure actual failures. If they wait, more flaky tests will be merged, and even retries won't avail us - again, PRs will be blocked.
Anyway, I'd prefer having automated reports for flaky tests, and btw retries do mark flaky tests as well.
By peers I mean any of your colleagues, not only the guys working on libs/engine stuff.
I'm still standing by the position that we should iterate faster and just plan a day or a couple of hours during a sprint to fix flaky tests, if there are any.
> By peers I mean any of your colleagues, not only the guys working on libs/engine stuff.
As I said, then I would need to check anyway whether it isn't my code that introduces flakiness. Letting flaky tests accumulate will also slow down iteration: if your PR is blocked, you must spend more time figuring out which of, let's say, 3 failures are real and which come from inherited flakiness. Project View confirms that if tests are fixed right away (or blocked), the slowdown of iteration is small, because the tests are just stable. But this stability is earned by repeating tests, not retrying them.