MrFlashAccount opened 1 week ago · Open
We should rather prefer https://github.com/enso-org/enso/pull/11602. Retries are not perfect, as they still make builds longer, and 5 may still be too few (Jaroslav reported getting the error 8 times in a row).
> We should rather prefer https://github.com/enso-org/enso/pull/11602. Retries are not perfect, as they still make builds longer, and 5 may still be too few (Jaroslav reported getting the error 8 times in a row).
Yeah, it's a draft and more like a playground to test ideas. Though retries are absolutely necessary.
> Though retries are absolutely necessary.
Why? I think fixing the tests is necessary, not retries :)
Retries are a common practice for tests on CI, as there are always unavoidable fluctuations due to concurrent access to resources and many other instabilities. Adjusting retries helps reduce the number of cases where tests fail due to false positives.
We do retries though, as you mentioned, even up to 8 times, manually. Why wouldn't we do that automatically?
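For context on what "doing that automatically" would mean in practice: in Playwright (which the GUI tests use), automatic retries are a one-line config setting. A minimal sketch with illustrative values, not the project's actual configuration:

```ts
// playwright.config.ts (illustrative values, not the actual Enso config)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry failing tests a couple of times on CI only; locally they fail fast,
  // so flakiness stays visible during development.
  retries: process.env.CI ? 2 : 0,
});
```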
> Retries are a common practice for tests on CI, as there are always unavoidable fluctuations due to concurrent access to resources and many other instabilities. Adjusting retries helps reduce the number of cases where tests fail due to false positives.
The same fluctuations may happen on users' machines, and they will blame us for that.
> We do retries though, as you mentioned, even up to 8 times, manually. Why wouldn't we do that automatically?
We should not. If tests require 8 retries, then they should be blocked and fixed.
In GUI Playwright tests, we had several cases of flaky tests. After a closer look, it was never due to CI instability, but to problems in the tests or the application itself.
Retries will make us blind to those issues. That's why in GUI we decided to repeat tests instead of retrying them (to increase the chance that any irregular failure would be caught before merging).
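For reference, "repeat instead of retry" maps onto Playwright's repeatEach option. A minimal sketch, assuming @playwright/test; the numbers are illustrative, not the project's actual settings:

```ts
// playwright.config.ts (illustrative, assuming @playwright/test)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // No automatic retries: an irregular failure fails the run and gets noticed.
  retries: 0,
  // Run every test several times, to raise the chance of catching a flaky test
  // before it is merged.
  repeatEach: 3,
});
```

With retries set to 0, any irregular failure fails the run, while repeatEach raises the odds that such a failure shows up before the change is merged.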
Yeah, I agree there are different cases and retries might hide potential problems.
But the GUI tests aren't a good example here, as they run in an isolated environment, compared to the real e2e tests.
If we want to detect flaky tests, let's set up a job that runs on cron once a day or week during off hours and posts reports about the flaky tests on Discord automatically. It's trivial to implement, and you get the best of both worlds.
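A sketch of what such an automated report could look like, assuming Playwright's JSON reporter (which labels tests that passed only after a retry as "flaky") and a Discord webhook passed in as a secret; the file name, environment variable, and script are assumptions, not existing Enso tooling:

```ts
// report-flaky.ts (hypothetical helper for a scheduled CI job)
import { readFileSync } from 'node:fs';

interface JsonSpec {
  title: string;
  tests: { status: string }[];
}
interface JsonSuite {
  title: string;
  specs?: JsonSpec[];
  suites?: JsonSuite[];
}

// Walk Playwright's JSON report and collect tests whose status is "flaky",
// i.e. tests that failed at least once and passed only on a retry.
function collectFlaky(suite: JsonSuite, path: string[] = []): string[] {
  const here = [...path, suite.title].filter(Boolean);
  const own = (suite.specs ?? [])
    .filter(spec => spec.tests.some(t => t.status === 'flaky'))
    .map(spec => [...here, spec.title].join(' > '));
  const nested = (suite.suites ?? []).flatMap(child => collectFlaky(child, here));
  return [...own, ...nested];
}

async function main(): Promise<void> {
  const report = JSON.parse(readFileSync('playwright-report.json', 'utf8')) as { suites: JsonSuite[] };
  const flaky = report.suites.flatMap(suite => collectFlaky(suite));
  if (flaky.length === 0) return;
  const webhook = process.env.DISCORD_WEBHOOK_URL;
  if (!webhook) throw new Error('DISCORD_WEBHOOK_URL is not set');
  // Discord webhooks accept a simple JSON payload with a "content" field.
  await fetch(webhook, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content: `Flaky tests detected:\n${flaky.join('\n')}` }),
  });
}

void main();
```

A workflow triggered on a schedule (cron) during off hours could run the tests with retries enabled and then invoke a script like this, so flakiness gets reported without blocking anyone's PR.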
> Retries will make us blind to those issues. That's why in GUI we decided to repeat tests instead of retrying them (to increase the chance that any irregular failure would be caught before merging).
I hope I proved to you by my recent CI changes that it was a mistake and gives nothing except anger from peers.
> I hope I proved to you by my recent CI changes that it was a mistake and gives nothing except anger from peers.
Not at all. The Project View tests are actually fairly stable. So actually, those proved long ago that I'm right. The anger comes from the fact that the dashboard tests are constantly failing.
And now, the package tests failed, for the first time. And I prefer to just block them, as I did, and this way unblock the peers and fix the tests properly.
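For reference, "blocking" a test in Playwright is typically done with test.fixme, which keeps the test in the codebase but excludes it from runs until it is fixed; the test title below is made up for illustration:

```ts
import { test } from '@playwright/test';

// A known-flaky test can be "blocked" with test.fixme: it stays in the codebase,
// is reported as skipped, and no longer blocks peers' PRs until it is fixed.
test.fixme('creates a new project from the dashboard', async ({ page }) => {
  // original test body left in place so it is easy to re-enable after the fix
  await page.goto('/');
});
```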
> But the GUI tests aren't a good example here, as they run in an isolated environment, compared to the real e2e tests.
I don't see any connection here. What's more, e2e tests should not be flaky - if they are flaky on CI, they will be flaky for users.
> The Project View tests are actually fairly stable. So actually, those proved long ago that I'm right.
This is not correct: if you weren't notified about these failures, it doesn't mean they're stable. Relying on other devs, who tend to just restart a task, instead of getting notified about flaky tests automatically, is very naive.
> I don't see any connection here.
More moving parts, more instability
> This is not correct: if you weren't notified about these failures, it doesn't mean they're stable. Relying on other devs, who tend to just restart a task, instead of getting notified about flaky tests automatically, is very naive.
That's why I regularly check the actions on develop and track failures there. Hence, I know that the failures are mostly in the dashboard tests. I even keep a record here: https://github.com/enso-org/enso/issues/11458
Of course, notifications in Discord would help. But the problem of people getting angry due to test instability is better solved by just not running the tests on unrelated changes.
In the case of E2E (package) tests, retries will avail nothing: seeing a flaky test, everyone must first make sure that their PR does not introduce flakiness.
> That's why I regularly check the actions on develop and track failures there. Hence, I know that the failures are mostly in the dashboard tests. I even keep a record here: https://github.com/enso-org/enso/issues/11458
Let's enable reruns on develop then. This is exactly what I'm pointing at: we shouldn't block the work of peers just to check if there are flaky tests. It's unproductive for us and gives nothing but anger. I'm proposing a different approach: instead of checking this in PRs, check on cron (e.g. over nightly builds) or, as you mentioned, over develop. Create issues for flaky tests, plan the work, and fix them.
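A minimal sketch of what retries scoped to develop could look like, using the GITHUB_REF_NAME variable that GitHub Actions sets; this is illustrative, not Enso's actual configuration:

```ts
// playwright.config.ts fragment (illustrative)
import { defineConfig } from '@playwright/test';

const onDevelop = process.env.GITHUB_REF_NAME === 'develop';

export default defineConfig({
  // PR runs keep zero retries, so newly introduced flakiness still blocks its author;
  // develop runs get a couple of retries, so unrelated work is not blocked.
  retries: onDevelop ? 2 : 0,
});
```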
Btw, here's an example of a job that failed due to project-view tests and passed only on the third retry: https://github.com/enso-org/enso/actions/runs/11942912488/job/33291219580
> Btw, here's an example of a job that failed due to project-view tests and passed only on the third retry: https://github.com/enso-org/enso/actions/runs/11942912488/job/33291219580
I checked that. This is actually a dashboard failure (look at line 1053 of the log: https://github.com/enso-org/enso/actions/runs/11942912488/job/33291219580#step:7:1053). The log entries before it are expected failed expectations (a stray console.log, which I will remove on some occasion), and they do not cause the test to fail - I checked that on a similar failure on develop.
> we shouldn't block the work of peers just to check if there are flaky tests. It's unproductive for us and gives nothing but anger.
So we've already fixed that, right? Because peers won't be affected by GUI tests if they didn't change the GUI.
If they did change it, they have to check anyway whether they introduce flaky tests in their PR. So it gives something.
The flaky tests should be fixed ASAP anyway; they may always obscure actual failures. If they wait, more flaky tests will be merged, and even retries won't avail us - again, PRs will be blocked.
Anyway, I'd prefer having automated reports for flaky tests, and btw retries do mark flaky tests as well.
By peers I mean any of your colleagues, not only the guys working on libs/engine stuff.
I'm still standing by the position that we should iterate faster and just plan a day or a couple of hours during a sprint to fix flaky tests, if there are any.
> By peers I mean any of your colleagues, not only the guys working on libs/engine stuff.
As I said, then I would need to check anyway whether it isn't my code that introduces flakiness. Letting flaky tests accumulate will also slow down iteration: if your PR is blocked, you must spend more time figuring out which of, let's say, 3 failures are real and which come from inherited flakiness. Project View confirms that if tests are fixed right away (or blocked), the slowdown of iteration is small, because the tests are just stable. But this stability is earned by repeating tests, not retrying them.