xjules closed this 1 month ago
Reproduced when Scheduler._publisher() is slow: injecting an asyncio.sleep(5) before await conn.send(event) should do. This naturally prevents all the events from being sent by the time MAX_RUNTIME cancels all realizations, and thus realization.STATUS will be either RUNNING or PENDING.
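The reproduction described above can be imitated outside of ERT with a small asyncio sketch. It is only an illustration under assumptions: slow_publisher, simulate and last_seen_status are hypothetical names, the list append stands in for conn.send(event), and none of this is ERT's actual Scheduler code.

```python
import asyncio


async def slow_publisher(queue, received, delay):
    # Hypothetical stand-in for Scheduler._publisher (not ERT's code).
    while True:
        event = await queue.get()
        await asyncio.sleep(delay)  # the injected delay before "sending"
        received.append(event)      # stands in for: await conn.send(event)


async def simulate(delay, max_runtime=0.05):
    queue = asyncio.Queue()
    received = []
    # Status events a realization would emit over its lifetime.
    for status in ("PENDING", "RUNNING", "FAILED"):
        queue.put_nowait(status)
    task = asyncio.create_task(slow_publisher(queue, received, delay))
    await asyncio.sleep(max_runtime)  # the max runtime limit is reached
    task.cancel()                     # publisher is cancelled with events unsent
    try:
        await task
    except asyncio.CancelledError:
        pass
    # The status the receiving side ends up with; PENDING if nothing arrived.
    return received[-1] if received else "PENDING"


def last_seen_status(delay):
    return asyncio.run(simulate(delay))
```

With no delay the receiver sees FAILED; with a delay longer than the runtime limit the final event never arrives and the status stays at its earlier value, which matches the symptom in the failing test.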
Update: it also looks like _publisher stopped publishing, so the suggestion is to wait and see if this error appears again.
Another failure here: https://github.com/equinor/ert/actions/runs/10193785679/job/28199058436?pr=8362
Further updates: dispatcher_task, which runs _batch_events_into_buffer, is unable to correctly append the last batch of events because the task is cancelled after the realization is hit by max_runtime. Reproducing the issue is not so straightforward, but injecting await asyncio.sleep(xx) at the beginning of _batch_events_into_buffer should do.
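A minimal sketch of how a cancelled batching task can lose its last batch, and of one possible mitigation (flushing on cancellation). All names here (batch_events, run, batches, flush_on_cancel, the batch size of 2) are hypothetical, and this is not necessarily how ERT's _batch_events_into_buffer works or how the issue should be fixed there.

```python
import asyncio


async def batch_events(queue, buffer, start_delay, flush_on_cancel):
    # Hypothetical stand-in for _batch_events_into_buffer (not ERT's code).
    batch = []
    try:
        await asyncio.sleep(start_delay)  # injected delay at the beginning
        while True:
            batch.append(await queue.get())
            if len(batch) >= 2:           # assumed batch size for the sketch
                buffer.append(batch)
                batch = []
    except asyncio.CancelledError:
        if flush_on_cancel:
            # Drain whatever is still queued and append it as a final batch.
            while not queue.empty():
                batch.append(queue.get_nowait())
            if batch:
                buffer.append(batch)
        raise


async def run(start_delay, flush_on_cancel, max_runtime=0.05):
    queue = asyncio.Queue()
    buffer = []
    for event in ("PENDING", "RUNNING", "FAILED"):
        queue.put_nowait(event)
    task = asyncio.create_task(
        batch_events(queue, buffer, start_delay, flush_on_cancel)
    )
    await asyncio.sleep(max_runtime)  # the realization hits max_runtime
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return buffer


def batches(start_delay, flush_on_cancel):
    return asyncio.run(run(start_delay, flush_on_cancel))
```

Without the flush, the partially filled batch holding FAILED is dropped when the task is cancelled; with the injected start delay, nothing is batched at all unless the cancellation handler drains the queue.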
Still couldn't reproduce it. It seems that the source of the problem is Scheduler._publisher.
Another failure occurred: https://github.com/equinor/ert/actions/runs/10454681280/job/28947875443?pr=8424
Let's see if this PR might help: https://github.com/equinor/ert/pull/8424
I haven't seen this error since, so closing this one.
Getting this consistently on all of my PRs now. It passes on the second run and only happens for Python 3.8.
Another failure here: https://github.com/equinor/ert/actions/runs/10719695610/job/29724320771?pr=8648 , but on 3.12
Seems like this issue still persists: https://github.com/equinor/ert/actions/runs/10738432908/job/29782008935
Copying a message from @eivindjahren, which includes a suggestion to move the test into the cli section.
I think this is a test concurrency problem (tests affecting each other when running at the same time). I also got a pretty persistent bug running test_tracking_missing_ecl after reorganizing the tests so that integration tests and unit tests ran at the same time. I had to fix it by making it a cli test instead:
This works consistently now, though it's far from ideal that just moving the test around fixed it (there is some underlying timing dependency here). However, since we cannot reproduce this, we will close until it resurfaces.
Flaky test it seems:
FAILED tests/integration_tests/status/test_tracking_integration.py::test_tracking[ee_poly_experiment_cancelled_by_max_runtime]
It appears that after a realization reaches MAX_RUNTIME the cancellation doesn't happen, and the state ends up as PENDING instead of FAILED. Please see for more details:
https://github.com/equinor/ert/actions/runs/10143243141/job/28044195831