equinor / ert

ERT - Ensemble based Reservoir Tool - is designed for running ensembles of dynamical models such as reservoir models, in order to do sensitivity analysis and data assimilation. ERT supports data assimilation using the Ensemble Smoother (ES), Ensemble Smoother with Multiple Data Assimilation (ES-MDA) and Iterative Ensemble Smoother (IES).
https://ert.readthedocs.io/en/latest/
GNU General Public License v3.0

test_repeated_submit_same_iens for slurm integration test is flaky #8492

Closed · larsevj closed this 2 weeks ago

larsevj commented 4 weeks ago

Observed failing multiple times on PR and main, see for instance https://github.com/equinor/ert/actions/runs/10416328134/job/28848356059:

_________________ test_repeated_submit_same_iens[SlurmDriver] __________________

driver = <ert.scheduler.slurm_driver.SlurmDriver object at 0x7fd278677fd0>
tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/test_repeated_submit_same_iens8')

    @pytest.mark.flaky(reruns=5)
    async def test_repeated_submit_same_iens(driver: Driver, tmp_path):
        """Submits are allowed to be repeated for the same iens, and are to be
        handled according to FIFO, but this order cannot be guaranteed as it depends
        on the host operating system."""
        os.chdir(tmp_path)
        await driver.submit(
            0,
            "sh",
            "-c",
            f"echo submit1 > {tmp_path}/submissionrace; touch {tmp_path}/submit1",
            name="submit1",
        )
        await driver.submit(
            0,
            "sh",
            "-c",
            f"echo submit2 > {tmp_path}/submissionrace; touch {tmp_path}/submit2",
            name="submit2",
        )
        # Wait until both submissions have done their thing:
        while not Path("submit1").exists() or not Path("submit2").exists():
            await asyncio.sleep(0.1)
>       assert Path("submissionrace").read_text(encoding="utf-8") == "submit2\n"
E       AssertionError: assert 'submit1\n' == 'submit2\n'
E         
E         - submit2
E         ?       ^
E         + submit1
E         ?       ^

/home/runner/work/ert/ert/tests/integration_tests/scheduler/test_generic_driver.py:146: AssertionError
=============================== warnings summary ===============================
../../../../../opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/_hypothesis_pytestplugin.py:442
  /opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/_hypothesis_pytestplugin.py:442: PytestRemovedIn9Warning: Marks applied to fixtures have no effect
  See docs: https://docs.pytest.org/en/stable/deprecations.html#applying-a-mark-to-a-fixture-function
    return _orig_call(self, function)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
SKIPPED [1] tests/integration_tests/scheduler/test_generic_driver.py:194: LocalDriver has no NUM_CPU concept
SKIPPED [1] tests/integration_tests/scheduler/test_lsf_driver.py:207: Mocked LSF driver does not provide bhist
FAILED tests/integration_tests/scheduler/test_generic_driver.py::test_repeated_submit_same_iens[SlurmDriver] - AssertionError: assert 'submit1\n' == 'submit2\n'

  - submit2
  ?       ^
  + submit1
  ?       ^
=== 1 failed, 48 passed, 2 skipped, 1 warning, 5 rerun in 214.19s (0:03:34) ====
berland commented 4 weeks ago

If reruns are not enough for this, we might consider just dropping testing this property.

jonathan-eq commented 2 weeks ago

If reruns are not enough for this, we might consider just dropping testing this property.

Why are we testing this? It seems like testing OS functionality outside of our control.

berland commented 2 weeks ago

Why are we testing this? It seems like testing OS functionality outside of our control.

We are testing it because of the test's docstring: "Submits are allowed to be repeated for the same iens, and are to be handled according to FIFO",

but then the test measures this by which one is last to finish, which is not part of what Ert wants to guarantee.
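The finish-order caveat can be illustrated with a small standalone asyncio sketch (the names and delays here are illustrative, not ERT code): two tasks started in FIFO order can finish in the opposite order whenever the first one happens to run longer, which is exactly the race the assertion on `submissionrace` trips over.

```python
import asyncio


async def job(delay: float, results: list, name: str) -> None:
    # Simulate a submitted job that takes `delay` seconds to complete.
    await asyncio.sleep(delay)
    results.append(name)


async def main() -> list:
    results: list = []
    # Start the tasks in FIFO order: "submit1" first, then "submit2".
    first = asyncio.create_task(job(0.2, results, "submit1"))
    second = asyncio.create_task(job(0.0, results, "submit2"))
    await asyncio.gather(first, second)
    return results


# Start order is FIFO, but finish order follows runtime, not start order:
print(asyncio.run(main()))  # ['submit2', 'submit1']
```

So even with FIFO submission preserved, the last submission is not guaranteed to be the last writer of a shared file.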

jonathan-eq commented 2 weeks ago

Why are we testing this? It seems like testing OS functionality outside of our control.

We are testing it because of the test's docstring: "Submits are allowed to be repeated for the same iens, and are to be handled according to FIFO",

but then the test measures this by which one is last to finish, which is not part of what Ert wants to guarantee.

I will increase the reruns to 10, but if this doesn't work, we will remove the test.
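Besides bumping reruns, one alternative to deleting the test outright would be to keep the repeated-submit exercise but assert only what the scheduler actually promises, namely that both submits for the same iens eventually ran, and drop the OS-dependent check on which one wrote `submissionrace` last. A minimal sketch (the helper name is hypothetical, not ERT API):

```python
import tempfile
from pathlib import Path


def assert_both_submits_ran(rundir: Path) -> None:
    # Hypothetical relaxed check: both repeated submits for the same
    # iens executed (both marker files exist). The final content of
    # 'submissionrace' depends on OS scheduling and is deliberately
    # not asserted.
    assert (rundir / "submit1").exists()
    assert (rundir / "submit2").exists()


# Self-contained demo: simulate both jobs having completed.
with tempfile.TemporaryDirectory() as d:
    rundir = Path(d)
    (rundir / "submit1").touch()
    (rundir / "submit2").touch()
    assert_both_submits_ran(rundir)
    print("ok")
```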