canonical / operator

Pure Python framework for writing Juju charms
Apache License 2.0

TestExec.test_no_wait_call may be flaky #1360

Open dimaqq opened 1 week ago

dimaqq commented 1 week ago

An example failure in a doc-only change:

https://github.com/canonical/operator/actions/runs/10713359939/job/29705252719

=================================== FAILURES ===================================
__________________________ TestExec.test_no_wait_call __________________________
[gw0] linux -- Python 3.10.14 /home/runner/work/operator/operator/.tox/unit/bin/python
Traceback (most recent call last):
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/_pytest/runner.py", line 341, in from_call
    result: Optional[TResult] = func()
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/_pytest/runner.py", line 262, in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/pluggy/_callers.py", line 182, in _multicall
    return outcome.get_result()
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/pluggy/_result.py", line 100, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/_pytest/runner.py", line 177, in pytest_runtest_call
    raise e
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/_pytest/runner.py", line 169, in pytest_runtest_call
    item.runtest()
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/_pytest/python.py", line 1792, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/home/runner/work/operator/operator/.tox/unit/lib/python3.10/site-packages/_pytest/python.py", line 194, in pytest_pyfunc_call
    result = testfunction(**testargs)
  File "/home/runner/work/operator/operator/test/test_pebble.py", line 3314, in test_no_wait_call
    assert (
AssertionError: assert "Implicitly c...ss-3eg16rew'>" == 'ExecProcess ...wait_output()'
  - ExecProcess instance garbage collected without call to wait() or wait_output()
  + Implicitly cleaning up <TemporaryDirectory '/tmp/ops-harness-3eg16rew'>
=============================== warnings summary ===============================
test/test_main.py::TestCharmInit::test_storage_with_storage
  /home/runner/work/operator/operator/ops/_main.py:456: DeprecationWarning: Controller storage is deprecated; it's intended for podspec charms and will be removed in a future release.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED test/test_pebble.py::TestExec::test_no_wait_call - assert "Implicitly c...ss-3eg16rew'>" == 'ExecProcess ...wait_output()'
============ 1 failed, 1171 passed, 35 skipped, 1 warning in 44.94s ============
unit: exit 1 (45.59 seconds) /home/runner/work/operator/operator> pytest -n auto --ignore=test/smoke -v --tb native pid=1856
  unit: FAIL code 1 (47.94=setup[2.34]+cmd[45.59] seconds)
  evaluation failed :( (49.04 seconds)
Error: Process completed with exit code 1.

dimaqq commented 1 week ago

Side note: the same test reliably fails in a different way when run under PyPy, because garbage collection is delayed and the weak reference is not released in time.

I think the root causes are ultimately different, but they have one thing in common: the test relies on the order of destruction rather than observing a side effect.
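
To illustrate the pattern, here is a minimal sketch. It is not the actual test or the ops implementation: `FakeProcess`, `collect_warning_messages`, and the message strings are hypothetical stand-ins. It assumes the test records warnings and compares the first one against the expected text, which breaks as soon as an unrelated `ResourceWarning` (such as the `TemporaryDirectory` one in the log above) is recorded first. Searching all recorded messages, and forcing a collection explicitly, avoids depending on destruction order.

```python
import gc
import warnings


class FakeProcess:
    """Hypothetical stand-in for an object that warns if finalized before wait()."""

    def __init__(self):
        self._waited = False

    def wait(self):
        self._waited = True

    def __del__(self):
        if not self._waited:
            warnings.warn(
                "FakeProcess garbage collected without call to wait()",
                ResourceWarning,
            )


def collect_warning_messages():
    """Record every warning emitted while a FakeProcess is dropped unwaited."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # An unrelated warning that happens to be recorded first, standing in
        # for the "Implicitly cleaning up <TemporaryDirectory ...>" message.
        warnings.warn("Implicitly cleaning up <SomethingElse>", ResourceWarning)
        process = FakeProcess()
        del process
        # Force a collection rather than assuming the finalizer runs as soon
        # as the last reference goes away (not guaranteed outside CPython).
        gc.collect()
    return [str(w.message) for w in caught]


messages = collect_warning_messages()
expected = "FakeProcess garbage collected without call to wait()"

# Fragile: assumes the warning of interest is the first one recorded.
first_matches = messages[0] == expected

# More robust: look for the expected side effect anywhere in the record.
found_anywhere = expected in messages
```

In this sketch `first_matches` is false while `found_anywhere` is true, which mirrors the assertion failure in the log. Whether a single `gc.collect()` is enough on PyPy is an assumption that would need checking there.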