Intermittent CI failure on linux-wayland testbed

freakboy3742 commented 1 month ago

Describe the bug

We've started seeing an intermittent failure in the Linux Wayland Testbed suite.

For example, this PR took 6 attempts before it passed; other PRs pass on the first attempt, and others on the second.

Steps to reproduce

Run the linux-wayland testbed test on any PR.

There's no obvious pattern to the failures.

Expected behavior

Linux Wayland testbed should pass reliably.

Screenshots

No response

Environment

Operating System: Linux + Wayland
Python version: 3.10
Software versions:
- Briefcase: 0.3.19
- Toga: 8bec404dd732d478b3ca50b1ccfefcebf5c0e650 or later, 22 Sep 2024

Logs

__________________________________ test_focus __________________________________
Traceback (most recent call last):
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 341, in from_call
    result: TResult | None = func()
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 242, in <lambda>
    lambda: runtest_hook(item=item, **kwds), when=when, reraise=reraise
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 122, in _multicall
    teardown.throw(exception)  # type: ignore[union-attr]
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/threadexception.py", line 92, in pytest_runtest_call
    yield from thread_exception_runtest_hook()
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/threadexception.py", line 68, in thread_exception_runtest_hook
    yield
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 122, in _multicall
    teardown.throw(exception)  # type: ignore[union-attr]
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/unraisableexception.py", line 95, in pytest_runtest_call
    yield from unraisable_exception_runtest_hook()
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/unraisableexception.py", line 70, in unraisable_exception_runtest_hook
    yield
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 122, in _multicall
    teardown.throw(exception)  # type: ignore[union-attr]
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/logging.py", line 846, in pytest_runtest_call
    yield from self._runtest_for(item, "call")
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/logging.py", line 829, in _runtest_for
    yield
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 122, in _multicall
    teardown.throw(exception)  # type: ignore[union-attr]
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/capture.py", line 880, in pytest_runtest_call
    return (yield)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 122, in _multicall
    teardown.throw(exception)  # type: ignore[union-attr]
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/skipping.py", line 257, in pytest_runtest_call
    return (yield)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 174, in pytest_runtest_call
    item.runtest()
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pytest_asyncio/plugin.py", line 457, in runtest
    super().runtest()
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/python.py", line 1627, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 182, in _multicall
    return outcome.get_result()
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_result.py", line 100, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/python.py", line 159, in pytest_pyfunc_call
    result = testfunction(**testargs)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pytest_asyncio/plugin.py", line 929, in inner
    _loop.run_until_complete(task)
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app/tests/conftest.py", line 142, in run_until_complete
    return asyncio.run_coroutine_threadsafe(coro, self.loop).result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app/tests/widgets/properties.py", line 69, in test_focus
    assert other_probe.has_focus
AssertionError: assert False
 +  where False = <tests_backend.widgets.textinput.TextInputProbe object at 0x7ff3982b25c0>.has_focus
=========================== short test summary info ============================
FAILED tests/widgets/test_slider.py::test_focus - assert False
 +  where False = <tests_backend.widgets.textinput.TextInputProbe object at 0x7ff398716710>.has_focus
FAILED tests/widgets/test_splitcontainer.py::test_focus_noop - assert False
 +  where False = <tests_backend.widgets.textinput.TextInputProbe object at 0x7ff398552ad0>.has_focus
FAILED tests/widgets/test_switch.py::test_focus_noop - assert False
 +  where False = <tests_backend.widgets.textinput.TextInputProbe object at 0x7ff398553f40>.has_focus
FAILED tests/widgets/test_table.py::test_focus_noop - assert False
 +  where False = <tests_backend.widgets.textinput.TextInputProbe object at 0x7ff3985cfee0>.has_focus
FAILED tests/widgets/test_textinput.py::test_focus - assert False
 +  where False = <tests_backend.widgets.textinput.TextInputProbe object at 0x7ff3984f9ff0>.has_focus
FAILED tests/widgets/test_tree.py::test_focus_noop - assert False
 +  where False = <tests_backend.widgets.textinput.TextInputProbe object at 0x7ff3983d3130>.has_focus
FAILED tests/widgets/test_webview.py::test_focus - assert False
 +  where False = <tests_backend.widgets.textinput.TextInputProbe object at 0x7ff3982b25c0>.has_focus
======= 7 failed, 454 passed, 69 skipped, 7 xfailed in 108.00s (0:01:47) =======

Additional context

No response

rmartin16 commented 1 month ago

Experimentally, the issue is that none of the widgets have focus, as opposed to the "wrong" widget having focus. I'm not sure, though, what underlying mechanism could be causing this. It seems to be something systemic, given that the focus tests always fail together...

rmartin16 commented 1 month ago

Testing in #2873 confirms that what I'm seeing locally is also happening in CI. It seems like something is disabling focus altogether....or preventing grab_focus() from working...

In some additional experimentation, I added an asyncio.sleep() when a focus test failed. This allowed me to manually interact with the app to assign focus to the TextInput; once I did this, the remaining focus tests would pass.
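The pause-on-failure trick described above could look roughly like the sketch below. The helper name `pause_if_unfocused` and the `pause_seconds` parameter are hypothetical; the only thing taken from the thread is the idea of sleeping when a probe reports no focus, and the `has_focus` attribute seen in the log.

```python
import asyncio


async def pause_if_unfocused(probe, pause_seconds=30.0):
    """Hypothetical debugging aid, not part of the testbed.

    Returns True if the probe already has focus. Otherwise it sleeps for
    pause_seconds, which leaves time to click into the app by hand, and
    then returns False so the caller can decide how to proceed.
    """
    if probe.has_focus:
        return True
    print(f"No focus; pausing {pause_seconds}s for manual interaction")
    await asyncio.sleep(pause_seconds)
    return False
```

Called just before the focus assertion in a test, this would leave the app on screen long enough to check whether manually focusing a widget lets the later focus tests pass.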

rmartin16 commented 1 month ago

Thinking about this a bit more....I tried a few other things.

Testing in #2891 reinforces my belief that something is "breaking" GTK such that grab_focus() stops working.

If you look at the focus tests that fail, they're always the same: it's the focus tests for Widgets whose names come alphabetically after the letters sl. This is potentially important because pytest runs tests alphabetically (at least here). If you consider the inverse set of Widgets, such as Activity Indicator, Button, etc., their focus tests never just randomly fail. If we follow this logic, the bisection leads to the Selection Widget; once the tests for Selection run, the ability to focus a widget ostensibly stops working.
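The bisection argument above can be sketched in a few lines, assuming test modules really do run in alphabetical order. The module list is an illustrative subset rather than the full testbed suite, and `module_before_first_failure` is a hypothetical helper name.

```python
def module_before_first_failure(all_modules, failing_modules):
    """Return the module that runs just before the first failing module.

    Assumes modules execute in sorted (alphabetical) order, as observed
    in the thread. Returns None if the first failure is the first module.
    """
    ordered = sorted(all_modules)
    first_failure = min(failing_modules)
    index = ordered.index(first_failure)
    return ordered[index - 1] if index > 0 else None


# Illustrative subset of widget test modules (not the complete suite).
all_modules = [
    "test_activityindicator", "test_button", "test_selection",
    "test_slider", "test_splitcontainer", "test_switch", "test_table",
    "test_textinput", "test_tree", "test_webview",
]
# Modules whose focus tests failed in the log above.
failing_modules = [
    "test_slider", "test_splitcontainer", "test_switch", "test_table",
    "test_textinput", "test_tree", "test_webview",
]

suspect = module_before_first_failure(all_modules, failing_modules)
print(suspect)  # test_selection
```

This only identifies the latest module that could have broken focus; anything that ran earlier is ruled out only because its successors' focus tests still passed.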

Therefore, something about the tests for Selection seems to be causing this behavior. And finally, when I remove test_selection.py::test_selection_change from the test suite, I can no longer recreate the issue locally. Drilling down just a little bit further....if I allow more and more of test_selection_change to run, the issue reappears once probe.select_item() runs.

freakboy3742 commented 1 month ago

If you look at the focus tests that fail, they're always the same: it's the focus tests for Widgets whose names come alphabetically after the letters sl.

I had the same thought - it's not all focus tests, just the ones after selection. I went looking for a culprit, but not being able to reproduce made it a difficult wumpus to hunt.

Therefore, something about the tests for Selection seems to be causing this behavior. And finally, when I remove test_selection.py::test_selection_change from the test suite, I can no longer recreate the issue locally.

That's very interesting...

Drilling down just a little bit further....if I allow more and more of test_selection_change to run, the issue reappears once probe.select_item() runs.

Here's a theory - select_item() is calling native.popup(), which will be, at a technical level, creating a new window. Is it possible that something has changed in Wayland (or GTK...) which results in the actual app window losing focus to this new window, but then not giving focus back to the app window when the popup disappears, resulting in widgets in the app window being unable to gain focus at all? Do we need to add a self.widget.window.native.present() call (or similar) to ensure that the app window regains focus after the popup?
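As a purely illustrative sketch of that theory, and not a verified fix, the idea is to re-present the app window after the popup has been shown. The function name `popup_then_present` is hypothetical and the objects are duck-typed: anything with a `popup()` method and anything with a `present()` method will do, which in GTK 3 would be a `Gtk.ComboBox` and a `Gtk.Window` respectively.

```python
def popup_then_present(native_selection, native_window):
    """Hypothetical helper: open the dropdown, then re-present the window.

    native_selection is assumed to expose popup() and native_window is
    assumed to expose present(), as the GTK 3 widgets named in the
    thread do.
    """
    # Opening the dropdown may create a separate popup surface.
    native_selection.popup()
    # Ask the window manager to bring the app window forward again.
    native_window.present()
```

Whether `present()` is enough to hand keyboard focus back under Wayland is exactly the open question; the sketch only shows where such a call would sit.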

rmartin16 commented 1 month ago

Tried a few more things and ultimately, no matter what I try to get focus back on the testbed app, it doesn't work.

So, then I started experimenting directly in Fedora 40, which has a native Wayland environment...and the tests run fine. Then I tried running the tests using the mutter provided by Fedora 40, and the tests also run fine. Fedora 40 and Ubuntu 24.04 both provide mutter 46.2, while Ubuntu 22.04 is back on mutter 42.9.

I think if we just get on to a newer version of mutter, these tests will stop failing.
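To make the version comparison concrete, here is a small sketch that parses a version string and checks it against the one known-good version reported above. It assumes the string looks like `mutter 42.9` (for example, as printed by `mutter --version`); the helper names are hypothetical, and nothing in the thread says where between 42.9 and 46.2 the behavior changed.

```python
import re

# The only version reported as working in this thread.
KNOWN_GOOD = (46, 2)


def parse_mutter_version(version_output):
    """Extract (major, minor) from a string such as 'mutter 42.9'."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"No version found in {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def older_than_known_good(version_output, known_good=KNOWN_GOOD):
    """True if the parsed version predates the known-good version."""
    return parse_mutter_version(version_output) < known_good


print(older_than_known_good("mutter 42.9"))  # True
print(older_than_known_good("mutter 46.2"))  # False
```

A result of True does not prove the environment is affected; it only means the version is older than the one observed to pass.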
