CI linux testbed crash testing

beeware / toga

A Python native, OS native GUI toolkit.

https://toga.readthedocs.io/en/latest/

BSD 3-Clause "New" or "Revised" License

4.19k stars 655 forks source link

CI linux testbed crash testing #2655

Closed rmartin16 closed 2 weeks ago

rmartin16 commented 2 weeks ago

Changes

RE: https://github.com/beeware/toga/issues/2648

PR Checklist:

[ ] All new features have been tested
[ ] All new features have been documented
[x] I have read the CONTRIBUTING.md file
[x] I will abide by the code of conduct

rmartin16 commented 2 weeks ago

So, I have reasonably high confidence now that my theory in #2648 is correct.

The testbed testing strategy runs the testbed app in the main thread while the tests run in another thread. Occasionally, the tests crash while Python is garbage collecting a GTK WebView. In the deconstruction of the WebView object is an assertion that the current thread is the same thread that is running the app loop. Given that two threads are sharing this space, the garbage collector can run in either thread and trigger this assertion....crashing Python.

I've modified the testbed in this PR to show which thread is attempting to garbage collect either a MapView or WebView. Every crash locally and this one in CI shows that the crash happens when the "test thread" is running the garbage collector.

While I believe I have found the source of the error....I'm lacking any real good ideas to resolve it...I guess short of disabling the garbage collector but that seems squarely in the "bad ideas" column.

freakboy3742 commented 2 weeks ago

If I'm understanding your analysis correctly, there's one fix we could make that would be moderately complex, but would also resolve another major problem: avoid using threads in the testbed.

The current setup starts the app, uses the app to start a thread, runs the pytest suite in the thread, "injecting" tests into the main event loop. This then means some GUI activity is happening in the app thread, but some is happening in the test thread, leading to the garbage collection problem we're seeing here.

The thing is - the existence of the thread is also an issue for the web backend - because the web backend doesn't have threads. So - the major impediment to improving the web backend is that we can't run the testbed on it.

An idea that has been kicking around is to work out how to make pytest.run a coroutine. That way, we wouldn't need a separate thread - all code would be in the GUI thread.

If I'm understanding your analysis, this would also fix the GTK issue, because the GC would never be fighting cross-thread confusion.

I'm not claiming it will be easy... but does it sound plausible?

rmartin16 commented 2 weeks ago

hmm...I think if I consider how I would run pytest.main() as a coroutine, it would be quite similar to what we have here. Instead of creating a Thread manually, I'd let asyncio put the partial in to the ThreadExecutor. However, that doesn't change our situation....just the implementation.

I was wondering....did you (or have you) considered inverting this approach? That is, instead of starting the app and then starting pytest, could we start pytest and create the app in a fixture or some initial setup hook? Then we could still proxy pytest-asyncio to the app's loop and keep everything in one thread.

freakboy3742 commented 2 weeks ago

@mhsmith and I have considered it, to the extent that we've discussed how the approach would allow web testing; but thinking hasn't moved any further than that (at least, I definitely haven't prototyped anything; I can't speak with certainty for Malcolm, but I'm fairly sure he hasn't either).

However, the ThreadpoolExecutor wouldn't be an option - again, web has no threads. Each test would need to be a coroutine on the GUI thread - literally a pytest.run analog that awaits each test, awaits each setup/teardown, etc, started as a coroutine on the main event loop,

rmartin16 commented 2 weeks ago

Concerning the threadpool, I was mostly just saying that I don't really know of a way to run a synchronous function in an event loop other than to fake it with a thread (assuming its long running). Unless, of course, that function is made asynchronous...but that seems like quite the lift for the pytest API; especially considering how much it defers operation to plugins.

I may experiment a bit tomorrow with having pytest create the app...although, I'll should avoid going too far until I finish some of the other stuff I have in flight...

freakboy3742 commented 2 weeks ago

Concerning the threadpool, I was mostly just saying that I don't really know of a way to run a synchronous function in an event loop other than to fake it with a thread (assuming its long running). Unless, of course, that function is made asynchronous...but that seems like quite the lift for the pytest API; especially considering how much it defers operation to plugins.

In our case, we should be able to guarantee that all the test fixtures and tests are asynchronous, so the "red code/blue code" problem doesn't exist. That's not a general solution... but we don't need a general solution :-)

I may experiment a bit tomorrow with having pytest create the app...although, I'll should avoid going too far until I finish some of the other stuff I have in flight...

The issue is that we don't just need pytest to create the app - we need the app, and not pytest, to be responsible for starting the event loop (because it's not a standard event loop for most platforms). In the case of iOS/Android, the Python code doesn't start the event loop at all - it's started in native code, and the Python event loop just comes along for the ride. That's essentially what the await pytest.run() idea comes from - making the pytest suite run in an event loop, rather than start an event loop.

rmartin16 commented 2 weeks ago

Ok; so, having actually gone down this road a bit now, I think I see why I felt like we were talking past each other a bit.

I now understand that "starting the app in a fixture" doesn't really make any sense here...because "starting the app" means starting the loop...which means the loop becomes what the thread is doing...and you can't return control to pytest to put the tests on the event loop.

Insofar as running pytest.main() as a coroutine, I did find some prior art. The alt-pytest-asyncio package advertises actually doing this. However, it seems to come with a rather important caveat: nested loops need to be supported; interestingly, asyncio itself doesn't even support that. Nonetheless, the idea there is to run pytest.main() in its own run_until_complete() loop while the tests would use a different loop to test against. Trying this with gbulb breaks down real fast since it doesn't support recursive/nested loops.

freakboy3742 commented 2 weeks ago

Ok; so, having actually gone down this road a bit now, I think I see why I felt like we were talking past each other a bit.

I now understand that "starting the app in a fixture" doesn't really make any sense here...because "starting the app" means starting the loop...which means the loop becomes what the thread is doing...and you can't return control to pytest to put the tests on the event loop.

Insofar as running pytest.main() as a coroutine, I did find some prior art. The alt-pytest-asyncio package advertises actually doing this. However, it seems to come with a rather important caveat: nested loops need to be supported; interestingly, asyncio itself doesn't even support that. Nonetheless, the idea there is to run pytest.main() in its own run_until_complete() loop while the tests would use a different loop to test against. Trying this with gbulb breaks down real fast since it doesn't support recursive/nested loops.

Yeah - it's definitely not going to be simple.

However, it's a distinct issue from this one; Since I've merged #2658, I guess we can close this PR, and open a new ticket for the broader "rejigger the testbed" issue.