BUG: Tests on windows can be flaky

maxking commented 1 year ago

=========================== short test summary info ===========================
FAILED aiosmtpd/tests/test_main.py::TestMain::test_debug_3 - RuntimeError: Event loop stopped before Future completed.
================== 1 failed, 553 passed, 9 skipped in 52.16s ==================

https://github.com/aio-libs/aiosmtpd/actions/runs/3365228777/jobs/5580522249 https://github.com/aio-libs/aiosmtpd/actions/runs/3371138286/jobs/5593013500

maxking commented 1 year ago

I did some searches some time ago and I feel it is related to EvenLoop implementations on Windows: https://docs.python.org/3/library/asyncio-eventloop.html#event-loop-implementations

Asyncio uses a different, ProactorEventLoop by default on Windows, which could be causing some differences with Unix setups. I'll try to dig more to see if I can find something more concrete for it.

maxking commented 1 year ago

Many folks seem to work-around the issue by replacing the default ProactorEventLoop on windows with the SelectorEventLoop (default for Unix, but supports Windows too).

https://github.com/encode/httpx/issues/914#issuecomment-622586610

waynew commented 1 year ago

@maxking would that be something easy to do at the very least just in the test suite when on Windows? My guess is that it should be but it's been a while since I looked at the code :sweat_smile: If it is, that seems like a reasonable thing to do - but we should definitely document it both in the documentation (e.g. "Just a warning folks, if you're using Windows you might run into problems here!") as well as note the why in the code.

But... unless there's a super obvious answer or fix in upstream asyncio, I'd be :+1: on merging a well documented workaround.

maxking commented 1 year ago

Searching a bit more, it looks like interaction between loop.run_until_complete() and loop.stop causes this RuntimeError where they both are fundamentally in-compatible.

We do however use loop.stop in various places in tests, I can find several in test_main.py. My thought is that the delays for loop.stop in loop.call_later(DELAY, loop.stop) is long enough that the original coro just finishes, which makes this error intermittent. It could just be that slowness in the coro, caused by either resource constraints on the CI machine, could invoke loop.stop causing this RuntimeError.

In our case, we are using the fixture autostop_loop, which uses AUTOSTOP_DELAY of 1 second. It seems like with GH Actions, 1second might not be enough. This fixture is applied to the test that keep failing (https://github.com/aio-libs/aiosmtpd/blob/master/aiosmtpd/tests/test_main.py#L193).

A rather simple fix would be to just bump this to 2 (or see how many failures we see with 1.5 and then go to 2 as they will add-up with multiple tests). If that sounds reasonable, I can open a PR with whatever updated delay you prefer.

waynew commented 1 year ago

That makes sense - and I prefer that to changing out the event loop, if it solves our issue 👍

maxking commented 1 year ago

Awesome, I created #315 to bump it to 1.5s for now.

pepoluan commented 1 year ago

Since the PR is merged, let's close this :-)

aio-libs / aiosmtpd

BUG: Tests on windows can be flaky #313