ARM-software / devlib

Library for interaction with and instrumentation of remote devices.
Apache License 2.0
47 stars 78 forks source link

Replace nest_asyncio with greenlets #683

Closed douglas-raillard-arm closed 3 weeks ago

douglas-raillard-arm commented 5 months ago

Considering https://github.com/ARM-software/devlib/issues/682, this PR updates devlib to use greenlets in cases where nest_asyncio cannot be used (e.g. using uvloop event loop).

This alternative path works by using greenlet to provide a way for nested coroutines separated from the top-level coroutine by blocking calls to yield their action.

TODO:

Fixes #682

douglas-raillard-arm commented 5 months ago

PR updated with:

  1. implementation directly based on greenlet rather than greenback. This avoids some ctypes-based hacks that are not necessary. That implementation is compatible with any asyncio event loop implementation.
  2. Still use nest_asyncio when possible, as the greenlet version might execute the coroutine on a separate thread in some circumstances, whereas nest_asyncio never needs to. This important as the jupyter lab use case unfortunately falls in the "needing the thread" category, since the all code (including the imports) executes inside a pre-existing coroutine.
  3. A commit that allows recycling connection instances when a thread is done with them, rather than just destroying the connection. This means that a client frequently spawning threads (e.g. creating short lived thread pools) will not reconnect to the target all the time. The coroutine execution in a separate thread can also share the connection. This is implemented by returning the connection to the pool after every Target method that directly uses self.conn (see call_conn decorator). This has a user-visible effect that target.conn might give a different object across calls to Target methods.
douglas-raillard-arm commented 4 months ago

PR updated with exception handling in coroutine and exception injection with .throw(). The state of this PR will stay as [Draft] until I add some unit tests to ensure correct operations (and fix all the items in the TODO list in this PR cover letter) as discussed with Vincent Coubard.

douglas-raillard-arm commented 4 months ago

PR updated with:

douglas-raillard-arm commented 4 months ago

PR updated with extra comments. I now consider it ready so I'll remove the [Draft] status

douglas-raillard-arm commented 4 months ago

Found an issue, please do not merge as-is.

EDIT: the issue is the following: devlib.utils.asyn.asynccontextmanager() provides a wrapper over contextlib.asynccontextmanager() to allow treating the async context manager as a regular blocking context manager. This is achieved by implementing the following:


    def __enter__(self, *args, **kwargs):
        return run(self.cm.__aenter__(*args, **kwargs))

    def __exit__(self, *args, **kwargs):
        return run(self.cm.__aexit__(*args, **kwargs))

This is fine as long as run() simply re-enters an existing event loop. If these run() calls are top-level calls, they will each create a new event loop and try to iterate over the async generator. This is a problem in two ways:

  1. An event loop closes all async generators upon exit. This means the 2nd run() call from __exit__() is ends up seeing a closed async gen, confusing the stdlib implem that then raises RuntimeError("generator didn't stop after athrow()")
  2. We end up "migrating" the async gen from one event loop to another, which is the same as migrating a coroutine across event loops.

Issue 1. can be worked around by hijacking the mechanism the event loop uses to be aware of new async gen.

However, issue 2. is more tricky and is probably a real issue if e.g. the async generator tries to take an asyncio.Lock() across yield calls. The lock future would be handled by the first event loop, then cancelled, and the generator would probably fail to run the 2nd iteration on another event loop.

I'll experiment to see what possibilities exist to fix this problem. This is fortunately the only place that relies on migrating a coroutine between multiple event loops. Trio encountered a similar problem: https://github.com/python-trio/trio/issues/2081

douglas-raillard-arm commented 4 months ago

PR updated with:

douglas-raillard-arm commented 4 months ago

@marcbonnici, Vincent Coubard, Branislav Rankov, this should be ready for the last review/testing round. I consider it to be ready.

douglas-raillard-arm commented 3 months ago

PR rebased. As of today we started dogfooding that PR in LISA's vendored devlib tree, so it should get some more real-world exposure in the coming weeks.

douglas-raillard-arm commented 3 months ago

PR updated with extra tests to check the blocking API works when invoked from asyncio.to_thread(). Everything does work.

douglas-raillard-arm commented 3 months ago

PR updated with a fix to use run() on the return values of anext() for async generators, with the matching unit test.

douglas-raillard-arm commented 3 months ago

PR rebased

douglas-raillard-arm commented 2 months ago

Update with a check to only call loop.shutdown_default_executor() if it exists, since it was added in Python 3.9

https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.shutdown_default_executor

douglas-raillard-arm commented 2 months ago

An issue was found, I'm currently working on fixing it, please do not merge until it's fixed.

douglas-raillard-arm commented 2 months ago

PR updated with:

douglas-raillard-arm commented 2 months ago

One thing I realized and might want to change is that if an event loop is already running (e.g. in a jupyterlab notebook), we will dispatch the coroutine on a loop setup in a separate thread. It's all good except we have a single such thread.

This means that code making parallel devlib invocation of devlib in threads with a pre-setup event loop will end up being serialized. It shouldn't be very hard to change, I'll see if I can do that tomorrow

douglas-raillard-arm commented 2 months ago

PR updated with:

Considered it is a substantial change, I'd be more comfortable dogfooding it in LISA for a little while before we consider merging it.

douglas-raillard-arm commented 3 weeks ago

1.5 month later there has been no reported issue, so I think it's good to go