dabeaz / curio

Good Curio!
Other
4.02k stars 241 forks source link

Being able to do `async with async_thread()` #252

Closed Fuyukai closed 6 years ago

Fuyukai commented 6 years ago

Just a thought/suggestion, not sure if you wish to add this in, but it feels like it would fit perfectly within curio's design: the ability to use async_thread inline as a context manager.

For example:

await do_some_task_1()
async with async_thread():  # the body of the async with is now ran inside a thread
    r = do_some_blocking_operation_1()
    AWAIT(do_some_task_2(r))  # maybe even magic a regular await?
    r2 = do_some_blocking_operation(2)

print(r2)  # back in the curio thread

I know this is possible by yielding inside aenter or similar. This is inspired by asyncio-extra's ability to the same thing.

dabeaz commented 6 years ago

Wow. Okay, that's kind of insane. The comment on the AWAIT is the thing that has me wondering a bit--what happens if a regular await appears in there?

Might have to investigate further...

Fuyukai commented 6 years ago

It is insane but it's also really really cool. I'm pretty sure a regular await wouldn't work unless you added special DARK_ARTS_ENTER/DARK_ARTS_EXIT yieldcodes to the kernel to signify an enter/exit yield from the threadpool, since it works by yielding and running the coroutine from the thread manually.

Although, if a regular await was allowed there'd be no point having an AWAIT in anything because the same magic would translate over to normal decorator-style async threads.

dabeaz commented 6 years ago

It seems the main challenge here is coordinating the handoff to the backing thread. I'm not even sure it would involve a huge amount of dark arts. Well, maybe some dark arts, but not much more than what is already present. I'll fool around with it.

dabeaz commented 6 years ago

I've modified Curio to allow async functions to serve as the target of an async thread (good idea). So, you can get ride of the AWAIT if you use that. Will need to do further pondering for use as a context manager though.

Fuyukai commented 6 years ago

A few weeks or so back, I was considering making a PR that enabled async_threads to use proper async/await, but there was a reason I didn't, and I've just remembered: allowing await in an async_thread breaks code that depends on being in the same thread as the event loop.

Some of my code that I've written uses the assumption that the threading.local they use is for the same thread as the loop thread; doing an await from a worker thread to something that uses thread-local state will look up the wrong data (this issue also applies to PEP 567 ContextVars, to some degree). The old method of the AWAIT() function, however, does not have this issue because it submits all coroutines to the curio thread to be ran.

Not sure if this grounds to revert or if it's a design antipattern that you want people to avoid with curio libraries instead, but I thought I'd put that out there.

EDIT: In fact, this completely breaks contextvars if a function using them is from an async_thread without special handling, assuming curio was to add support for it.

dabeaz commented 6 years ago

I would think that any reliance on thread-local state would be pretty dicey with any use of async threads (even with the old AWAIT() function). How is the current version checked in different from the PR you were thinking about? Just curious.

Fuyukai commented 6 years ago

As far as I can see the usage of the AWAIT function would be fine wrt local state as it immediately schedules the coroutine to be ran on the kernel thread, whereas an await coro doesn't schedule it until a yield point where there might be multiple layers of local state reading/writing.

My hypothetical PR was fundamentally exactly the same as the current implementation, with running the coroutine manually and sending yields to/from the kernel thread.

dabeaz commented 6 years ago

Hmmm. Thinking further about this, I think it probably would be fairly critical to make all "awaitable" functions execute in the same thread as the main kernel loop. If you don't do this, you could potentially have two async functions running concurrently in separate threads and they could start interacting with parts of Curio that would now need to have thread-locking added. Ugh.

The current AWAIT() function still maintains a kind of environmental isolation--the awaitables are all executed in the main kernel thread only.

It seems like this possibility of having async operations going in separate threads would also make the context-manager case unsound as well.

Fuyukai commented 6 years ago

The context manager case is risky but you can sort of mitigate it by crashing any function that yields before the aexit is done, which signifies an await. Not sure if that would alleviate the thread-locking, mind.

If you don't do this, you could potentially have two async functions running concurrently in separate threads and they could start interacting with parts of Curio that would now need to have thread-locking added.

In cases like this I prefer sprinkling some "praise be the GIL" comments around and hope really hard.

dabeaz commented 6 years ago

I'm reverting this back to only using AWAIT. I really don't see how this can be made sane if async/await is allowed in separate threads or if a context manager can cause execution to switch into a different thread and still allow async functions to be called. I'd have to add thread-locks all over the place.

dabeaz commented 6 years ago

Note: might revisit this later. A context-manager case where no async-functions are allowed inside the block might be doable.

dabeaz commented 6 years ago

I did some more fooling around with this as a context manager. I can get something to "work", but it's still extremely dicey when await/async can be potentially used inside the context-manager block. For example, it's possible to call async functions that might not block (maybe data is ready). Such functions aren't necessarily thread-safe and you're back to some of the original problems.

All things equal, I think I'd like to keep "async threads" firmly fixed in the world of synchronous functions. You can launch a thread. You can use synchronous functions in that thread. You can have async operations safely carried out on your behalf using AWAIT(coro). However, in this environment there's no mechanism for triggering arbitrary async functions using normal await as doing so is a syntax error.

Would be really interesting to know how people view asyncio-extra's thread-pool feature. To me, it seems cool, but also potentially very dicey for the same sorts of reasons.

imrn commented 6 years ago

No part of curio is thread safe because that is the main idea. So thinking about delegating some asyncs to another thread is fundementally incompatible . You may think of running an other curio instance on the other thread but sharing any async thing (task, etc) between them is still infeasible because they all have deep connections to their parent curio instance. Even i didnt mention any interthread mess this will create.

Fuyukai commented 6 years ago

Personally, I think there's enough of a precedent to do some bytecode inspection and refuse to work if the function calling async_thread() as a context manager has any sort of bad instruction (YIELD_FROM, the async generator opcode, etc) inside the async with block, which should alleviate the issue of await inside an async with async_thread().

dabeaz commented 6 years ago

All things equal, I'd prefer not to rely on bytecode inspection for things (even the example cited is a special corner case I'd rather not have to deal with). That said, it's still a little weird (well, quite weird) that a function would simply switch threads mid-execution like that. Are the usability gains obtained by doing this worth the extra semantic complexity?

dabeaz commented 6 years ago

I'm call this some kind of "provisional feature", but I did add support for using async_thread() as a context manager. It uses some code inspection to prevent any use of async/await inside the associated code block. Play around with this. See if it makes any sense.

Fuyukai commented 6 years ago

Oh, I thought I commented earlier but my FF must've ate the comment.

Are the usability gains obtained by doing this worth the extra semantic complexity?

I mean, whilst it'd be a cool thing to do it's still possible with an closure and like, two more lines so it would really be a minor QoL change. Of course, doesn't matter now unless you plan on removing it.

That said, it's still a little weird (well, quite weird) that a function would simply switch threads mid-execution like that.

I personally think that within a with or async with block, the context can and will switch dramatically, so this isn't that out of the ordinary.

That said, I'll definitely try it in my code to replace the inline functions today.

imrn commented 6 years ago

I hope this "nice to have" feature will not turn out to be something like is_readable/is_blocked, which is against core async idea.

dabeaz commented 6 years ago

Curio has always involved a certain mix of async and threads as well as features for blending the two together (UniversalQueue, UniversalEvent, etc.). So, in that context, I don't think the feature is completely crazy. On the other hand, I still think switching threads in the middle of function is pretty unusual. I'm not totally sold on that idea. However, I'm willing to put it in and let people fool around with it for awhile. Plus, it didn't really require that much code or impact other existing parts of Curio.

Fuyukai commented 6 years ago

Uh, is there a specific reason the daemon= kwarg was removed?

dabeaz commented 6 years ago

I can't make the function work as a decorator and as a context manager if it also has that optional keyword argument included. If daemon is to come back, the context manager will have to be a different function.

Fuyukai commented 6 years ago

Oh, fair enough. The code I had using it was well overdue for a rewrite regardless.

dabeaz commented 6 years ago

Honestly, I was thinking about the naming of this whole thing though. Does it make sense to say:

async with async_thread():
       ...

Or maybe a shorter name would be better:

async with threaded():
        ...

I might be inclined to go with the latter and have the context manager be it's own thing. In that case, the daemon argument could come back.

imrn commented 6 years ago

I would suggest "async with thread():" for a thing which delegates something 'sync' to a 'thread'.

imrn commented 6 years ago

But what is the execution model for this? Will it just delegate the code within context to a thread and just carry on? Or something else?

dabeaz commented 6 years ago

The body of the with statement is executed in a separate thread. The body may only contain synchronous operations. Any use of async/await/yield inside the block is forbidden and produces an immediate RuntimeError prior to starting the thread and entering the block.

imrn commented 6 years ago

A shortcut for submitting a piece of code to a thread without packing them into a function. Right? And you do not care about its result. Sounds like bad style. But still may be useful.

Are you sure that its implementation is a piece of cake?

imrn commented 6 years ago

Does contect manager wait for the completion of the thread or just submit the code and carry on? Will it have both modes or any other?

dabeaz commented 6 years ago

I've already implemented it--it was pretty straightforward. In a nutshell, the task suspends itself. A separate thread picks it up and continues the task by making a single call to the coroutine .send() method to make it to advance through the block to the __aexit__() method. From there, it gets put back on its original thread. No async/await/yield operations are allowed inside so there is no possibility of suspension or interaction with other parts of curio while this is happening. There is no result from a context manager. This does not work by submitting the body of the context manager somewhere and waiting for it (no concurrent execution of the context manager with any other part of the function).

Honestly, it's mainly a shortcut for doing this sort of thing:

async def coro():
       def sync_code():
                do_something()
                do_something_else()
     run_in_thread(sync_code)
dabeaz commented 6 years ago

I wanted to add that as long as async/await are NOT permitted in the context-manager code block, this feature seems no-more dangerous to me than calling a synchronous function in a thread. One of the main features of Curio is that it tries pretty hard to separate async code into its own environment. Virtually every interesting feature of Curio requires the explicit use of async/await. If those are disallowed, then there's no meaningful way for the code block in the context manager to interact with Curio from a separate thread.

dabeaz commented 6 years ago

Question: Just how important is the daemon flag on the async_thread() function?

Fuyukai commented 6 years ago

The only reason I used it in my code a while ago is so that I could exit my program without waiting for the code inside an async_thread task to die (since threads aren't cancellable (easily...)). So that's somewhat important for me.

dabeaz commented 6 years ago

I've made a number of refinements to async-threads. Big picture, one thing I've been thinking about a lot in Curio is the whole interplay between threads and async. For lack of a better description, Curio implements a kind of "Duck Concurrency" where objects and functions can work in both worlds. A good example is something like a UniversalQueue that works with any combination of threads/async. This "async thread" feature is sort of in the same line of thinking. It's a real-life thread, but it's allowed to use async features. You can also write code that works in both worlds. For example, this crazy thing. Here, the echo_handler() function is being used simultaneously by async and sync code.

from curio import run, spawn
from curio.thread import AWAIT, async_thread, spawn_thread
import curio.socket as csocket
import socket
import threading

@async_thread
def echo_handler(sock, addr):
    print("Connection from", addr)
    with sock:
    while True:
        data = AWAIT(sock.recv(10000))
        if not data:
                break
        AWAIT(sock.sendall(b'GOT:'+data))
    print("Connection gone")

async def aserver(addr):
    sock = csocket.socket(csocket.AF_INET, csocket.SOCK_STREAM)
    sock.setsockopt(csocket.SOL_SOCKET, csocket.SO_REUSEADDR, True)
    sock.bind(addr)
    sock.listen(5)
    while True:
    client, addr = await sock.accept()
    await spawn(echo_handler, client, addr)

def tserver(addr):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, True)
    sock.bind(addr)
    sock.listen(5)
    while True:
    client, addr = sock.accept()
    threading.Thread(target=echo_handler, args=(client,addr), daemon=True).start()

threading.Thread(target=tserver, args=(('',25000),), daemon=True).start()
run(aserver, ('',26000))

I mainly post this here as an interesting example. This is something I've been playing around for awhile though. Personally, I think the async-thread feature is something to be explored even more. So, I'd expect to see more developments here.

likern commented 6 years ago

Sorry, if it's not the correct place to ask this kind of question. Let's say I have a function with a lot of blocking operations like path.is_dir() and path.glob("**") and some pieces of coroutines and they are mixed. def func(): if blockfunc(): asyncfunc() blockfunc() blockfunc() asyncfunc()

And I wanted to make it async. Am I correct that the only way is to wrap every blocking function into another function with @async_func or call run_in_thread? And every time will be fired single thread for just small function path.is_dir()? (What if it NFS)

I just thought hat I can just wrap func() with @async_thread and it will work. Because running function in another thread is like async function at some level of abstraction. And it will be Ok that those async parts are no more async. Let them be blocking (if its possible). Because in context of main thread where Kernel is run they are really still async.

That is my thought how I see this.

dabeaz commented 6 years ago

Yes, that's the general idea with async threads. In order to run normal synchronous functions, they either have to be rewritten async or run in a thread. You could put run_in_thread() around all of them or you could flip the thing into an async thread. It's going to look like this:

@async_thread
def func():
     if blockfunc():
        AWAIT(asyncfunc())
        blockfunc()
        blockfunc()
        AWAIT(asyncfunc())

The overall execution is basically the same. The entire body of the @async_thread function runs in its own thread, but the functions passed to AWAIT() execute back in the main thread where they execute in the normal asynchronous environment.

This is a pretty experimental part of Curio so I'd definitely be interested to learn how it works out if you use this approach.

likern commented 6 years ago

Thank you! If I get valuable feedback I will give it back. But now I should understand how to integrate curio into GTK event-based main loop smoothly. Found some info about tk-gui tries, but still not figured out.