Consider merging the dispatch.coroutine and dispatch.function decorators

dispatchrun / dispatch-py

Python package to develop applications with Dispatch.

https://pypi.org/project/dispatch-py/

Apache License 2.0

56 stars 3 forks source link

Consider merging the dispatch.coroutine and dispatch.function decorators #63

Closed achille-roussel closed 8 months ago

achille-roussel commented 9 months ago

By splitting the simple stateful functions and durable coroutines, we could create complexity for developers that could be avoided if we simplified everything under one decorator.

Why not make all stateful functions durable coroutines?

One potential problem we might run into is the interoperability with I/O event loops like asyncio, trio, etc... These also exploit Python's async/await facilities to implement I/O concurrency.

A program that uses asyncio will likely expect to be able to use async/await for volatile I/O operations, if we take over the execution of Python coroutines, we either make it incompatible to use dispatch with asyncio, or we need to bridge with the asyncio event loop.

When dispatch.coroutine and dispatch.function are split, we can allow dispatch.function to be compatible with the asyncio facility and use dispatch.coroutine for blocking calls since the Dispatch scheduler provides concurrency on async calls.

Should we merge the dispatch.function and dispatch.coroutine decorators?

Conceptually, this is the right north star to aspire to. The pending question is whether we can do it without introducing confusion for the users and without breaking applications that would try to use the Dispatch SDK with other async code.

Please comment!

pelletier commented 8 months ago

I think merging the two is the right goal, and that you are right that tying ourselves to asyncio comes with issues in real world applications that already use the global event loop. My intuition is that we are better off figuring out a different way of expressing the durable actions – without asyncio – even at the cost of more complexity in the SDK and/or more mucking around the CPython internal structures.

achille-roussel commented 8 months ago

My intuition is that we are better off figuring out a different way of expressing the durable actions – without asyncio – even at the cost of more complexity in the SDK and/or more mucking around the CPython internal structures.

Could you develop more on that point?

pelletier commented 8 months ago

As one of our goals is to integrate with existing development flows and codebases, we can't prevent people from using asyncio simultaneously with dispatch. Since we can't reliably control the asyncio event loop in the interpreter (or whether there is one), we can't assume that we control the behavior or async/await through the program. So I think the best bet for us is to provide all the primitives to write durable coroutines without using async/await, even though it may increase the complexity we take on. That way we remove one big differentiator between durable coroutines and functions, so that we can merge both, hopefully by making all stateful functions durable coroutines.

chriso commented 8 months ago

Functions and coroutines are both converted into "primitive" functions that accept a Dispatch RunRequest (aka. dispatch.proto.Input) and return a RunResponse (aka. dispatch.proto.Output). Each has a different implementation of the wrapper, e.g. compare @dispatch.function and @dispatch.coroutine.

At the moment we reject async functions in @dispatch.function and require async functions in @dispatch.coroutine, so it would be possible to merge the two and use whether the function is async to choose an implementation of the wrapper. If the user explicitly wants one or another (e.g. they want to use an async function with the @dispatch.function wrapper, so that we don't engage the local coroutine scheduler and don't interfere with an existing event loop) then we can add a parameter like @dispatch.function(durable=False).

Users will likely run into issues attempting to integrate an existing event loop (e.g. asyncio), and this will happen with or without having a separate @dispatch.coroutine method. For now, let's simplify by eliminating @dispatch.coroutine and worry about this issue separately?

achille-roussel commented 8 months ago

An important distinction that @chriso pointed out is that asyncio is just a library designed on top of Python coroutines. Even though confusion can be made sometimes, there are other async I/O libraries in Python, like trio and friends, which aren't compatible with the default asyncio unless extra adapters are added.

This looks like precedents that have parallels with our case; async/await leaks the coloring problem of the event loop underneath, regardless of the use case.

Using native coroutines as the backbone for durable coroutines does not require asyncio; it would only be needed to coexist with async I/O libraries, which might be asyncio or other libraries. For example, even if we use Python coroutines to model durable coroutines, the implementation can still use blocking calls, and users aren't forced into using asyncio: you can see this in how we use blocking calls from the requests package within durable coroutines (expressed as async functions).

One alternative we explored was our AST rewrite with the experimental multicolor package. Using native language features is not just about where we spend complexity; it also gives us integration with a much more robust engine (e.g., we leverage the interpreter and tools that integrate with it instead of having to recreate the infrastructure), choosing to rely on well-beaten code paths greatly reduces the risk factor.

There are other alternatives we could explore as well; no matter which model ends up being more effective (and as Python usually goes, we might end up with more than one), being able to bridge with asyncio will become a necessity at some point.

achille-roussel commented 8 months ago

Leaving a couple more thoughts I had reflecting on this discussion.

Based on our conversations, I'm growing increasingly convinced that we need two answers to this question: one for the async/await world and one for the blocking world.

async/await and integration with async I/O: for any application that uses coroutines to do concurrent I/O operations, we will need to create a hybrid event loop where we can combine the dispatch of durable coroutine scheduling and async I/O events. On that note, we would probably need an async version of .dispatch that also uses asyncio so it can participate in the concurrent dispatch of I/O events. httpx makes it explicit with an AsyncClient type; we can follow a similar model and have AsyncDispatch; I find this error-prone, so maybe we can come up with a better idea!
an answer for the threading world: when concurrency in a server application is managed by threads (or subprocesses?), can we still get durability? We've sorted out a solution to restore generator stack frames; maybe we can do something similar for threads by creating routines similar to setjmp/longjmp in C. One limitation of this model is it would only work for sequential control flow unless we go one step further and integrate with the spawning of subthreads and synchronization primitives... my hunch is that considering the use case of dealing with simpler programs that don't want to opt-in to the complexity of async/await, this will likely be unnecessary. The threading solution would only matter for blocking style where the control flow is linear; applications that need more concurrency should use our async/await API.

What do you all think?