dabeaz / curio

Proposal: A primitive and universal API for async projects #266

Closed guyskk closed 4 years ago

guyskk commented 6 years ago

There are multiple async frameworks and lots of libraries in the Python world, but they are not compatible with each other. This gives both library authors and library users a headache. If I choose curio, then I can't use libraries written for trio/asyncio, so I lose many options. If I write libraries for curio, then I can't easily support trio/asyncio.

The idea of separating user space and kernel in curio is great, but the separation is not thorough. I still get locked into curio if I use it. I propose making a standalone project (API) that does not include a kernel. The architecture is as follows:

               +-----------------------------+
               |                             |
               |         application         |
               |                             |
       +---->  +--------------------+        |
       |       |                    |        |
       |       |       stdlib       |        |
       |       |                    |        |
 api   |       +--------------------+--------+
       |       |                             |
       |       |           syscall           |
       |       |                             |
       +---->  +-----------------------------+
       |       |                             |
kernel |       |           kernel            |
       |       |                             |
       +---->  +-----------------------------+

       curio = api + curio-kernel

       trio = api + trio-kernel

The proposed new project should contain the stdlib layer and the syscall layer, and nothing else. The syscall layer should stay primitive and universal (as small as possible, but not too small).


My experimental implementation is here, documented in the code, and worth discussing: https://github.com/guyskk/newio/blob/master/newio/syscall.py

Some stdlib modules are included in https://github.com/guyskk/newio; some of the code is borrowed from curio and trio.

There is also an experimental kernel: https://github.com/guyskk/newio-kernel. Just treat it as a black box that can run coroutines.
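
To make the proposed layering concrete, here is a minimal sketch of application code that depends only on a syscall layer, with the kernel supplied separately. The names `sys_yield`, `sys_spawn`, and `run` are hypothetical and are not taken from newio, curio, or trio; the kernel is a toy round-robin loop.

```python
import types
from collections import deque

# Hypothetical syscall layer: each trap only yields a request tuple
# to whatever kernel happens to be driving the coroutine.
@types.coroutine
def sys_yield():
    """Ask the kernel to reschedule the current task."""
    yield ("yield",)

@types.coroutine
def sys_spawn(coro):
    """Ask the kernel to start running another coroutine."""
    yield ("spawn", coro)

# Hypothetical toy kernel: a round-robin loop that understands the two
# requests above. Another kernel honoring the same requests could run
# the same application code unchanged.
def run(main):
    ready = deque([main])
    while ready:
        task = ready.popleft()
        try:
            request = task.send(None)
        except StopIteration:
            continue  # task finished, drop it
        if request[0] == "spawn":
            ready.append(request[1])
        ready.append(task)

# Application code: imports nothing from the kernel.
trace = []

async def child(name):
    for i in range(2):
        trace.append((name, i))
        await sys_yield()

async def parent():
    await sys_spawn(child("a"))
    await sys_spawn(child("b"))

run(parent())
```

Swapping `run` for a different scheduler would not require touching `child` or `parent`, which is the separation the diagram above is after. Real kernels also need I/O, timeouts, and cancellation, which this sketch leaves out.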

@dabeaz @njsmith

njsmith commented 6 years ago

Are you familiar with the multio project? https://github.com/theelous3/multio/

You might also be interested in https://github.com/urllib3/urllib3/issues/1323, which discusses a generic strategy for supporting multiple async backends and a sync backend within a single codebase. That approach is limited to libraries that are "mostly synchronous" though, like urllib3 which traditionally has been entirely synchronous.

I'm pretty dubious about there being a generic, universal solution here. The reason there are multiple incompatible projects is because their authors genuinely disagreed on what semantics were best, so it's not trivial to paper over the differences. From a quick skim, your prototype syscall interface is incompatible with Windows (they can't do "wait for fd readable/writable"), and your task and cancellation APIs are ... well, a major reason I wrote Trio is because I wanted to not use those APIs :-). Have you seen my blog posts Notes on structured concurrency and Timeouts and cancellation for humans? They should at least give some sense of why some of us would rather not use those constructs.

Re: the reference kernel, are you worried about the https://xkcd.com/927/ problem?

guyskk commented 6 years ago

@njsmith Thank you very much. I read your blog posts before writing this proposal. I like the nursery mechanism and implemented it as part of the stdlib. As for cancellation, I think it should be forced and deterministic (if I cancel a task, the task must stop immediately, like kill -9 on Linux), so that resources are released. Timeouts and cancel tokens are more about coordination; they are not forced.

I know it's hard to reach agreement in some cases, but most cases are easy to agree on, e.g. sockets, SSL, synchronization primitives. Can we build a project and add things to it once agreement is reached, so that it becomes more and more primitive and universal?

guyskk commented 6 years ago

@njsmith I'm curious: why can't you do "wait for fd readable/writable" on Windows? curio also has _read_wait and _write_wait traps (syscalls) with the same interface. Are they incompatible with Windows?

imrn commented 6 years ago

On 5/31/18, Nathaniel J. Smith notifications@github.com wrote:

I'm pretty dubious about there being a generic, universal solution here.

I agree. Two reasons: 1) Differences in the developers' world views bring numerous accessories to the frameworks, with quite different APIs. Although this is a 'problem', it is NOT as serious as the following one:

2) Choices made for the main execution logic affect the main structure. That is, they change the way the main async functionality is exposed to the outside world. This mix consists of coroutines, generators, even sync functions. It determines the core API.

(1) 'may' be resolved with some consensus. Maybe not, since it is also related to (2), but with some indirection.

But (2) is not an easy one. Anyone who wants to deal with the problem should have a sharp view about the very 'core' ingredients of the frameworks and their differences.

njsmith commented 6 years ago

As for cancellation, I think it should be forced and deterministic (if I cancel a task, the task must stop immediately, like kill -9 on Linux), so that resources are released. Timeouts and cancel tokens are more about coordination; they are not forced.

This kinda makes it sound like you actually want to invent your own 4th API for async concurrency? Which, I mean, I'm hardly in a position to criticize you for that :-). But it's something to think about.

FWIW, none of the existing libraries have a kill -9-style primitive, exactly because within a Python process, kill -9 (= dropping a task from the scheduler) leads to resource leaks, while cooperative cleanup (= throwing in an exception) does not. The only partial exception is asyncio, where tasks can be abandoned mid-way through when the loop exits, but you still can't do it on demand, so it's mostly an unhelpful bug magnet rather than anything useful.
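
The cooperative style can be illustrated with a bare coroutine and no framework at all. This is only a sketch: `Cancelled` and `sleep_forever` are made-up names, and real libraries wrap this mechanism in their own task and scope machinery.

```python
import types

class Cancelled(Exception):
    """Hypothetical cancellation exception for this sketch."""

@types.coroutine
def sleep_forever():
    # Suspend until the driver resumes us or throws an exception in.
    yield "sleep"

log = []

async def worker():
    log.append("acquire")      # pretend to take a resource
    try:
        await sleep_forever()
    finally:
        log.append("release")  # cleanup runs because an exception arrives

coro = worker()
coro.send(None)                # run the task up to its first suspension

# Cooperative cancellation: throw an exception in at the suspension point.
try:
    coro.throw(Cancelled())
except Cancelled:
    log.append("cancelled")
```

Because cancellation arrives as an ordinary exception, `finally` blocks and context managers in the task get a chance to release what they hold before the task ends.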

I'm curious: why can't you do "wait for fd readable/writable" on Windows?

Because that's just how Windows works... the operations are possible for sockets, but not for other types of objects you might need to wait for (processes, named pipes, etc. etc.), and even for sockets it's considered the old slow approach.

curio also has _read_wait and _write_wait traps (syscalls) with the same interface. Are they incompatible with Windows?

Yes, curio doesn't really make any serious attempt to support Windows. In general, curio's priority is to be an excellent sandbox for experimenting with novel ideas for async APIs, and doesn't spend much energy on boring "make it work for users in production" kind of stuff like this.

dabeaz commented 6 years ago

I think trying to come up with a universal API that targets Curio/Trio at a low level (such as syscalls) leads to the same kind of trap that asyncio fell into. Specifically, the notion that it could be a universal interface for implementing event loops and that different event-loop libraries could be built on top of asyncio.

For me, the interesting API would be at a higher level. For example, is there a standardized API for programming sockets with async? If so, is that API supported by asyncio, curio, trio? If so, then perhaps you'd be able to program an application using that API without really worrying about the underlying implementation details about the library that's running it. Focusing the API at a higher level also allows one to deal with platform specific differences such as Unix/Windows. There is also past precedent for focusing on high-level APIs in other contexts. For example, WSGI or the Database Protocol.
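
As a rough illustration of what a higher-level, implementation-neutral API could look like, the sketch below defines a small stream interface and writes library code against it. `AsyncStream`, `receive_some`, `send_all`, and `MemoryStream` are hypothetical names chosen for this example; no such standard exists, and asyncio is used here only as a convenient runner.

```python
import asyncio
from typing import Protocol

class AsyncStream(Protocol):
    """Hypothetical minimal stream interface, not an existing standard."""
    async def receive_some(self, max_bytes: int) -> bytes: ...
    async def send_all(self, data: bytes) -> None: ...

async def echo_upper(stream: AsyncStream) -> None:
    """Library code written only against the interface above."""
    while True:
        data = await stream.receive_some(1024)
        if not data:           # empty bytes means end of stream
            break
        await stream.send_all(data.upper())

class MemoryStream:
    """In-memory stand-in for a backend-specific socket wrapper."""
    def __init__(self, incoming):
        self._incoming = list(incoming)
        self.sent = []

    async def receive_some(self, max_bytes):
        # Chunks in this sketch are small, so max_bytes is not enforced.
        return self._incoming.pop(0) if self._incoming else b""

    async def send_all(self, data):
        self.sent.append(data)

stream = MemoryStream([b"hello", b"curio"])
asyncio.run(echo_upper(stream))
```

Each framework would then supply its own object satisfying the interface, much as each database driver supplies its own connection object under the DB-API.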

I talk about this idea quite a lot in my PyOhio talk from 2016. https://www.youtube.com/watch?v=E-1Y4kSsAFc

njsmith commented 6 years ago

standardized API for programming sockets with async?

There are still some tricky bits here, like how curio and trio disagree about which operations should be marked async (though this is more of an issue for stuff like locks and events than sockets), and how asyncio's lowest portable abstraction layer is higher-level than sockets, and how maybe you want to use trio's higher-level layer instead of raw sockets. But generally yeah, the way I think about it is that these libraries are basically some concurrency system + some cancellation system + a bunch of I/O primitives, and most of the innovation is in the first two; the I/O primitives are fairly obvious and boring in comparison.

That's the nice thing about "mostly synchronous" libraries like urllib3: they only need the I/O layer :-). urllib3 in particular is a little more complicated, because it turns out that high-quality HTTP clients need a little bit of internal concurrency (specifically sending and receiving simultaneously on a single socket), but we have an actual working cross-library socket abstraction layer that exposes what urllib3 needs:

https://github.com/python-trio/urllib3/tree/bleach-spike/urllib3/_backends

(or eventually this will probably move to https://github.com/python-trio/urllib3/tree/bleach-spike/src/urllib3/_backends , depending on when you read this comment)

CreatCodeBuild commented 6 years ago

Indeed. An interoperable API is more interesting than an interoperable runtime. Also, think about the bigger picture here: applications are not all written in Python, or in the same Python framework.

You can have one application/process that is written in Curio and fully harvests the power of Curio or Trio or whatever XXXio. You can also have another application in another language, possibly on another machine. Their communication is better done through a common networking/system interface such as sockets/files.

Even if you feel sockets/TCP are too low level, you can use protocols such as h2 and data serialization formats such as protobuf to communicate across process boundaries.

guyskk commented 6 years ago

I realize some things:

  1. A universal API is impractical; it depends heavily on the API authors.
  2. An interoperable API is important and is the goal of this proposal; otherwise we lose a lot of options and end up with a lot of incompatible libraries.
  3. A replaceable kernel (runtime) is valuable. Improvements are put forward all the time, and a replaceable kernel makes changing kernels painless.

I changed the architecture as follows:

               +-------------------------------------------+
               |                                           |
               |                 application               |
               |                                           |
        +--->  +-----------+----------+                    |
        |      |           |          |                    |
  api   |      | curio-api | trio-api |  asyncio-libraries |
        |      |           |          |                    |
        +--->  +-----------+----------+------+             |
        |      |                             |             |
syscall |      |           syscall           |             |
        |      |                             |             |
        +--->  +--------------+--------------+             |
        |      |              |              |             |
kernel  |      | curio-kernel | trio-kernel  |             |
        |      |              |              |             |
        +--->  +--------------+--------------+-------------+
               |                                           |
               |               asyncio & loop              |
               |                                           |
               +-------------------------------------------+

  1. I give up the idea of a stdlib, which would also break the backward compatibility of some libraries.
  2. Build the curio & trio APIs on the syscall layer, with no direct dependency on a kernel, so that the different APIs are interoperable.
  3. Implement the syscalls in the curio & trio kernels.
  4. Implement the syscalls in the curio & trio kernels over asyncio, so that curio/trio/asyncio libraries are interoperable (making the kernel a wrapper around asyncio).

To achieve these goals, the syscall layer should meet the requirements below:

  1. Support building the curio & trio APIs without breaking (maybe most of) backward compatibility.
  2. Keep small and primitive.
  3. Can be implemented.
  4. Can be implemented on top of asyncio.

I think (1)(2)(3) are possible and not too difficult; (4) is hard but also possible.

njsmith commented 6 years ago

Trio and curio are both strictly more powerful than asyncio (despite being much simpler), so you can't implement them on top of asyncio. (Also you're not going to convince me or @dabeaz to do this even if it were possible.) Going the other way is viable, e.g. https://github.com/python-trio/trio-asyncio

The other advantage of something like trio-asyncio is that it provides a clean separation between the different APIs, e.g. it handles translating between asyncio and trio's different cancellation semantics, and while asyncio violates a bunch of trio's normal invariants, we can kind of keep the weird stuff cordoned off in a special quarantine area.

guyskk commented 6 years ago

Of course I want to convince you and dabeaz; the blueprint can't be achieved without you both. Your blog posts and talks have taught me a lot. I must say a big thank you to both of you!

dabeaz commented 6 years ago

One thing that I feel has been lost in much of the discussion about Curio is the importance of environmental isolation. In any non-trivial application, you are going to have synchronous code, asynchronous code, and possibly synchronous code that involves threads. This presents a certain challenge for API design. For example, consider an event and the process of setting it:

evt = Event()
...
evt.set()

Is that set() operation safe to use in normal synchronous code? Does it work in an async environment? Is it thread-safe? It almost certainly depends on the kind of Event that is created. For example, threading, asyncio, multiprocessing, trio, and curio all have their own implementation of Event. However, only the Curio version requires the use of an await. So, it's this:

evt = curio.Event()
...
await evt.set()

In all of the quibbling about cancellation points and scheduling, something about isolation got lost. The reason this is an await in Curio is that Events are only safe to use in an async context. It is not safe to use set() on a Curio Event in normal synchronous code. It is not safe to use set() in threaded code. The fact that it requires an await makes this explicitly clear: there is no way to even invoke such a function outside of an async environment. Since that whole possibility is cut off, it makes it much easier to reason about in general. It also makes the implementation of Curio much easier to deal with because it does not have to concern itself with a bunch of esoteric corner cases (i.e., what happens if this gets called by a thread?). To me, reasoning about matters such as cancellation points is trivial; those are easy to address in Curio and easy to change. Reasoning about execution environments, however, is a much more difficult problem. And it's not something which can be easily dismissed in my opinion.
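
The isolation effect can be sketched without Curio itself. `AsyncOnlyEvent` below is a hypothetical toy class, not Curio's implementation, and asyncio serves only as a convenient runner; the point is that calling an async `set()` from synchronous code creates a coroutine object and nothing else.

```python
import asyncio

class AsyncOnlyEvent:
    """Toy event whose set() is a coroutine, in the `await evt.set()` style."""
    def __init__(self):
        self._flag = False

    def is_set(self):
        return self._flag

    async def set(self):
        self._flag = True

evt = AsyncOnlyEvent()

# From synchronous code, calling set() only builds a coroutine object;
# the body never runs, so the event stays unset.
pending = evt.set()
set_after_sync_call = evt.is_set()
pending.close()  # discard the un-run coroutine explicitly

async def main():
    # Inside an async environment the operation actually takes effect.
    await evt.set()

asyncio.run(main())
set_after_await = evt.is_set()
```

Synchronous or threaded code therefore cannot flip the event by accident; it would first have to hand the coroutine to a running kernel.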

The focus on environments in Curio has opened up a variety of interesting programming avenues as well. For example, I think Curio UniversalQueue objects are pretty interesting. Asynchronous threads are pretty interesting too. Or, if you want to blow your mind, see the examples/requestio example, in which requests runs on top of Curio without modification. All of these rely pretty heavily on environment isolation. So, in thinking about APIs, I would hope that this would be a major topic of discussion.

njsmith commented 6 years ago

That's an interesting point. Trio uses await to help users keep track of cancel+schedule points, but that means it can't be used to keep track of which operations require 'trio context'. Curio uses await to help users keep track of which operations require 'curio context', but that means it can't be used to keep track of cancel+schedule points. On the whole I think trio's choice is probably correct, because (a) if you look at existing async programs then the majority of code is inside async context and not doing clever things like jumping back and forth between threads, so most of the time users don't need to keep track of this distinction, (b) when people get confused about cancel/schedule points it leads to subtle latent bugs, while trying to call a trio function outside of trio context immediately raises an exception, and (c) constructors may depend on being called from trio context but can't be marked async.

But it is a real trade-off, with advantages and disadvantages either way.

the examples/requestio example

Ha, I hadn't seen this. That's extremely cute :-). So does that mean that if your call stack goes from curio → thread → back into curio, then the cancellation system tracks that all the way through? That's impressive, and interesting...

dabeaz commented 6 years ago

On the whole, I find the Event.set() example extremely interesting from an overall API design perspective. Events are commonly used as a kind of signaling device where you might write code like this:

def do_something(args, evt):
    ...  # whatever
    # Signal completion
    evt.set()

There's nothing here to indicate the type of Event that can be passed in. Perhaps it's a threading.Event. Maybe it's some other kind of event. If one were to call that function from an async context, it'd probably work okay even though it's a synchronous function. For example:

async def spam():
    evt = trio.Event()
    ...
    do_something(args, evt)

Things sort of get dicey as things get more complicated though. As a signaling device, someone might think that an Event object can be used from threads. Perhaps an async Event escapes the confines of its async sandbox and gets passed into such a function in a separate thread. Maybe it works. Maybe not. It's a little scary because effectively a thread is performing some kind of "drive-by" scheduling operation on the event-loop/kernel and unless it's protected sufficiently, you're likely to get a race condition. You sometimes see functions such as do_something_threadsafe in async libraries that account for possibilities like this. To me, it all seems a bit messy though.
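
For comparison, this is roughly what the "threadsafe" variant looks like in asyncio, where `asyncio.Event` is not thread-safe and a thread has to hand the call over to the loop with `loop.call_soon_threadsafe`. The function name `do_something_threadsafe` and the five-second timeout are illustrative choices only.

```python
import asyncio
import threading

def do_something_threadsafe(loop, evt):
    # Runs in a worker thread. Instead of calling evt.set() directly,
    # ask the event loop to call it from the loop's own thread.
    loop.call_soon_threadsafe(evt.set)

async def main():
    evt = asyncio.Event()
    loop = asyncio.get_running_loop()
    worker = threading.Thread(target=do_something_threadsafe, args=(loop, evt))
    worker.start()
    # Wait for the signal; the timeout only guards against hanging forever.
    await asyncio.wait_for(evt.wait(), timeout=5)
    worker.join()
    return evt.is_set()

result = asyncio.run(main())
```

The caller has to know which environment it is in and pick the matching entry point, which is the kind of bookkeeping being described as messy.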

The Event.set() is also interesting in other ways. If there are tasks waiting, this operation is supposed to wake up the tasks. However, if one were to introduce things like task priorities, you might end up with a situation where an awakened task has higher priority than the task that just set the event (priority inversion). Does that cause an immediate task-switch or not? If it does, then does Event.set() become a schedule point? If it does, then it has to be async in order to task switch. But if it's now a schedule point, is it also a cancel point? It seems weird that it would be a cancel point--setting an event doesn't normally block.

I've taken a certain approach in Curio. I have no idea if it's the "correct" approach except to say that it's sitting in the intersection of several different tricky issues. There are probably other things sort of like this (e.g., close() methods) where one would definitely want to spend some time carefully considering the design.

On the whole, I find all of this really interesting.

agronholm commented 5 years ago

@guyskk Check out AnyIO, soon to have a 1.0 final release.