dabeaz / curio

Some thoughts about "Is curio going to evolve into a framework?" #35

Closed 1st1 closed 4 years ago

1st1 commented 8 years ago

I think that adding any complex application level protocols directly in curio (such as HTTP) is not a good idea indeed. However, it would be nice if curio could provide stronger foundation/guidelines for implementing complex protocols. For instance, in asyncio we have Protocols and Transports. They remove the need of writing lots of boiler plate code (like flow control), and they also standardize the approach, so people know how and where to start.
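For readers unfamiliar with the asyncio model mentioned above, here is a minimal sketch of a Protocol driven by a Transport. The class name `UpperCaseEcho` and the `serve()` helper are illustrative assumptions, not anything from curio or from this thread; the callback names (`connection_made`, `data_received`, `pause_writing`, `resume_writing`) are the standard `asyncio.Protocol` hooks.

```python
import asyncio


class UpperCaseEcho(asyncio.Protocol):
    """Echo each received chunk back in upper case (illustrative only)."""

    def connection_made(self, transport):
        # The event loop hands us a transport; we never touch the socket.
        self.transport = transport
        self.paused = False

    def data_received(self, data):
        # Called by the loop with whatever bytes arrived.
        self.transport.write(data.upper())

    def pause_writing(self):
        # Flow control hook: the transport's write buffer is getting full.
        self.paused = True

    def resume_writing(self):
        # Flow control hook: the write buffer has drained again.
        self.paused = False


async def serve(host="127.0.0.1", port=8888):
    # Hypothetical entry point; host and port are arbitrary choices.
    loop = asyncio.get_running_loop()
    server = await loop.create_server(UpperCaseEcho, host, port)
    async with server:
        await server.serve_forever()
```

The protocol object only reacts to callbacks, which is the standardization being praised here; it is also exactly the callback style that curio avoids.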

brettcannon commented 8 years ago

If @dabeaz targets a lower level point for asynchronous programming then I don't think he needs to go that way. I think the real problem we have as a community is we have not built out the low-level libraries for some of these protocols. For instance, why the heck isn't there an HTTP parsing library? I mean I look around at all of the various libraries involved with fetching data using HTTP and not one of them either provides separately or relies on a library whose only job is to take in some bytes off the network and parse them as HTTP headers. If that sort of library existed then curio's examples of making HTTP requests would be easier to work with, and then transport stuff to handle fancy stuff like Keep-Alive and connection pooling could start to also be decoupled from the event loop and thus have a chance of being used across asynchronous libraries/frameworks.
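To make the idea concrete, here is a rough sketch of what "take in some bytes and parse them as HTTP headers" could look like as a pure function with no socket or event loop in sight. The function name `parse_request_head` and its return shape are assumptions made for illustration; it is deliberately incomplete (no validation, no size limits, no handling of malformed input) and is not how any library named in this thread works.

```python
def parse_request_head(data: bytes):
    """Parse an HTTP/1.1 request head from raw bytes.

    Returns (method, target, version, headers, rest), or None if the
    blank line that ends the head has not arrived yet.
    """
    head, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        return None  # incomplete: the caller should feed more bytes later

    lines = head.split(b"\r\n")
    method, target, version = lines[0].decode("ascii").split(" ", 2)

    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        headers.append(
            (name.strip().lower().decode("ascii"), value.strip().decode("latin-1"))
        )
    return method, target, version, headers, rest
```

Because the function only sees bytes, the same code could sit behind curio, asyncio, or a blocking socket, which is the decoupling being asked for.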

dabeaz commented 8 years ago

I do not envision curio evolving into a full fledged framework as that is not my primary interest in it. I would much rather keep it small and nimble. There are certain practical things that I'd like to be able to do with it, mostly related to microservice kinds of uses (i.e., serving JSON, message passing, distributed computing, etc.), but I'm inclined to build those out as separate libraries as opposed to putting it in the curio core.

Honestly, curio is currently an experiment in supporting asynchronous I/O in a completely different way than asyncio. I think the code inside curio is substantially easier to understand and debug--and that is part of the overall experiment. Where it all might end up is kind of hard to say ;-).

nchammas commented 8 years ago

@brettcannon

For instance, why the heck isn't there an HTTP parsing library? I mean I look around at all of the various libraries involved with fetching data using HTTP and not one of them either provides separately or relies on a library whose only job is to take in some bytes off the network and parse them as HTTP headers.

If I understood correctly, I believe that's what hyper-h2 is about.

kdart commented 8 years ago

I like that it's minimal and built on the latest coroutine features. I don't like the current asyncio framework because simply importing it pulls in a ton of stuff I don't necessarily need. In the past I've written my own little wrappers around epoll and kqueue that just do what I need. I was investigating the new async/await but then found this more fully fleshed out micro-framework. I like the idea of keeping it small and nimble and I hope that doesn't change. I'll probably use it in future projects (both personal and at work).

brettcannon commented 8 years ago

@nchammas You're right, it is. I misread the docs the last time I looked at the library and mistook "connection" to mean "manages my connection" rather than "abstract connection", i.e. implementing the state machine and delegating out to the actual socket.

dabeaz commented 8 years ago

One thing I'd like to flesh out more in curio is the monitor/debugging side of it. The first release does have a debugging monitor, but I think there's more that could be done there. I'm thinking along the lines of something like the Erlang etop functionality maybe. That would be kind of cool.

Lukasa commented 8 years ago

@brettcannon pointed me here because we'd been chatting a bit about this on Twitter.

Yeah, hyper-h2 is exactly aiming at the place Brett speaks about when he asks "why the heck isn't there an HTTP parsing library?" (in fact, it's good to see people making this complaint, given that I'm speaking at PyCon SK 2016 and PyCon US 2016 about exactly this problem!).

Without going into too much detail and bogging it down, there are some HTTP parsers floating around Python-land (I published pycohttpparser which is a wrapper around the C picohttpparser used in the H2O web server). However, none of these are as fully-featured as Hyper-h2.

I've considered writing something like Hyper-h2 for HTTP/1.1, and I may still do it sometime, but it's been a much lower priority for me than making Hyper-h2 as good as it can be. This is because, IMO, the ship has sailed with HTTP/1.1 for the foreseeable future. Even if I wrote a HTTP/1.1 parser/state machine, the reality is that it would be adopted by almost nobody. Requests/urllib3 would probably end up using it because we strongly dislike httplib/http.client (which we currently use, and aggressively work around), and if curio took off in a big way then maybe we'd land that too, but everywhere else the incumbent is already in place and doing just fine. For example, Tornado, Twisted, aiohttp, gevent, httplib, and any other HTTP/1.1 implementation in Python-land already went to the trouble to write themselves a HTTP/1.1 parser: why would they rip that out and replace it with a new thing?

With HTTP/2 I had the chance to get this right from the beginning because I was thinking about HTTP/2 two years before anyone else in the community started. This means the return on time invested is very high: why would anyone in the community write their own parser for such a complex protocol given that I've already spent thousands of man-hours on this one? For HTTP/1.1, the return on time invested is low, and may actually be smaller than 1: that is, I may spend more time writing and building the thing than anyone ever spends using it.

With that said, if someone decides this is the route they want to go and wants to aim for compatibility with Hyper-h2 (same event types etc.) then I am open to collaborating on the work and bringing it into the Hyper organization as a companion to Hyper-h2.

dabeaz commented 8 years ago

Random thought: Would it make any kind of sense to implement something like WSGI on top of curio?

kdart commented 8 years ago

Random answer: 42.

But really, why not? It would be a good choice for a SCGI server with WSGI.

klen commented 8 years ago

@dabeaz :+1: It will be great!

kdart commented 8 years ago

@dabeaz Hello, just now trying to use this with h2 using their example. It has some bugs. It currently stalls at about 65K transfers. Perhaps you may have some insight?

h2 issue 177

njsmith commented 8 years ago

FYI, here's a substantial start on a real HTTP server using Curio: https://github.com/njsmith/h11/blob/master/examples/curio-server.py

It's a bit toy-like in some respects, but it has solid error handling (possibly the hardest part of implementing an HTTP server), and playing around with wrk on my laptop it seems to easily handle a few thousand concurrent connections and a few thousand requests/second.

I haven't tried using it to implement a WSGI container, since it's a pedagogical example and the WSGI spec involves a certain amount of cruft. But it would be pretty straightforward to do -- AFAICT all the HTTP parts are already there, it's just the WSGI interface that would need implementing.

The more interesting question is what a curio-native version of WSGI would look like...

Lukasa commented 8 years ago

A hyper-h2 WSGI example is present here, by the by.

dabeaz commented 8 years ago

Wow, interesting. In addressing the "curio-native WSGI", are you thinking about the whole WSGI-callable interface? That is, WSGI invoking a callable with the environment/start_response and the callable returning a sequence of data that's emitted back?

njsmith commented 8 years ago

@dabeaz: what I meant by "curio-native WSGI" is some sort of WSGI-like interface that allows the response handler to be an arbitrary coroutine, instead of a synchronous function like in classic WSGI. I'm being intentionally vague about the details because I don't know how close one would want to stick to the WSGI spec -- it has some good parts and some weird parts :-). But as a straw man, imagine taking the WSGI spec and doing a global search-replace of callable->coroutine, iterator->async iterator.
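As a rough illustration of that straw man, the sketch below takes the classic WSGI shape and swaps callable for coroutine and iterator for async iterator. Everything here is hypothetical: `hello_app` and `call_app` are made-up names, making `start_response` awaitable is just one possible choice, and asyncio is used only as a convenient stdlib runner for the example. The body uses async generator syntax, which exists in Python 3.6 and later.

```python
import asyncio


async def hello_app(environ, start_response):
    """Straw-man handler: a coroutine that returns an async iterator."""
    await start_response("200 OK", [("Content-Type", "text/plain")])

    async def body():
        for word in (b"hello", b" ", b"world"):
            await asyncio.sleep(0)  # stand-in for real I/O, e.g. a database call
            yield word

    return body()


async def call_app(app, environ):
    """Hypothetical server side: run the app and collect what it emits."""
    response = {}

    async def start_response(status, headers):
        response["status"] = status
        response["headers"] = headers

    body = await app(environ, start_response)
    chunks = [chunk async for chunk in body]
    return response["status"], response["headers"], b"".join(chunks)
```

The handler is free to await arbitrary I/O between chunks, which is the part classic WSGI cannot express.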

dabeaz commented 8 years ago

By the way, this is very cool. I spent part of the weekend perusing aiohttp to see if there was any hope of porting it to run on curio. I'm not so sure--it seemed to use a fairly big part of the asyncio API that's not even remotely similar to what curio is doing. I didn't spend a whole lot of time fooling around with it, but my head was already exploding just looking at it for a few hours. So, something like h11 would probably have much more promise in the context of curio.

On the WSGI spec, I don't have any particular attachment to it. Considering that curio is already way out in left-field breaking all compatibility with asyncio, I wouldn't consider WSGI to be especially sacred either. I mean, why not break that as well ;-).

One thing that might be weird would be the whole definition of async-iterators. There's no convenient short-cut for making one other than a class with __aiter__() and __anext__() methods (there's nothing like an async-generator although having an "async yield" statement would be kind of diabolical). WSGI allows any iterable to be returned so things like lists would fall outside the scope of an async-iterator as well.

For what it's worth, I think it's kind of fun to envision what this API might look like in the context of async/await.
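For reference, this is roughly what the class-based approach to an async iterator looks like. `ChunkIterator`, `FakeSocket`, and `collect` are illustrative names, and the in-memory `FakeSocket` is an assumption standing in for a real socket with an awaitable `recv()`; asyncio is used only to drive the example.

```python
import asyncio


class ChunkIterator:
    """Async iterator over chunks read from an awaitable recv()."""

    def __init__(self, recv, size=1024):
        self._recv = recv  # coroutine function: await recv(size) -> bytes
        self._size = size

    def __aiter__(self):
        return self

    async def __anext__(self):
        chunk = await self._recv(self._size)
        if not chunk:
            raise StopAsyncIteration  # end of stream
        return chunk


class FakeSocket:
    """Hypothetical in-memory stand-in for a socket with an async recv()."""

    def __init__(self, data):
        self._data = data

    async def recv(self, n):
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk


async def collect(data, size):
    sock = FakeSocket(data)
    return [chunk async for chunk in ChunkIterator(sock.recv, size)]
```

The boilerplate (a class, two special methods, an explicit `StopAsyncIteration`) is what makes the absence of a generator-style shortcut feel heavy.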

njsmith commented 8 years ago

Yes, the lack of async yield is super annoying. Generators provide an incredibly ergonomic API for stream processing, which is what network programming is all about, but there are no async generators...

Well, I fixed that :-): https://github.com/njsmith/rakaia/blob/master/rakaia/async_generator.py

from async_generator import async_generator, yield_

@async_generator
async def chunk_aiterator(sock):
    while True:
        chunk = await sock.recv(1024)
        if not chunk:
            break
        await yield_(chunk)

# async for is only valid inside a coroutine
async def consume(sock):
    async for chunk in chunk_aiterator(sock):
        ...

It's a bit brain-melty but works nicely. I should probably put that up on pypi as a standalone package really...

Someone should also probably nag python-ideas about getting first-class support for this into 3.6. (I guess with async yield syntax, since regular yield runs the risk of confusion, particularly when porting code from 3.4 yield from to 3.5 await.)

1st1 commented 8 years ago

@njsmith I'm working on a PEP. I was a bit distracted by releasing uvloop, but I plan to get back to it in the next few days.

dabeaz commented 8 years ago

The async yield would definitely be cool. WSGI aside, I'm thinking it would be useful to set up generator processing pipelines (using async-for statements) as can be done with for-loops in synchronous code. That's a super useful technique. It would be neat to have that in an async context too.
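A small sketch of the kind of pipeline described here, assuming Python 3.6 or later, where `yield` inside `async def` creates an async generator. The stage names (`numbers`, `squares`, `only_even`) are invented for illustration, and asyncio serves only as the runner.

```python
import asyncio


async def numbers(limit):
    """Source stage: produce 0..limit-1, pretending each one needs I/O."""
    for n in range(limit):
        await asyncio.sleep(0)
        yield n


async def squares(source):
    """Transform stage: square every item from the upstream stage."""
    async for n in source:
        yield n * n


async def only_even(source):
    """Filter stage: pass along even values only."""
    async for n in source:
        if n % 2 == 0:
            yield n


async def run_pipeline(limit):
    # Stages compose the same way synchronous generators do.
    return [n async for n in only_even(squares(numbers(limit)))]
```

Each stage pulls from the one before it with `async for`, mirroring the synchronous generator-pipeline technique.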

1st1 commented 8 years ago

Yeah, I think I've figured out most of the design parts, and now I have to finish my PoC implementation, which isn't as easy as I hoped it to be. Since I've already found at least one quirk in PEP 492 which we'll have to fix somehow, I don't want to send a PEP with some design flaws, so a working PoC is a must.

brettcannon commented 8 years ago

Some WSGI thing might be necessary if some common, low-level API can't be found that all event loops can implement. E.g. is there any reasonable way to make the event loop a context object that gets passed around and use a common API that all event loops agree on so there's a common idiom to get the non-blocking socket, etc. to pass into hyper-h2 or to create a common HTTP GET/POST library?

njsmith commented 8 years ago

@1st1: if you need a beta reader for your PEP feel free to hit me up.

@brettcannon: IIUC, WSGI does two things: (a) it provides a standard interface between servers (gunicorn / mod_wsgi / ...) and frameworks for implementing response handlers (django / flask / ...), so you can mix and match them and write generic middleware, and (b) it bakes in a particular concurrency model where web response handlers are called in a synchronous blocking fashion. Part (a) is useful regardless of whether different event loops can get along, because even within each event loop universe people will probably write more than one web app and would prefer not to have to implement their own HTTP server from scratch each time. Part (b) though makes WSGI itself kinda useless in the curio/asyncio context, because response handlers need to be able to invoke arbitrary I/O of their own (e.g. fetching results from a database before generating a web page). So in the curio/asyncio context, you need to pass the loop into the response handlers, and unless you manage to make all event loops 100% interoperable then this user code is going to be bound to a particular event loop.
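Points (a) and (b) can be seen in a few lines of classic WSGI. In this sketch `hello_app`, `add_header`, and `call_wsgi` are illustrative names; `call_wsgi` is a hypothetical stand-in for the server side, and `wsgiref.util.setup_testing_defaults` from the standard library fills in a plausible `environ`.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A framework-side response handler."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from " + environ["PATH_INFO"].encode("latin-1")]


def add_header(app, name, value):
    """Generic middleware: wraps any WSGI app, knows nothing about the server."""

    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            return start_response(status, headers + [(name, value)], exc_info)

        return app(environ, patched_start)

    return wrapped


def call_wsgi(app, path="/"):
    """Hypothetical stand-in for the server side (gunicorn, mod_wsgi, ...)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"], seen["headers"] = status, headers

    # Point (b): the handler is invoked as a plain blocking call.
    body = b"".join(app(environ, start_response))
    return seen["status"], seen["headers"], body
```

The middleware composes with any app and any server, which is point (a); the blocking call in `call_wsgi` is point (b), and it leaves no place for the handler to await anything.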

brettcannon commented 8 years ago

@njsmith Your (b) answer is why I'm wondering what needs to be done to work towards minimizing per-event loop code.

dabeaz commented 8 years ago

I've always felt that event loop code is minimized by focusing on async/await as the "pluggable" component, not the event loop. Curio takes this to an extreme in that there is no event loop (or kernel) explicitly involved anywhere in the API other than the top-level run() method to start things. Specifically, you never carry around a reference to the loop/kernel and use it to invoke methods. You also never supply a loop/kernel reference to other objects (i.e., locks, queues, etc.). For all practical purposes, it simply doesn't exist. The fact that it doesn't exist means that in theory, something other than curio could probably be swapped in as long as the same async/await API was provided somehow (it wouldn't necessarily matter how it actually worked under the covers).

Reconciling this with asyncio where the event loop is needed just about everywhere is an interesting problem to think about to be sure.

Of course, keep in mind that Curio can get away with this by going full coroutine. Nobody goes full coroutine. ;-).
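The idea that user code never sees a loop object can be illustrated with a toy driver. This is not curio's kernel, only a sketch under simplifying assumptions: `switch`, `worker`, and `run` are invented names, and the only "system call" supported is yielding control.

```python
import types
from collections import deque


@types.coroutine
def switch():
    """Trap: hand control back to whatever is driving this coroutine."""
    yield "switch"


async def worker(name, count, log):
    # No loop or kernel object is passed in or referenced anywhere.
    for i in range(count):
        log.append((name, i))
        await switch()


def run(*coros):
    """Toy round-robin driver; any driver that honors the trap would do."""
    ready = deque(coros)
    while ready:
        coro = ready.popleft()
        try:
            coro.send(None)  # run until the next trap
        except StopIteration:
            continue  # coroutine finished, drop it
        ready.append(coro)  # reschedule behind the others
```

Since `worker` only awaits `switch()`, a different `run()` with a different scheduling policy could be swapped in without touching it, which is the sense in which async/await rather than the loop is the pluggable part.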

brettcannon commented 8 years ago

Just in case someone else stumbles upon this issue, https://mail.python.org/mailman/listinfo/async-sig was created specifically to discuss this sort of topic.

dabeaz commented 8 years ago

One general thought I'll throw out there is that I'm not actually sure if I would describe Curio as an "async framework." It's definitely a library related to concurrency and it uses async/await syntax for coroutines, but it tends to eschew almost everything else that one typically sees in asynchronous programming (e.g., callbacks, event loops, Futures, Protocols, etc.). Needless to say, that makes it a bit of an odd-duck.

brettcannon commented 8 years ago

Yes, it's an odd duck, but it's still useful. 😄

And I would actually still call it an async framework as it is still designed to help you run code asynchronously. And since curio's SDK and kernel sit on either side of one's own code (it's the bread in what I have called the async sandwich w/ the user's code the meat), that makes it a framework in my book.

njsmith commented 8 years ago

tends to eschew almost everything else that one typically sees in asynchronous programming (e.g., callbacks, event loops, Futures, Protocols, etc.).

I see curio as an intriguing exploration of the hypothesis that the best way to do asynchronous programming is to throw away all those clunky legacy constructs ;-). It's possible it would make sense to at some point add back some kind of interoperability with those other approaches and it would be pretty easy to do given curio as it exists now, but it's interesting to see how far one can get before needing that.

dabeaz commented 8 years ago

I agree that this is an interesting experiment. I sometimes get the impression that people view async/await as the end of the story for a lot of past work in async ("hey, here's this cool new feature that makes it look nicer"). In Curio, I see async/await as the launching point of a completely new story.

Instead of throwing away the legacy constructs, maybe they're better viewed as a kind of "assembly language" of async. Perhaps Curio is sweeping those details under the covers in a way where one doesn't need to be too concerned about them. I find this pretty interesting.

pikeas commented 8 years ago

There have been a couple of articles posted recently about issues with asyncio, including http://lucumr.pocoo.org/2016/10/30/i-dont-understand-asyncio/ and https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/.

@dabeaz You've called curio an "experiment" several times - is that accurate, or are you selling curio short? Is curio suitable as a 3rd party stable substitute for asyncio?

ZhukovAlexander commented 8 years ago

@pikeas, as we all probably know, some experiments end up being a revolution:

Hello everybody out there using minix - I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones [...] Linus (torvalds@kruuna.helsinki.fi)

Speaking of curio, it definitely has a lot of potential compared to other libraries. The way you think about curio (for example, as a mini OS) opens up some cool opportunities. IMHO, with enough community support curio can do really well in the asynchronous world.

dabeaz commented 8 years ago

Is curio a replacement for asyncio that has the features required to do useful things now? Yes. Is it stable? No. Everything about it is subject to change at any time.

Looking at it realistically, curio is basically a one-person project and it's using async/await in a completely different way than what was probably originally envisioned. I think it's still appropriate to call it "experimental" for now. It gives the project more space to try new ideas. It also reflects the fact that to be truly usable, curio would need some other libraries for supporting useful protocols like HTTP.

njsmith commented 8 years ago

@pikeas: The way I'm thinking about it - hopefully @dabeaz agrees :-) - is that curio is at that awkward stage where what it needs is people to build things on it -- even though it isn't stable yet! -- so we can figure out what the awkward bits and missing pieces are, and make it stable.

imrn commented 8 years ago

On 11/8/16, David Beazley notifications@github.com wrote:

Looking at it realistically, curio is basically a one-person project and it's using async/await in a completely different way than what was probably originally envisioned.

For curio there are mentions of one-man project flavours around the web and even here. I strongly disagree with this undervaluation.

Simplicity is the most valuable thing for Curio, which can understandably be achieved without a large group. So please drop the small-project attitude, starting here. No need to be sorry for the small code base.

In fact, ideally and asymptotically there would be no async library, as languages will become transparently async. Of course it is a utopia. However, the closest thing to this is Curio. Thanks for being a pioneer.

I was a heavy user of asyncio and, in search of the so-called right style, I experimented with various flow patterns and carried some customizations to asyncio, which became unnecessarily complicated over time.

At that point I settled on this: Async code should be identical to blocking code. (Except for async defs, awaits, etc. and some management code.) Async code should not be anything special. This style is possible with Curio. It is the main reason for my migration.
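A side-by-side sketch of that claim: the same "read exactly n bytes" helper written for a blocking socket and for one whose `recv()` is awaitable. The function names and the `FakeSock` / `AsyncFakeSock` test doubles are assumptions made for illustration; asyncio is used only to run the coroutine.

```python
import asyncio


def read_exactly(sock, n):
    """Blocking version."""
    parts = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise EOFError("connection closed early")
        parts.append(chunk)
        n -= len(chunk)
    return b"".join(parts)


async def read_exactly_async(sock, n):
    """Async version: same control flow, plus async/await."""
    parts = []
    while n > 0:
        chunk = await sock.recv(n)
        if not chunk:
            raise EOFError("connection closed early")
        parts.append(chunk)
        n -= len(chunk)
    return b"".join(parts)


class FakeSock:
    """Hypothetical socket double that returns at most `step` bytes per recv()."""

    def __init__(self, data, step=2):
        self._data, self._step = data, step

    def recv(self, n):
        k = min(n, self._step)
        chunk, self._data = self._data[:k], self._data[k:]
        return chunk


class AsyncFakeSock(FakeSock):
    async def recv(self, n):
        return super().recv(n)
```

The two functions differ only in the `async` and `await` keywords; there are no callbacks, futures, or loop references in the async one.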

imrn commented 8 years ago

I'd also suggest having some kind of understanding (or even better parallelism) with rust which is exploring its options for async programming.

Curio is mentioned positively as an async model: https://github.com/rust-lang/rfcs/issues/1081

Does "rurio" sound good?

dabeaz commented 8 years ago

Hmmm. rurio? "rustio" (if pronounced correctly) has a nice ring to it ;-). But what do I know?

imrn commented 8 years ago

'rustio' sounds much better. However, it is generic and similar names are already around. 'rurio' has minimal calligraphic divergence, but phonetically it is not good.

How about "Curio" really? How about opening a subdirectory in Curio as a Rust playground?

Needs and findings can be communicated with rust community which is very welcoming. Curio already has the sympathy of many folks.

nchammas commented 8 years ago

Dunno how serious we are about taking Curio to other programming communities, but if we are, here is an example of a project that did that which Curio could perhaps follow: docopt

Docopt started off as an idea in Python about how to design a command-line argument parser. People liked the idea so much that they took it to other languages. So in addition to the original Python docopt, there is docopt.rs, docopt.cpp, docopt.net, etc. Each is a separate repo under a single docopt organization.

dabeaz commented 8 years ago

I'm already stretched pretty thin so extending curio in this way is not something that I'd be able to work on personally. I'm not opposed to someone taking curio-like ideas and applying them to other languages. It might make more sense for that to take place in a separate project though.

nchammas commented 8 years ago

Absolutely. In fact, looking at docopt, it appears that each language project (e.g. docopt.java vs. docopt.net, etc.) has a different set of authors and contributors. So it's more like an affiliation of separate projects based on a core design idea, rather than a single group of people writing docopt over and over again in different languages.

So if some folks want to take a stab at recreating Curio in Rust, it sounds like they should just go ahead and do it. If it works out, then perhaps down the line the two implementations (original and Rust flavor) can be brought together under a single GitHub organization, like how docopt did it.

imrn commented 8 years ago

Yes, it's a matter of man power and possibilities. Before I'll provide my reasoning below, let me put another bookmark here:

Rust-await: https://github.com/icorderi/rust-await

Let's itemize some thoughts. Please add yours if you see as appropriate:

General Ones

Personal Ones

njsmith commented 8 years ago

There's plenty of serious, scalable code written in Python. And Python and Rust are radically different languages; APIs aren't going to translate directly. This issue thread is getting pretty off topic, and I'm not sure what the original topic even was :-). Maybe it's time to close it?

On Nov 11, 2016 12:10 PM, "Imran Geriskovan" notifications@github.com wrote:

Yes, it's a matter of man power and possibilities. Before I'll provide my reasoning below, let me put another bookmark here:

Rust-await: https://github.com/icorderi/rust-await

Let's itemize some thoughts. Please add yours if you see as appropriate:

General Ones

  • Primary reason curio is a python project is because it has async def/await. Right? And the included batteries.
  • Both curio and asyncio are temporary. Why? Because if you need something 'serious', neither is appropriate because of Python's Achilles' heels.
  • If you need something 'good enough', never bother with asyncs. Go right into blocking threads.
  • Then why are we here? We all know scalability lies in asyncs. But if you 'really' need such scalability, do you think python will make it? :(
  • Then it all means that, consciously or not, we are here just to prepare 'prototypes' for the next async native code.
  • Two old and new candidates: C++ and Rust. Note that being aware of this, Guido is trying to bring static type hinting to python. Cosmetic for now. Possible performance improvement in the future? Maybe. Maybe not.

Personal Ones

  • PyCharm IDE is the most important reason I'm on python. And 'currently' I do not need serious performance.
  • JetBrains (vendor of PyCharm) is working to bring rust to their IDEs. Progress is good.
  • For large projects and groups, language formalism is the gold standard. (Read it as 'static typing' in the most shallow terms.)
  • Rust batteries are currently 60-70% charged.


imrn commented 8 years ago

On 11/11/16, Nathaniel J. Smith notifications@github.com wrote:

This issue thread is getting pretty off topic, and I'm not sure what the original topic even was :-). Maybe it's time to close it?

Let me summarize the topic:

ghost commented 6 years ago

I see a few people mentioning scalability here and an earlier comment from @brettcannon made me want to mention this (I was searching the curio GitHub for an unrelated item and stumbled on this).

One thing worth noting: A standards based HTTP/2 server with actual standards based HTTP 1.1 fallback exists with python bindings: nghttp2

It handles both client and server aspects, and it's quite performant. Development is open and I must say I find it much more pleasing to code against than hyper-h2. I don't know all the implementation-specific differences, but I'm a big fan of this project and (now for my subjective opinion) I don't feel like the Python community is giving it enough love.

Secondly, WSGI is set to be transformed. There are a number of viable alternatives, such as ditching it altogether (Tornado has, for instance) or re-implementing WSGI to be async (some work on this is going on over at the Django Project with ASGI). I can't imagine the smart folks at the Python Software Foundation and all of our lovely open source developers haven't given thought to how to advance this in the future; I have heard many rumblings about how to build out asyncio to be as future-proof as possible. A project the size of Python doesn't always move at the speed of light, though. Arguably Rust is a smaller project and is new and hot right now, so it has the advantage of more 'spotlight', but I believe Python will have native implementations of these things soon enough.

As for a Request/Response/Cookie parsing mechanism, I suggest looking at WebOb by the Pylons folks. It's simply a framework for just that: bolting a Request/Response/Cookie/HTTPStatus mechanism onto your desired implementation. Their example is that of WSGI, but it is not itself a WSGI app, and experimentally I have had some success using it with asyncio (and I believe it will work with Curio).

I just wanted to chime in here, because there are interesting things on the horizon, and I don't want anyone coming in to Python to not see the exciting developments that are indeed taking place.

Is it perfect? No. Is it realistic and deployable? You bet.

brettcannon commented 6 years ago

@ProtonScott but nghttp2 manages the I/O, correct? hyper-h2 has a very specific structure by following a sans I/O design, so the two libraries are targeting two different levels of the network stack. IOW I don't think it's quite fair to compare hyper-h2 directly to nghttp2.
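For readers new to the term, a sans-I/O library is a protocol state machine that only consumes and produces bytes and leaves all socket work to the caller. The toy below sketches that shape for a newline-delimited protocol. The class `LineConnection` is invented for illustration; the method names `receive_data` and `data_to_send` are borrowed from hyper-h2's connection object, but nothing else here reflects how hyper-h2 or nghttp2 is implemented.

```python
class LineConnection:
    """Toy sans-I/O state machine for a newline-delimited protocol."""

    def __init__(self):
        self._inbuf = b""
        self._outbuf = b""

    def receive_data(self, data: bytes):
        """Feed bytes read from anywhere; get back complete lines as events."""
        self._inbuf += data
        # Everything before the last newline is complete; the tail is kept.
        *lines, self._inbuf = self._inbuf.split(b"\n")
        return [line.decode("utf-8") for line in lines]

    def send_line(self, text: str):
        """Queue a line; nothing is written to a socket here."""
        self._outbuf += text.encode("utf-8") + b"\n"

    def data_to_send(self) -> bytes:
        """Hand the caller the bytes it should write, then clear the buffer."""
        data, self._outbuf = self._outbuf, b""
        return data
```

The caller owns the socket: it reads bytes and passes them to `receive_data()`, then writes whatever `data_to_send()` returns, using curio, asyncio, or blocking calls as it sees fit.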

Lukasa commented 6 years ago

nghttp2 doesn’t manage the I/O, it has the same design philosophy as hyper-h2.

brettcannon commented 6 years ago

Oops, sorry for my bad skimming of the docs.

Either way, Dave has clearly said in the past he doesn't plan to make curio into a full-fledged library that he has to maintain, so we should probably focus on e.g. Trio as Nathaniel seems up for trying to make that library into something that can eventually go into production.

imrn commented 6 years ago

It will be fun to see asyncio/wsgi or trio/http1...2 couplings. In the meantime, curio based solutions will probably be the safest bet.

imrn commented 6 years ago

Nghttp2 seems pretty decent. Maybe we can experiment with curio+nghttp2 and share our experiences.

dabeaz commented 6 years ago

Where, precisely, have I ever said that Curio is something that I don't intend to maintain?

Apparently it must be some kind of impossible to understand mental concept that Curio can remain a small, very specific library, that stays focused on what it's doing right now. Or that I need to be sitting here adding feature after feature upon feature. Not everything needs to turn into some kind of giant framework from hell. Not every open source project needs a freaking laptop sticker and a conference dedicated to it.

Did I write Curio to be a replacement for something like Twisted? No I didn't. Do I want it to be usable and reliable for what it is? You bet. I'm certainly open to any bug reports and pull requests that fix bugs and improve its documentation. However, if people are sitting around waiting for me to patch in some kind of turn-key HTTP/2 support, they're going to be waiting a long time. Curio is a building block for creating other interesting things--not an all-inclusive framework that attempts to solve every possible problem involving I/O. I'm happy with it being exactly that.

That said, I'll be pushing a new release fairly soon. Bug reports welcome ;-)