dabeaz / curio

Good Curio!
Other
4.02k stars 241 forks source link

Add `readable` method to Socket #238

Closed guyskk closed 6 years ago

guyskk commented 6 years ago

I need this method for HTTP keep-alive check, and I found there is a writeable method but no readable method, here is the demo code for my use case:

try:
    await timeout_after(self.keep_alive_timeout, cli_sock.readable())
except TaskTimeout as ex:
    await cli_sock.close()
guyskk commented 6 years ago

The test test_worker_timeout failed, but I had no idea how to fix it. And should I replace all await _read_wait(self._fileno) and await _write_wait(self._fileno) to await self.readable() and await self.writeable() ?

dabeaz commented 6 years ago

I'm not really sure I fully understand the utility of a separate readable() method. Wouldn't a call to readable() almost always be followed by a call to actually read the data? If so, wouldn't it make more sense to put the timeout on the read operation instead?

For what it's worth, the writeable() method is there primarily to enable a non-greedy write. Normally, Curio, tries to write first and only blocks if the write fails. writeable() is there to flip the order around if you want--you can check to make sure a socket is writeable first and then only write if it would succeed. To be honest, I'm somewhat inclined to remove the writeable() method rather than give it a readable() sibling.

imrn commented 6 years ago

On 12/19/17, David Beazley notifications@github.com wrote:

If so, wouldn't it make more sense to put the timeout on the read operation instead?

May be on the socket/stream itself.

dabeaz commented 6 years ago

No. Any blocking operation whatsoever can have a timeout applied to it. I'm not going to special case timeouts at the socket/stream level. What I mean, is why wouldn't you just do this:

 async with timeout_after(timeout):
          data = await sock.recv(maxsize)       # Or whatever

I don't see what having a separate blocking readable() method is really buying you.

guyskk commented 6 years ago

@dabeaz Yes, a call to readable() is followed by a call to actually read the data, but the caller of readable() is something like connection manager, which should't actually read the data, and the caller of read() is something like http parser, which will consume the data. I don't like to implement the connection manager logic (eg: keep_alive_timeout) in http parser.

imrn commented 6 years ago

On 12/19/17, David Beazley notifications@github.com wrote:

 async with timeout_after(timeout):
          data = await sock.recv(maxsize)       # Or whatever

Of course we can have this. But a timeout in socket/stream saves us from lots of context manager code lines similar to this.

I admit it is an old habit from past sync era and I don't mind if you want to stick to async purism.

Never the less, I already have such custom wrappers. But note that, embedded timeouts makes curio code more parallel to sync code, which is also a "feature" of curio.

I don't see what having a separate blocking readable() method is really buying you.

I guess it is also an old sync habit which I'll strongly oppose. Because -unlike the above syntactic sugar suggestion- this has the potential of disturbing async logic of the developer. It will encourage bad async style.

The core idea of asyncs it to eliminate sync checks and loops. This seemingly innocent readable() func punches it badly.

On 12/19/17, kk notifications@github.com wrote:

Yes, a call to readable() is followed by a call to actually read data, but the caller of readable() is something like connection manager, which should't actually read the data, and the caller of read() is something like http parser, which will consume the data. I don't like to implement the connection manager logic (eg: keep_alive_timeout) in http parser.

Let me guess: You are trying to implement http1.1 pipelining or http2.0 server or a middleware like proxy... Right? I can assure you it is doable elegantly without readable(). But you'll need connection manager to recycle or close the connections anyway since no single http request/response can decide on the fate of a connection. (Other than close requests or someother thing which I couldn't remember right now.) Someone has to have a wholistic view. No shortcuts there.

imrn commented 6 years ago

To be honest, I'm somewhat inclined to remove the writeable() method rather than give it a readable() sibling.

Agreed..

dabeaz commented 6 years ago

One of the reasons why there is no timeout parameter on calls is that literally every single blocking call in the whole library potentially allows a timeout. Early on, the API for that was getting to be insane. It made much more sense (to me anyway) to have some kind of wrapper that could apply a timeout to whatever you were doing if you really wanted it.

imrn commented 6 years ago

We think Curio as an async library. Right?

How about this:

Curio: Syncs.. As it should be..

CreatCodeBuild commented 6 years ago

In my own uses, writable() and readable() were never needed.

njsmith commented 6 years ago

The use case for writable is when you have a latency-sensitive protocol, and you want to delay making a decision about which bits to send until the kernel is ready to accept them - see #83. For example, imagine a screen-sharing protocol that loops on taking a screenshot and sending it -- if the network isn't ready to accept data, then you want to delay taking the screenshot until it is, rather than queue up stale data.

The use case for readable is very specific: if you have an HTTP client with a connection pool, then after using a connection you put it back in the pool and ignore it for a while. Later, when you decide to make another request to the same server, you pull the connection back out again to re-use. However, in the mean time, the server might have gotten bored and closed the connection (possibly after sending some 408 Request Timeout response). So when retrieving a connection from the pool, you want to do something like:

  1. check if the socket is readable. If so, the server has given up. This isn't an error; it just means we need to close this connection and make a new one.
  2. Once we have a connection that seems to be valid, send our request, while simultaneously listening for a response. At this point, an early response or socket close indicates an error → this request failed.

So you really do want a non-blocking "is there data there to read?" query here. In my efforts to port urllib3 to Trio, my plan so far is to just drop down to the lower level socket API here and call select or whatever, since this is such a weird and specialized thing that I don't want to put it into my abstract stream API.

Note in particular that for a TLS-encrypted connection, there isn't in general any way to check if the logical, application-level data stream is readable without actually trying to read data, but in this case that's not what we want anyway – here we really just want to check if there's any data on the underlying socket level.

guyskk commented 6 years ago

Let me guess: You are trying to implement http1.1 pipelining or http2.0 server or a middleware like proxy... Right?

I'm tring to implement http1.1 persistent connection in server side, I try to show the wholistic view in code below:

async def _worker(self, cli_sock, cli_addr):
    # browsers may preconnect but didn't send request immediately
    # keep-alive connection has similar behaviors
    # just close the connection if keep_alive_timeout exceed
    try:
        await timeout_after(self.keep_alive_timeout, cli_sock.readable())
    except TaskTimeout as ex:
        await cli_sock.close()
        return
    # parse request, will send `408 Request Timeout` response if request_timeout exceed
    parser = RequestParser(cli_sock, cli_addr)
    try:
        request = await timeout_after(self.request_timeout, parser.parse())
    except TaskTimeout as ex:
        response = Response(408)  # Request Timeout
    else:
        response = await self.app(request)
    # send response
    async for chunk in response:
        await cli_sock.sendall(chunk)
    # close the connection or wait for next request(keep-alive)
    if not response.keep_alive:
        await cli_sock.close()
    else:
        await spawn(self._worker(cli_sock, cli_addr), daemon=True)

The use case for readable is very specific: if you have an HTTP client with a connection pool, then after using a connection you put it back in the pool and ignore it for a while. Later, when you decide to make another request to the same server, you pull the connection back out again to re-use. However, in the mean time, the server might have gotten bored and closed the connection (possibly after sending some 408 Request Timeout response)

The ConnectionPool will check if the socket is readable, but this can't be implemented by await sock.readable() because await sock.readable() will block until socket is readable. It needs a sync and non-blocking is_readable() method actually.

I implemented this check very ugly, in curequests(https://github.com/guyskk/curequests/blob/master/curequests/connection_pool.py#L75):

def _is_peer_closed(self):
    """check if socket in close-wait state"""
    # the socket is non-blocking mode, read 1 bytes will return EOF
    # which means peer closed, or raise exception means alive
    try:
        r = self.sock._socket_recv(1)  # FIXME: I use a private method, bad!
    except WantRead:
        return False
    except WantWrite:
        return False
    assert r == b'', "is_peer_closed shouldn't be called at this time!"
    return True
njsmith commented 6 years ago

Oh, I see, right, I got is_readable and wait_readable mixed up...

For the server, I don't see why you want a wait_readable. I can see how t might be useful if you want to provide a timeout for time-to-first-request-byte, but (a) I'm not sure what that specific timeout semantics are especially useful -- e.g. in this example curio web server I just imposed a timeout on time-to-end-of-request-headers. And (b) it doesn't seem like it'd be that hard to pass some argument into parser.parse saying "please wrap your first call in conn.recv (or whatever curio calls it) in a timeout_after with this timeout".

njsmith commented 6 years ago

Oh, and speaking of removing writable, I'm actually considering one approach for doing that in trio here: https://github.com/python-trio/trio/issues/371

But I doubt Dave will like it :-)

imrn commented 6 years ago

This thread is an example of how that "evil" writeable() misled kk, spoiling his style and time. I'd suggest removing writable() and any other possible sync checks from curio. Trio may have them, as it explores such concepts.

dabeaz commented 6 years ago

The only reason I wouldn't like the Trio approach is that it's trying to make Trio do something--I don't want Curio to do anything. Doing things is the whole problem.

That said, I'm kind of inclined to remove writeable() and readable(). Mainly in the interest of having a smaller API, but also because they're kind of misleading (i.e., are people confused into thinking that such calls are necessary?). If one really needs to test is a socket is readable, I concur that one could run it through a zero-timeout select() call and find out. I'd consider it to be a special case. I'd also consider all uses of writeable() to be a special case that Curio doesn't really need to address directly.

njsmith commented 6 years ago

@dabeaz I was thinking you wouldn't like it because it would force every call to sendall to yield to the kernel, which was a point we disagreed on before :-). Probably also relevant to note though that that trio issue is talking about Trio's higher-level stream interface, the equivalent of curio.SocketStream rather than curio.io.Socket. (However, trio.SocketStream.wait_send_all_might_not_block is of course implemented using trio.socket.SocketType.wait_writable.)

guyskk commented 6 years ago

I agree with "having a smaller API" since my needs can be implemented without readable() method. It's easy to add features but not easy to remove features.

dabeaz commented 6 years ago

This is more of a meta-question, but has anyone ever implemented a "screen-sharing application" or something similar to it in Python? If so, what library was used?

njsmith commented 6 years ago

@dabeaz Part of the reason that use case comes to mind so quickly for me is that long ago I wrote a little app called Xpra, which is still in reasonably wide use under different management. For networking I originally used glib (since I was using gtk for the UI anyway), and then later switched to threads (sigh) for better Windows compatibility. I'm not sure what they use these days.

Screen sharing is also the example that Apple used in their WWDC talk on TCP_NOTSENT_LOWAT and related things, I guess because it's so dramatic.

HTTP/2 is another protocol that acts like screen sharing in this regard.