HTTP 102 leads ClientSession to hang until socket is closed

SantjagoCorkez commented 6 years ago

Long story short

If an HTTP server (aiohttp.web.Application in my case) responds with a HTTP status code 102 aiohttp-driven client doesn't yield until server (or client itself) closes connection and then the yielded value is aiohttp.client_exceptions.ServerDisconnectedError: RawResponseMessage(version=HttpVersion(major=1, minor=1), code=102, reason=None, headers=<CIMultiDict()>, raw_headers=(), should_close=False, compression=None, upgrade=False, chunked=False)

Expected behaviour

response yielded to the callee

Actual behaviour

Client coroutine hangs until server closes the connection which in turn leads to the exception above

Steps to reproduce

server:

from aiohttp import web

async def web_index(request: web.Request) -> web.Response:
    return web.Response(text="", status=102)

app = web.Application()
app.add_routes([web.get('/', web_index)])

web.run_app(app, port=8080)

client:

from asyncio import get_event_loop

import aiohttp

async def clt():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://127.0.0.1:8080/') as response:
            response: aiohttp.ClientResponse
            status = response.status
            text = ''
            if 200 <= status < 300:
                text = await response.text()
            return status, text

get_event_loop().run_until_complete(clt())

Run the server then client

Your environment

aiohttp 3.1.2 python 3.6 Cython 0.28

asvetlov commented 6 years ago

Yes, this is the problem. Now aiohttp client handles 101 Continue only. The fix is welcome.

SantjagoCorkez commented 6 years ago

I wish I knew where to look at for the cause of the bug

asvetlov commented 6 years ago

I suspect this line: https://github.com/aio-libs/aiohttp/blob/master/aiohttp/client_reqrep.py#L703

SantjagoCorkez commented 6 years ago

What is the purpose of that line? Why that line holds on on any status except HTTP 101 Switch Protocol and 200+? Shouldn't it just skip body receiving in case any of 1xx status codes (assuming wikipedia's explanation on informational status codes is correct and there should be no body in responses of 1xx group) and return early as soon as headers are received and parsed?

hubo1016 commented 6 years ago

No, I don't think so, this should be fixed on the server side.

According to RFC, a HTTP request can have multiple responses, with only the last one be 2xx, 3xx, 4xx, or 5xx. There can be any number of 1xx responses before the final response.

https://tools.ietf.org/html/rfc7231#page-50

6.2.  Informational 1xx

   The 1xx (Informational) class of status code indicates an interim
   response for communicating connection status or request progress
   prior to completing the requested action and sending a final
   response. 1xx responses are terminated by the first empty line after
   the status-line (the empty line signaling the end of the header
   section).  Since HTTP/1.0 did not define any 1xx status codes, a
   server MUST NOT send a 1xx response to an HTTP/1.0 client.

   A client MUST be able to parse one or more 1xx responses received
   prior to a final response, even if the client does not expect one.  A
   user agent MAY ignore unexpected 1xx responses.

   A proxy MUST forward 1xx responses unless the proxy itself requested
   the generation of the 1xx response.  For example, if a proxy adds an
   "Expect: 100-continue" field when it forwards a request, then it need
   not forward the corresponding 100 (Continue) response(s).

1xx responses are very special, they cannot be the last response from the server (unless 101 Switching Protocols upgrades the protocol to another protocol like websocket or HTTP/2). Since aiohttp accepts only one return value from the handler, the server side must check the status code of the response object. If the status code is 1xx, it must raise an exception and return a 500 response instead (as if there are no responses)

HTTP status code are defined in groups: 1xx - informational, 2xx - success, 3xx - redirection, 4xx - client error, 5xx - server error. Each group has special means and might be treated differently. They cannot swap with each other freely.

hubo1016 commented 6 years ago

And also, 1xx responses cannnot have a body. 204, 304 cannot have a body too (though in some implementations they are not carefully checked)

asvetlov commented 6 years ago

@hubo1016 I don't follow what are you proposing? Do nothing? Do fix aiohttp in some manner? What fixup is required? Do you want to work on a solution? Do you like to limit your participation by RFC citation only?

SantjagoCorkez commented 6 years ago

@hubo1016 RFCs are good to be the basis when you implement exactly those conditions described within them. But if you appeal to RFCs then please give me a hint to what should one's implementation consist of when it leads to this:

a client makes a (bulk) request to a server asking it to create a long-running task/tasks (for example create a statistics report/reports for a dataset having tons of rows to process and calculate)
the server then responds with 201 Created giving back a reference ticket number to the newly created processing queue entry
then client asks periodically the server, for example HEAD /stats/report?ticket_id=1 HTTP/1.1

What should the server respond with while the ticket processing is not finished yet?

One may propose a 200 OK and a body/header describing the state of ticket, but then:

body is not acceptable while the request method was HEAD
an application-specific header is then a workaround for what should be exactly covered with 102 Processing without any headers at all (so one's application could just stop processing even headers as soon as it receives response status line)

While strict RFC compliance is questionable @asvetlov Is it possible to implement a re-entrant aiohttp.client response message so while it's getting 1xx it yields to the callee and being re-entered continues to wait for further server responses? Something like a context-aware generator which callee could stop processing at any time. I'm too new with aiohttp to propose a patch, but at least I could propose an idea.

samuelcolvin commented 6 years ago

What about an optional callback argument that lets you define what behavior you like for 10X responses?

Something like


async def my_callback(request, response):
    'return None to continue, return a response to finish the request, or just True/False'
    return response if response.status == 102 else None

async with session.get(..., informational_callback=my_callback) as r:
    ...

I agree something like

async for r in session.get(...):
    if r.status != 101:
        ...
        break

Is in some ways more elegant, but it could be very confusing for new users and I imagine would require a bigger change?

SantjagoCorkez commented 6 years ago

I think while callback-driven solution could be simple enough to implement the whole callback methodology is what asyncio is against of. But could be a temporary work-around for a while. Meanwhile I as a library user would be glad to use the library having any type of solution.

hubo1016 commented 6 years ago

@asvetlov I'm proposing that we strictly follow the RFC, and forbids the handler returns a 1xx response. If the handler returns a 1xx response, aiohttp should raise an exception and return a 500 response to the client. This prevents the indefinite block. I can work on it after we come to a conclusion.

How to add customized processing on 1xx responses - for example, allowing the server to return customized 1xx responses, and allowing the client to retrieve and process the 1xx responses - are another question. It will need some more complicated design, and I don't think it is very useful. The client side should not be very difficult to implement, but the server side may be harder.

@SantjagoCorkez for your question. Most designs choose to use normal response code (200) with a JSON body for result. Use HEAD instead of GET is not very helpful, because the response is small enough to be included in a single packet (about 1.4KB). If you only query the server less then once per second, the extra processing will not be a bottleneck, and it is quite easy to implement in any language (even bash). And the last request will directly get the final result. If you use HEAD, you will need another GET request after the HEAD request succeeded.

If you really want a status code based solution, I would suggest 304 (Not Modified), or 412 (Precondition Failed). They are compatible to most HTTP client implementations. Or you may define your own status code like 299 or 269.

an application-specific header is then a workaround for what should be exactly covered with 102 Processing without any headers at all

102 Processing is a obsoleted status code defined by RFC 2518 https://tools.ietf.org/html/rfc2518#page-59 10.1 102 Processing

The 102 (Processing) status code is an interim response used to inform the client that the server has accepted the complete request, but has not yet completed it. This status code SHOULD only be sent when the server has a reasonable expectation that the request will take significant time to complete. As guidance, if a method is taking longer than 20 seconds (a reasonable, but arbitrary value) to process the server SHOULD return a 102 (Processing) response. The server MUST send a final response after the request has been completed.

Methods can potentially take a long period of time to process, especially methods that support the Depth header. In such cases the client may time-out the connection while waiting for a response. To prevent this the server may return a 102 (Processing) status code to indicate to the client that the server is still processing the method.

RFC 2518 suggests to use it as an interim response before the final response to prevent the connection become timeout. If you are planning to use it, you will need to periodically yield a 102 response before the final 200 response, which is not supported by aiohttp now.

RFC compliance might be bigger problem than you think. If you put your server behind nginx, it might think your request is not completed, and refuse to send more requests to the connection. And other client implementations (some in other languages) cannot use your API. HTTP is a very complicated standard, and different implementations vary a lot, so usually we want to use the most common functions to avoid incompatibilities.

hubo1016 commented 6 years ago

The current behavior in client is actually a fix of #1353

justin0mcateer commented 4 years ago

Perhaps I am misunderstanding the exact nature of the bug in this case. However, the point of 102 Processing is to 'prevent the client from timing out'. It doesn't contain any information, so what is the point of returning it to the aiohttp library user? It seems to me, the client library should reset it's HTTP response timeout timer and go back to waiting for a 'final' response. In this way, you don't need to modify the library API at all. Worst case, you could have a configuration flag that declares whether you want to ignore the interim responses with regards to resetting the timer.

Dreamsorcerer commented 1 month ago

As 102 is obsolete and no longer exists in any up-to-date web standard, I don't see any changes being made to aiohttp to accommodate this.

Dreamsorcerer commented 1 month ago

One may propose a 200 OK and a body/header describing the state of ticket, but then:

* body is not acceptable while the request method was `HEAD`

* an application-specific header is then a workaround for what should be exactly covered with `102 Processing` without any headers at all (so one's application could just stop processing even headers as soon as it receives response status line)

Additionally, I think this idea misunderstood the use case. 102 can't be used as a replacement for the described problem, as it literally tells the client not to complete the request. It was also only part of the webdav specification, not a general-purpose HTTP response code.

aio-libs / aiohttp