didip commented 5 years ago

I intermittently got this error when load testing uvicorn endpoint.

This error comes from a proxy endpoint where I am also using encode/http3 to perform HTTP client calls.

  File "/project/venv/lib/python3.7/site-packages/http3/client.py", line 365, in post
    timeout=timeout,
  File "/project/venv/lib/python3.7/site-packages/http3/client.py", line 497, in request
    timeout=timeout,
  File "/project/venv/lib/python3.7/site-packages/http3/client.py", line 112, in send
    allow_redirects=allow_redirects,
  File "/project/venv/lib/python3.7/site-packages/http3/client.py", line 145, in send_handling_redirects
    request, verify=verify, cert=cert, timeout=timeout
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/connection_pool.py", line 121, in send
    raise exc
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/connection_pool.py", line 116, in send
    request, verify=verify, cert=cert, timeout=timeout
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/connection.py", line 59, in send
    response = await self.h11_connection.send(request, timeout=timeout)
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/http11.py", line 65, in send
    event = await self._receive_event(timeout)
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/http11.py", line 109, in _receive_event
    event = self.h11_state.next_event()
  File "/project/venv/lib/python3.7/site-packages/h11/_connection.py", line 439, in next_event
    exc._reraise_as_remote_protocol_error()
  File "/project/venv/lib/python3.7/site-packages/h11/_util.py", line 72, in _reraise_as_remote_protocol_error
    raise self
  File "/project/venv/lib/python3.7/site-packages/h11/_connection.py", line 422, in next_event
    self._process_event(self.their_role, event)
  File "/project/venv/lib/python3.7/site-packages/h11/_connection.py", line 238, in _process_event
    self._cstate.process_event(role, type(event), server_switch_event)
  File "/project/venv/lib/python3.7/site-packages/h11/_state.py", line 238, in process_event
    self._fire_event_triggered_transitions(role, event_type)
  File "/project/venv/lib/python3.7/site-packages/h11/_state.py", line 253, in _fire_event_triggered_transitions
    .format(event_type.__name__, role, self.states[role]))
h11._util.RemoteProtocolError: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE

didip commented 5 years ago

I think this is caused by the client code in http3.

tomchristie commented 5 years ago

Yup - the traceback is from http3 there - have moved this issue across.

lch277 commented 5 years ago

This problem still exists in the lastest version 0.6.7

ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 370, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\starlette\applications.py", line 133, in __call__
    await self.error_middleware(scope, receive, send)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\starlette\middleware\errors.py", line 122, in __call__
    raise exc from None
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\starlette\middleware\errors.py", line 100, in __call__
    await self.app(scope, receive, _send)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\starlette\middleware\base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "subapp.py", line 56, in sso_middleware
    'code': auth_code
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\client.py", line 406, in post
    timeout=timeout,
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\client.py", line 548, in request
    timeout=timeout,
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\client.py", line 145, in send
    allow_redirects=allow_redirects,
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\client.py", line 178, in send_handling_redirects
    request, verify=verify, cert=cert, timeout=timeout
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\dispatch\connection_pool.py", line 130, in send
    raise exc
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\dispatch\connection_pool.py", line 121, in send
    request, verify=verify, cert=cert, timeout=timeout
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\dispatch\connection.py", line 59, in send
    response = await self.h11_connection.send(request, timeout=timeout)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\dispatch\http11.py", line 58, in send
    http_version, status_code, headers = await self._receive_response(timeout)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\dispatch\http11.py", line 130, in _receive_response
    event = await self._receive_event(timeout)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\dispatch\http11.py", line 161, in _receive_event
    event = self.h11_state.next_event()
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\h11\_connection.py", line 439, in next_event
    exc._reraise_as_remote_protocol_error()
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\h11\_util.py", line 72, in _reraise_as_remote_protocol_error
    raise self
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\h11\_connection.py", line 422, in next_event
    self._process_event(self.their_role, event)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\h11\_connection.py", line 238, in _process_event
    self._cstate.process_event(role, type(event), server_switch_event)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\h11\_state.py", line 238, in process_event
    self._fire_event_triggered_transitions(role, event_type)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\h11\_state.py", line 253, in _fire_event_triggered_transitions
    .format(event_type.__name__, role, self.states[role]))
h11._util.RemoteProtocolError: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE

tomchristie commented 5 years ago

@lch277 Could you provide a simple reproducible example so that we can replicate the issue?

lch277 commented 5 years ago

There are two web servers. testapp.py

from fastapi import FastAPI
from starlette.requests import Request
from starlette.responses import JSONResponse
import http3

app = FastAPI()
http_client = http3.AsyncClient()

@app.middleware('http')
async def sso_middleware(request: Request, call_next):
    r = await http_client.post('http://127.0.0.1:8001')
    if r.status_code != 200:
        return JSONResponse({'ok': 0, 'data': {'status_code': r.status_code}})
    ret = r.json()
    await r.close()
    print(ret)
    response = await call_next(request)
    return response

@app.get('/')
def index(request: Request):
    return {"ok": 1, "data": "welcome to test app!"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8000)
    pass

testapp1.py

from fastapi import FastAPI
from starlette.requests import Request

app = FastAPI()

@app.get('/')
@app.post('/')
def index(request: Request):
    return {"ok": 1, "data": "welcome to test app 11111111111!"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8001)
    pass

The reproduce steps:

run the two servers
open the http://127.0.0.1:8000/ in browser
refresh the page. the response is 200 and ok.
then stop refreshing, do some other works for seconds
switch back and refresh again, the first refresh result is 500 most of the time

This happens, not every time. The logs again:

INFO: ('127.0.0.1', 14501) - "GET / HTTP/1.1" 200
{'ok': 1, 'data': 'welcome to test app 11111111111!'}
INFO: ('127.0.0.1', 14501) - "GET / HTTP/1.1" 200
INFO: ('127.0.0.1', 14553) - "GET / HTTP/1.1" 500
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 370, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\starlette\applications.py", line 133, in __call__
    await self.error_middleware(scope, receive, send)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\starlette\middleware\errors.py", line 122, in __call__
    raise exc from None
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\starlette\middleware\errors.py", line 100, in __call__
    await self.app(scope, receive, _send)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\starlette\middleware\base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "testapp.py", line 13, in sso_middleware
    r = await http_client.post('http://127.0.0.1:8001')
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\client.py", line 406, in post
    timeout=timeout,
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\client.py", line 548, in request
    timeout=timeout,
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\client.py", line 145, in send
    allow_redirects=allow_redirects,
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\client.py", line 178, in send_handling_redirects
    request, verify=verify, cert=cert, timeout=timeout
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\dispatch\connection_pool.py", line 130, in send
    raise exc
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\dispatch\connection_pool.py", line 121, in send
    request, verify=verify, cert=cert, timeout=timeout
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\dispatch\connection.py", line 59, in send
    response = await self.h11_connection.send(request, timeout=timeout)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\dispatch\http11.py", line 58, in send
    http_version, status_code, headers = await self._receive_response(timeout)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\dispatch\http11.py", line 130, in _receive_response
    event = await self._receive_event(timeout)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\http3\dispatch\http11.py", line 161, in _receive_event
    event = self.h11_state.next_event()
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\h11\_connection.py", line 439, in next_event
    exc._reraise_as_remote_protocol_error()
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\h11\_util.py", line 72, in _reraise_as_remote_protocol_error
    raise self
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\h11\_connection.py", line 422, in next_event
    self._process_event(self.their_role, event)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\h11\_connection.py", line 238, in _process_event
    self._cstate.process_event(role, type(event), server_switch_event)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\h11\_state.py", line 238, in process_event
    self._fire_event_triggered_transitions(role, event_type)
  File "C:\Users\lch\.virtualenvs\authcenter-5r0y-06W\lib\site-packages\h11\_state.py", line 253, in _fire_event_triggered_transitions
    .format(event_type.__name__, role, self.states[role]))
h11._util.RemoteProtocolError: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE
{'ok': 1, 'data': 'welcome to test app 11111111111!'}
INFO: ('127.0.0.1', 14570) - "GET / HTTP/1.1" 200

Sorry for my poor English.

tomchristie commented 5 years ago

There's a really useful thing for a contirbutor to help progress here.

Can someone else reproduce this?
What's the simplest possible case you can reduce it down too? (Eg. Can you remove FastAPI from the equation, and replicate against either a plain WSGI or ASGI service)

yeraydiazdiaz commented 5 years ago

I can reproduce it. I've narrowed it down to this "caller" script:

import asyncio
import sys

import httpx

http_client = httpx.AsyncClient()

async def request(port):
    print("Performing request")
    resp = await http_client.get(f"http://localhost:{port}")
    if resp.status_code != 200:
        raise Exception("Unexpected non-200 response")
    print("Got response", resp.content, "\n")
    await resp.close()

async def main(port, timeout=5):
    await request(port)
    await asyncio.sleep(timeout)
    await request(port)

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Please specify a port number to connect to")
    else:
        port = sys.argv[1]
        asyncio.run(main(port))

It is not reproducible against a simple Flask WSGI server:

from flask import Flask

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def index():
    return {"ok": 1, "data": "welcome to test app 11111111111!"}

if __name__ == "__main__":
    app.run(port=5000)

The script outputs:

Performing request
Got response b'{"data":"welcome to test app 11111111111!","ok":1}\n'

Performing request
Got response b'{"data":"welcome to test app 11111111111!","ok":1}\n

But it fails with ASGI servers, interestingly it fails for both FastAPI+Uvicorn and Quart+Hypercorn:

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
@app.post("/")
def index(request):
    return {"ok": 1, "data": "welcome to test app 11111111111!"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8001)

The Quart code:

from quart import Quart

app = Quart(__name__)

@app.route("/", methods=["GET", "POST"])
async def hello():
    return "Hello, World!"

if __name__ == "__main__":
    app.run(port=5001)

For both implementations the script's output is:

Performing request
Got response b'Hello, World!'

Performing request
Traceback (most recent call last):
  File "/Users/yeray/.pyenv/versions/httpx/lib/python3.7/site-packages/h11/_state.py", line 249, in _fire_event_triggered_transitions
    new_state = EVENT_TRIGGERED_TRANSITIONS[role][state][event_type]
KeyError: <class 'h11._events.ConnectionClosed'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "client_script.py", line 29, in <module>
    asyncio.run(main(port))
  File "/Users/yeray/.pyenv/versions/3.7.3/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/Users/yeray/.pyenv/versions/3.7.3/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "client_script.py", line 21, in main
    await request(port)
  File "client_script.py", line 11, in request
    resp = await http_client.get(f"http://localhost:{port}")
  File "/Users/yeray/code/personal/_forks/httpx/httpx/client.py", line 316, in get
    timeout=timeout,
  File "/Users/yeray/code/personal/_forks/httpx/httpx/client.py", line 548, in request
    timeout=timeout,
  File "/Users/yeray/code/personal/_forks/httpx/httpx/client.py", line 144, in send
    allow_redirects=allow_redirects,
  File "/Users/yeray/code/personal/_forks/httpx/httpx/client.py", line 177, in send_handling_redirects
    request, verify=verify, cert=cert, timeout=timeout
  File "/Users/yeray/code/personal/_forks/httpx/httpx/dispatch/connection_pool.py", line 128, in send
    raise exc
  File "/Users/yeray/code/personal/_forks/httpx/httpx/dispatch/connection_pool.py", line 119, in send
    request, verify=verify, cert=cert, timeout=timeout
  File "/Users/yeray/code/personal/_forks/httpx/httpx/dispatch/connection.py", line 54, in send
    response = await self.h11_connection.send(request, timeout=timeout)
  File "/Users/yeray/code/personal/_forks/httpx/httpx/dispatch/http11.py", line 58, in send
    http_version, status_code, headers = await self._receive_response(timeout)
  File "/Users/yeray/code/personal/_forks/httpx/httpx/dispatch/http11.py", line 130, in _receive_response
    event = await self._receive_event(timeout)
  File "/Users/yeray/code/personal/_forks/httpx/httpx/dispatch/http11.py", line 161, in _receive_event
    event = self.h11_state.next_event()
  File "/Users/yeray/.pyenv/versions/httpx/lib/python3.7/site-packages/h11/_connection.py", line 439, in next_event
    exc._reraise_as_remote_protocol_error()
  File "/Users/yeray/.pyenv/versions/httpx/lib/python3.7/site-packages/h11/_util.py", line 72, in _reraise_as_remote_protocol_error
    raise self
  File "/Users/yeray/.pyenv/versions/httpx/lib/python3.7/site-packages/h11/_connection.py", line 422, in next_event
    self._process_event(self.their_role, event)
  File "/Users/yeray/.pyenv/versions/httpx/lib/python3.7/site-packages/h11/_connection.py", line 238, in _process_event
    self._cstate.process_event(role, type(event), server_switch_event)
  File "/Users/yeray/.pyenv/versions/httpx/lib/python3.7/site-packages/h11/_state.py", line 238, in process_event
    self._fire_event_triggered_transitions(role, event_type)
  File "/Users/yeray/.pyenv/versions/httpx/lib/python3.7/site-packages/h11/_state.py", line 253, in _fire_event_triggered_transitions
    .format(event_type.__name__, role, self.states[role]))
h11._util.RemoteProtocolError: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE

yeraydiazdiaz commented 5 years ago

Relevant snippet from the h11 docs:

Connection.next_event Raises: RemoteProtocolError – The peer has misbehaved. You should close the connection (possibly after sending some kind of 4xx response)

tomchristie commented 5 years ago

Okay, so it's related to an issue with keep-alive connections on HTTP/1.1.

I can replicate fine with uwsgi, but not gunicorn. (Difference being that gunicorn doesn't itself support keep-alive, but instead relies on being proxied behing nginx in order to support persistent connections)

didip commented 5 years ago

I believe you can turn on gunicorn keepalive setting with —keepalive N

yeraydiazdiaz commented 5 years ago

Did some more debugging, I'm a bit out of my depth on this lower level details so bear with me 😅

During keep-alive situations the asyncio StreamReader and StreamWriter objects part of the HTTP11Connection are reused. I noticed once the reader reads the first response its at_eof is returning True. When trying to read the second response it returns an empty bytes which is interpreted by h11 as the server has closed the connection instead of sending the response.

The StreamReader.read docs mention:

If EOF was received and the internal buffer is empty, return an empty bytes object.

I would've thought that the internal buffer of the reader would not be empty and would hold the new response data but it returns the empty bytes.

I find unlikely that both FastAPI and Quart are misbehaving in the exact same way and not sending response data, so is it possible we need to "refresh" the reader somehow after it reading EOF? I can't see any obvious API to do such a thing, aside from creating a whole new connection which would obviously defeat the purpose.

Sorry if it's a silly question, but I can't really find any other reason for this error, which btw happens irregardless of the waiting in the caller scripts, two consecutive keep-alive requests will trigger it.

sethmlarson commented 5 years ago

Or we need to not feed an empty string into the h11 state machine. Maybe instead if we get an EOF back from StreamReader we don't forward it to h11?

See this comment: https://github.com/python-hyper/h11/blob/master/h11/_connection.py#L320

yeraydiazdiaz commented 5 years ago

I had a look at the asyncio code looks to me like there's no real way of reusing the reader? 😕

Once the first EOF is read the protocol's eof_received method sets the reader's EOF flag which prevents further reading from it.

I thought I'd look into how aiohttp might be handling this and seems they use custom transport and protocol instances. The latter seems to handle EOF differently but I am really out of my depth on that code so I might be completely wrong.

rabbithole

pgjones commented 5 years ago

When I run this against the latest Hypercorn using @yeraydiazdiaz example it works, with this output,

Performing request
Got response b'/Users/pgjones/quart/quart-benchmark/servers/quart_server.py-fib(12)=144' 

Performing request
Got response b'/Users/pgjones/quart/quart-benchmark/servers/quart_server.py-fib(12)=144'

crucially though I changed the time between the requests to 2 seconds as the hypercorn keep alive timeout default is 5 seconds. I think this is why it failed for Hypercorn and Uvicorn as both have a 5 second keep alive timeout.

This is what Hypercorn receives (>) and sends (<),

> b'GET / HTTP/1.1\r\nhost: localhost:8000\r\nuser-agent: python-httpx/0.6.7\r\naccept: */*\r\naccept-encoding: gzip, deflate\r\nconnection: keep-alive\r\n\r\n'
< b'HTTP/1.1 200 \r\ncontent-type: text/html; charset=utf-8\r\ncontent-length: 13\r\ndate: Wed, 24 Jul 2019 21:22:21 GMT\r\nserver: hypercorn-h11\r\n\r\n'
< b'Hello, World!'
< b''
> b'GET / HTTP/1.1\r\nhost: localhost:8000\r\nuser-agent: python-httpx/0.6.7\r\naccept: */*\r\naccept-encoding: gzip, deflate\r\nconnection: keep-alive\r\n\r\n'
< b'HTTP/1.1 200 \r\ncontent-type: text/html; charset=utf-8\r\ncontent-length: 13\r\ndate: Wed, 24 Jul 2019 21:22:23 GMT\r\nserver: hypercorn-h11\r\n\r\n'
< b'Hello, World!'
< b''
> b''

I think this issue is that httpx should convert the error raised by h11 to something like ConnectionClosedByRemote.

yeraydiazdiaz commented 5 years ago

Right, so if understand correctly, the issue is handling the server closing keep-alive connections after a timeout.

My initial approach would be to have httpx handle the exception, reestablish the connection and retry, I'll see if I can put together a PR for this.

Thanks for looking into it @pgjones 🌟

tomchristie commented 5 years ago

So the thing that's not clear to me on first sight, is why the disconnect isn't being caught at the point we start writing the next response... https://github.com/encode/httpx/blob/master/httpx/dispatch/http11.py#L50

We've got the logic there to handle the disconnect case, and we're writing to the network and calling drain.

That actually needs to be the point at which we figure out if we're still connected or not, because if we wait until getting the response, then for non-idempotent requests it's ambiguous to us as the client if we're actually okay to resend. (Or if the server did in fact handle the request, but some intermediary has disconnected us before we started seeing the response).

So, questions:

How does urllib3 determine if the network connection is still alive, when re-acquring connections from the pool?
How does aiohttp determine if the network connection is still alive, when re-acquring connections from the pool?

/cc @sethmlarson

sethmlarson commented 5 years ago

urllib3 has a is_connection_dropped() function that doesn't tell the whole story. Have to look at httplib as well.

tomchristie commented 5 years ago

Okay, so I think the right approach would be for us to have an equivelent is_connection_dropped on the Reader class in https://github.com/encode/httpx/blob/master/httpx/concurrency.py replacing the existing logic of "suck it and see".

Does that make sense to you too, @yeraydiazdiaz? #143 is a great start - nice test case for the issue that can still be used here.

(Actually I think we'll also want to combine the Reader and Writer interfaces into a single SocketConnection, but that's a different story.)

tomchristie commented 5 years ago

Closed via #145 Released to PyPI as 0.6.8

lch277 commented 5 years ago

Unfortunately, I get the same exception again in 0.6.8.

I use httpx in an oauth2 authorize flow. When I get authorization code from the auth server, I use httpx client to get an access token. After the access toke expired, the first invocation to obtain a new access token is almost always failed, the second invocation(refresh the page) is ok. The situation is similar to last time.

Some facts:

The httpx client is a long-live single instance
The invocation interval is long because it depends on the access token expire time(2 hours) and only a few users use it

Logs:

  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/httpx/client.py", line 687, in get
    timeout=timeout,
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/httpx/client.py", line 634, in request
    async_response = concurrency_backend.run(coroutine, *args, **kwargs)
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/httpx/concurrency.py", line 243, in run
    return self.loop.run_until_complete(coroutine(*args, **kwargs))
  File "uvloop/loop.pyx", line 1451, in uvloop.loop.Loop.run_until_complete
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/httpx/client.py", line 144, in send
    allow_redirects=allow_redirects,
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/httpx/client.py", line 177, in send_handling_redirects
    request, verify=verify, cert=cert, timeout=timeout
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/httpx/dispatch/connection_pool.py", line 118, in send
    raise exc
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/httpx/dispatch/connection_pool.py", line 113, in send
    request, verify=verify, cert=cert, timeout=timeout
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/httpx/dispatch/connection.py", line 54, in send
    response = await self.h11_connection.send(request, timeout=timeout)
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/httpx/dispatch/http11.py", line 52, in send
    http_version, status_code, headers = await self._receive_response(timeout)
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/httpx/dispatch/http11.py", line 124, in _receive_response
    event = await self._receive_event(timeout)
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/httpx/dispatch/http11.py", line 155, in _receive_event
    event = self.h11_state.next_event()
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/h11/_connection.py", line 439, in next_event
    exc._reraise_as_remote_protocol_error()
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/h11/_util.py", line 72, in _reraise_as_remote_protocol_error
    raise self
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/h11/_connection.py", line 422, in next_event
    self._process_event(self.their_role, event)
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/h11/_connection.py", line 238, in _process_event
    self._cstate.process_event(role, type(event), server_switch_event)
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/h11/_state.py", line 238, in process_event
    self._fire_event_triggered_transitions(role, event_type)
  File "/root/.local/share/virtualenvs/authcenter-zRBZbntZ/lib/python3.7/site-packages/h11/_state.py", line 253, in _fire_event_triggered_transitions
    .format(event_type.__name__, role, self.states[role]))
h11._util.RemoteProtocolError: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE

sethmlarson commented 5 years ago

Thanks for reporting this issue @lch277, I'm going to reopen this issue since it seems similar.

This would be a great time to have had our debug logging in place so we can exactly track down the sequence of events in your request. Maybe we should speed up the integration of those logs to help solve issues like this.

iwoloschin commented 5 years ago

I also had this error with httpx 0.7.2 while communicating with a FastAPI based application running straight on Uvicorn (no gunicorn). This specific issue was against a brand new endpoint I just added in, but other similar endpoints have worked fine with requests for quite some time now. I have not yet had a chance to dig into logs, will do that later today and try to post back something more useful here.

A quick check reveals my FastAPI server appears to have crashed, of course without any sort of decent logging, at the same time this request was made. I'm tentatively pointing my blaming finger at my server application crashing mid-request.

sethmlarson commented 5 years ago

@iwoloschin Yes but we should be able to handle that sort of issue gracefully from the HTTPX side. You shouldn't be getting an h11 exception.

iwoloschin commented 5 years ago

@sethmlarson I won't argue about that :). I'd much prefer only having to think about HTTPX exceptions when using HTTPX!

I've added some logging and kicked off another test. It's not a fast test and the issue appears to have been intermittent, so I might not have any results for a while and if it's truly intermittent I may never get results (but hey, no errors is good too, right...?).

sethmlarson commented 5 years ago

@iwoloschin Are you aware of HTTPX_DEBUG=1 for debugging high-level events happening within HTTPX? :)

iwoloschin commented 5 years ago

Nope! I'll take a look at that.

My re-test failed with a httpx.exceptions.ReadTimeout on a different endpoint that should take less than a second. I'm leaning towards my problem with the h11 exception being my server crashing. I'll still second a vote for encapsulating exceptions to only throw HTTPX exceptions.

tomchristie commented 5 years ago

I'll still second a vote for encapsulating exceptions to only throw HTTPX exceptions.

Agreed, yup.

florimondmanca commented 5 years ago

So, is this still an issue? And if so, what can we do to resolve it?

Should we introduce a sort of « catch all » wrapper exception? (That doesn’t seem easily feasible to me without introducing cruft.)
Perhaps we could build up a small reproduction example in which we’d forcibly crash or close the remote server and see what happens, ie whether we get the h11 exception again?

iwoloschin commented 5 years ago

I just ran into this issue again.

In my case, I sort of expect something stupid to happen, I'm deliberately GETing an endpoint on a server that is attempting to reboot, in an attempt to take an action after the reboot is complete & the HTTP service (FastAPI) is back up. I'm fine to catch some exceptions here (HTTPError or ConnectionRefusedError), it kind of makes sense to throw an exception if you can't reach the server, but I think catching an h11 exception here is confusing, especially since I don't normally have to touch h11, httpx manages all of that for me (and it's lovely when it works!) Without really being knowledgable about the internals, I'd put in a strong vote for httpx raising an HTTPError-based exception here. If someone wants to point me in the right direction I can try to help with that...but I know very little about how httpx or h11 work internally so it wouldn't be fast 😄 .

I haven't dug into it yet, but the timing appears to coincide with when the HTTP Service is stopped. I might be able to come up with a small, reproducible example but it might take a few days.

iwoloschin commented 5 years ago

Quick addendum, if I put a short sleep in, to try and avoid hitting the endpoint as the HTTP service is being stopped it appears to avoid the h11 exception. I then got OSError: [Errno 113] Host is unreachable, which...is correct, but also not exactly intuitive.

tomchristie commented 5 years ago

Should we introduce a sort of « catch all » wrapper exception? (That doesn’t seem easily feasible to me without introducing cruft.)

This one doesn't look like it ought to be a catch-all case. Seems to be anexpected possible behaviour that we ought to explicitly catch and raise as a ConnectionClosed or whatever?

Perhaps we could build up a small reproduction example in which we’d forcibly crash or close the remote server and see what happens, ie whether we get the h11 exception again?

Certainly it'd be really helpful to be able to reliable reproduce this yup.

iwoloschin commented 4 years ago

Here's a simple reproducible example. It uses two separate files and requires fastapi and httpx.

FastAPI Server:

from fastapi import FastAPI

app = FastAPI()

@app.get("/hello")
async def hello():
    return "World"

# To Run:
# gunicorn -k uvicorn.workers.UvicornWorker -b 0.0.0.0:80  test:app

Client:

#!/usr/bin/env python3

import httpx

def main():
    try:
        while True:
            r = httpx.get("http://localhost/hello")
            print(r.text)
    except KeyboardInterrupt:
        print("done")

if __name__ == "__main__":
    main()

In one terminal run the server (I'm using gunicorn with a uvicorn worker since that's how I normally run my FastAPI applications).

In a second terminal, launch the test (I just did ./test.py). Let it run for a second and then kill the server with a CTRL-C in the first terminal. The test client, in the second terminal, should die with the expected h11._util.RemoteProtocolError exception. In a quick test (~5 tries) I got the expected error 4 times and a ConnectionRefused error once, which kind of makes sense since it appears to be a race condition of whether or not the server finishes sending the response before it dies.

For what it's worth I'm running on macOS (Mojave) with Homebrew Python3.7 in a virtual environment with the latest httpx and fastapi.

florimondmanca commented 4 years ago

Thanks @iwoloschin, that's very helpful. I was able to reproduce using the even simpler setup below (plain ASGI app + uvicorn):

# app.py
async def app(scope, receive, send):
    assert scope["type"] == "http"
    await send(
        {
            "type": "http.response.start",
            "status": 200,
            "headers": [[b"content-type", b"text/plain"]],
        }
    )
    await send({"type": "http.response.body", "body": b"Hello, world!"})

# uvicorn app:app

I received the h11 error 3 times out of 5, the 2 other failures were an OSError. I'm running the same setup as you (except for Python 3.7 via pyenv).

Another piece of info: I get the same result whether I use the top-level API level, a Client, or an AsyncClient.

tomchristie commented 4 years ago

Good digging folks!

gvbgduh commented 4 years ago

@florimondmanca I'm also getting this one, but quite reliably with ssl. Works well without ssl. I'm also using a simple ASGI app + uvicorn + h11 in the AsyncClient.

gvbgduh commented 4 years ago

import asyncio
import pytest

from httpx import AsyncClient

from uvicorn.config import Config
from uvicorn.main import Server

global_call_count = 0

class App:
    def __init__(self, scope):
        self.scope = scope
        self.call_count = 0

    async def __call__(self, receive, send):
        global global_call_count

        self.call_count += 1
        global_call_count += 1

        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": bytes(f"That's the body! - {self.call_count} - {global_call_count}", "utf-8"), "more_body": False})

class CustomServer(Server):
    pass

@pytest.mark.asyncio
async def test_baseline_h11(certfile_and_keyfile):
    certfile, keyfile = certfile_and_keyfile

    client = AsyncClient(http_versions=["HTTP/1.1"])
    config = Config(
        app=App,
        loop="asyncio",
        limit_max_requests=3,
        http='h11',
        # ssl_certfile=certfile,
        # ssl_keyfile=keyfile
    )
    server = CustomServer(config=config)

    # Prepare the coroutine to serve the request
    run_request = server.serve()
    # Prepare the coroutine to make the request
    make_request = client.get("http://127.0.0.1:8000")

    # Reset the global counter
    global global_call_count
    global_call_count = 0

    # Run coroutines
    results = await asyncio.gather(*[
        run_request,
        client.get("http://127.0.0.1:8000"),
        client.get("http://127.0.0.1:8000"),
        client.get("http://127.0.0.1:8000"),
    ])

    assert [x.text for x in results if x] == ["That's the body! - 1 - 2", "That's the body! - 1 - 3", "That's the body! - 1 - 4"]

gvbgduh commented 4 years ago

Actually, my test case has a mistake, it tries to communicate over HTTP instead of HTTPS. And fails in this case

13:39:34.627 - httpx.dispatch.connection - connected http_version='HTTP/1.1'
DEBUG:    connected http_version='HTTP/1.1'
13:39:34.627 - httpx.dispatch.http11 - send_headers method='GET' target='/' headers=Headers({'host': '127.0.0.1:8000', 'user-agent': 'python-httpx/0.7.6', 'accept': '*/*', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive'})
DEBUG:    send_headers method='GET' target='/' headers=Headers({'host': '127.0.0.1:8000', 'user-agent': 'python-httpx/0.7.6', 'accept': '*/*', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive'})
13:39:34.628 - httpx.dispatch.http11 - receive_event event=NEED_DATA
DEBUG:    receive_event event=NEED_DATA

...

    def _fire_event_triggered_transitions(self, role, event_type):
        state = self.states[role]
        try:
>           new_state = EVENT_TRIGGERED_TRANSITIONS[role][state][event_type]
E           KeyError: <class 'h11._events.ConnectionClosed'>

venv/lib/python3.7/site-packages/h11/_state.py:249: KeyError

...

self = <h11._state.ConnectionState object at 0x103f8b450>, role = SERVER, event_type = <class 'h11._events.ConnectionClosed'>

    def _fire_event_triggered_transitions(self, role, event_type):
        state = self.states[role]
        try:
            new_state = EVENT_TRIGGERED_TRANSITIONS[role][state][event_type]
        except KeyError:
            raise LocalProtocolError(
                "can't handle event type {} when role={} and state={}".format(
>                   event_type.__name__, role, self.states[role]
                )
            )
E           h11._util.RemoteProtocolError: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE

venv/lib/python3.7/site-packages/h11/_state.py:253: RemoteProtocolError

I guess it's just another facet of the problem

toppk commented 4 years ago

just hit this issue (triggered by killing the server during long polling), and this is my stab at pondering the stack trace I'm seeing.

h11 cannot process event given the current state: " h11._util.RemoteProtocolError: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE"
h11 docs say " If either side violates the protocol, you'll get a h11.ProtocolError exception."

So we have two choices:

detect and circumvent the connectionclosed event from getting to h11 causing it to blow up
decode the h11.ProtocolError and throw a more appropriate event

The issue with #2 is that h11 doesn't put details in their exception outside of a string, so either we're fixing that in h11 (adding more structured detailed inside the exception), or we're parsing the msg from the exception.

Thoughts? I'm going to see if there's a way to prevent the ConnectionClose event blowing up h11 (choice number #1).

toppk commented 4 years ago

So I've done some further analysis, and it seems that h11 doesn't give us an opportunity for a detailed exception. The documentation was hopeful (see h11's apidocs for Connection.next_event ) but neither their_state or error_status_hint allows us to create anything but a generic error. I could whip up a pull request based on this and some of the test case listed in earlier comments. Let me know.

index b1781bf..926560f 100644
--- a/httpx/dispatch/http11.py
+++ b/httpx/dispatch/http11.py
@@ -6,6 +6,7 @@ from ..concurrency.base import BaseSocketStream, ConcurrencyBackend, TimeoutFlag
 from ..config import TimeoutConfig, TimeoutTypes
 from ..models import AsyncRequest, AsyncResponse
 from ..utils import get_logger
+from ..exceptions import ProtocolError, ConnectionClosed

 H11Event = typing.Union[
     h11.Request,
@@ -161,7 +162,13 @@ class HTTP11Connection:
         Read a single `h11` event, reading more data from the network if needed.
         """
         while True:
-            event = self.h11_state.next_event()
+            try:
+                event = self.h11_state.next_event()
+            except h11.RemoteProtocolError as e:
+                logger.debug(f"h11.RemoteProtocolError exception their_state={self.h11_state.their_state} error_status_hint={e.error_status_hint}")
+                if self.stream.is_connection_closed():
+                    raise ConnectionClosed(e)
+                raise ProtocolError(e)

             if isinstance(event, h11.Data):
                 logger.trace(f"receive_event event=Data(<{len(event.data)} bytes>)")
diff --git a/httpx/exceptions.py b/httpx/exceptions.py
index 81df38b..01f4f87 100644
--- a/httpx/exceptions.py
+++ b/httpx/exceptions.py
@@ -149,6 +149,10 @@ class InvalidURL(HTTPError):
     URL was missing a hostname, or was not one of HTTP/HTTPS.
     """

+class ConnectionClosed(HTTPError):
+    """
+    Expected more data from peer, but connection was closed.
+    """

 class CookieConflict(HTTPError):
     """

njsmith commented 4 years ago

h11 could give more detailed exceptions if that's useful (or just clearer error messages), but it's not clear to me what the actual problem you're trying to solve is. You have a server that's closing the connection before it finishes sending the response. Do you want to treat that differently from other kinds of server misbehavior? What information do you actually need here?

toppk commented 4 years ago

h11 could give more detailed exceptions if that's useful (or just clearer error messages), but it's not clear to me what the actual problem you're trying to solve is. You have a server that's closing the connection before it finishes sending the response. Do you want to treat that differently from other kinds of server misbehavior? What information do you actually need here?

It is common in http client libraries to give specific exceptions for transport layer issues, like readtimeout, connectionrefused. This dropped connection indicates a different class of issue than a typical protocol violation (and the user may want to take a different tactic in handling it, e.g. switch to a different server within a cluster).

However, from httpx's perspective, I believe it would react the same either way (e.g. it cannot attempt to retry on it's own). I think the only value in making a distinct exception for this fault case is to make it easier for the httpx user to understand and handle the the failure. So I think h11's behavior is fine as is.

I hope my analysis has been helpful, but I think @tomchristie should weigh in now.

tomchristie commented 4 years ago

@toppk - Your diff in https://github.com/encode/httpx/issues/96#issuecomment-552730324 looks about right to me. We just need to be wrapping this case up properly and exposing it as a ConnectionClosed exception.

I think we should probably be checking for the event type ConnectionClosed, rather than inspecting if self.stream.is_connection_closed(): to determine if we're raising a protocol error, or a connection closed exception type.

toppk commented 4 years ago

@toppk - Your diff in #96 (comment) looks about right to me. We just need to be wrapping this case up properly and exposing it as a ConnectionClosed exception.

I think we should probably be checking for the event type ConnectionClosed, rather than inspecting if self.stream.is_connection_closed(): to determine if we're raising a protocol error, or a connection closed exception type.

By design of h11, the ConnectionClosed isn't expected, there is no event type 'ConnectionClosed' exposed back to the caller, only a RemoteProtocolError exception. Or said another way, h11 only exposes ConnectionClosed event when the protocol expects the connection to be closed.

\ The current model of h11 isn't to process unexpected events ( review h11._state for the design), but just to raise an exception. The little exception characterization that exists in occurs in Connection.next_state, and simple sets a recommend http error code (in an attribute named error_status_hint, which in our failure mode is left at the generic 400) \</verbose details>

Although if self.stream.is_connection_closed() is just a heuristic, but it seems to be helpful, considering otherwise we'd just be sending up httpx.exceptions.ProtocolError.

I think if you wanted to avoid the heuristic, without redesigning h11, I think the simplest improvement h11 could make is to add the state of the protocol inside the exception created at h11._state.ConnectionState._fire_event_triggered_transitions rather than just as a string, and bubble that up into the RemoteProtocolError. Then we could examine RemoteProtocolError for the unhandled ConnectionClosed event, and infer that is the root cause.

Or we could just parse the string in the exception we get right now :)

tomchristie commented 4 years ago

Misunderstanding on my part then. Examine the connection. 👍

njsmith commented 4 years ago

Also this particular exception can only happen if you've passed b"" to h11 to tell it that the connection was closed, in case tracking that is somehow easier.

On Tue, Nov 12, 2019, 11:03 Tom Christie notifications@github.com wrote:

Misunderstanding on my part then. Examine the connection. 👍

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/encode/httpx/issues/96?email_source=notifications&email_token=AAEU42HAHPMMB63Z7OYYFQ3QTL4WXA5CNFSM4HZGWBRKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOED3QWCY#issuecomment-553061131, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEU42EYGW5TNGGPZPHEDOLQTL4WXANCNFSM4HZGWBRA .

worldveil commented 4 years ago

Currently experiencing this issue with an aiohttp (using httpx) server reverse proxied by nginx. It's intermittent, but in a load test (and not a big one), I roughly get this distribution of status codes:

Status code distribution:
  [200] 12 responses
  [500] 23 responses

in other words, roughly 2/3 are failing due to the:

h11._util.RemoteProtocolError: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE

error issue. I had been banking on httpx to handle my async HTTP requests.

Since this issue has been open for months, are there any quick fixes such as a way to close all HTTP connections by default? or leave them open? I'm not sure what the issue is or how to set that up.

Otherwise, can someone recommend a good async HTTP python lib to use in the meantime? (I don't mean this as a snub, I genuinely have to figure out a workaround soon.).

thanks for all you folks do!!

njsmith commented 4 years ago

I don't think there are any actual behavioral bugs being discussed in this thread. It's just about how to give a better error message. So I think either there's something wrong with your server setup that's causing it to close connections abruptly without sending responses, or else you've found a different bug in httpx and should probably open a new issue so your problem doesn't get lost in the noise.

worldveil commented 4 years ago

Hm. Got it. Thanks for the heads up.

I have my load test requests coming in to my server, I then make requests to a vendor server in order to answer the original request. Upon "requesting" these vendor requests, I see this issue.

Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/h11/_state.py", line 249, in _fire_event_triggered_transitions
new_state = EVENT_TRIGGERED_TRANSITIONS[role][state][event_type]
KeyError: <class 'h11._events.ConnectionClosed'>

Do you have any idea where to start looking, or are there known scenarios where this error comes up?

toppk commented 4 years ago

Hm. Got it. Thanks for the heads up.

I have my load test requests coming in to my server, I then make requests to a vendor server in order to answer the original request. Upon "requesting" these vendor requests, I see this issue.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/h11/_state.py", line 249, in _fire_event_triggered_transitions
new_state = EVENT_TRIGGERED_TRANSITIONS[role][state][event_type]
KeyError: <class 'h11._events.ConnectionClosed'>
Do you have any idea where to start looking, or are there known scenarios where this error comes up?

This exception could be seen if your web server closes a tcp connection prematurely (being overloaded could certainly cause this), or if a server dies, although it could just be a bug in the server code.

When using httpx you must capture certain exceptions when legitimate failuremodes occur (like readtimeouts, connect failures, etc).

This code would look something like this.

import httpx
try: 
   httpx.get(website)
except httpx.exceptions.HTTPError as e:
    print ("got unwanted exception [%s]" % e)

Unfortunately, seeing the stacktrace is because httpx is leaking an exception type from a dependent package (h11). For the moment, you must also do something like this:

import httpx
import h11
try: 
   httpx.get(website)
except httpx.exceptions.HTTPError as e:
    print ("got unwanted exception [%s]" % e)
except h11.RemoteProtocolError as e:
    print ("got unwanted exception [%s]" % e)

You can then decide how to handle the failure mode (close connection, retry, warn user, etc.).

worldveil commented 4 years ago

Yep, that makes sense. I did some reading about the HTTP protocol today - learned a few things!

My investigations thus far have led me to believe it's some kind of rudely implemented rate limiting, so a retry w/ backoff type solution seems best.

Thanks all for the input and all the great work on httpx. It's a wonderful package.

encode / httpx

h11._util.RemoteProtocolError: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE #96

rabbithole