encode / uvicorn

An ASGI web server, for Python. 🦄
https://www.uvicorn.org/

Sporadic errors with nginx connection pipelining #344

Closed · nhumrich closed this issue 2 years ago

nhumrich commented 5 years ago

There seems to be some weird issue with pipelining and/or connection reuse and nginx. I have nginx sitting in front of my uvicorn/starlette app, and on PATCH/POST requests I occasionally get 502s from nginx (about 30-40% of the time). The nginx logs say:

upstream prematurely closed connection while reading response header from upstream

all over the place

and my app logs say:

WARNING: Invalid HTTP request received.
WARNING: Invalid HTTP request received.

with not much more information.

I worked around it by disabling connection reuse: adding the default header Connection: close forces nginx to close the connection after each request. Performance drops significantly, but at least I don't get 502s.

uvicorn.run(app, host='0.0.0.0', port=8445, headers=[('Server', 'custom'), ('Connection', 'close')], proxy_headers=True)

Potentially relevant: the 502 never seems to happen on a GET.

tomchristie commented 5 years ago

Okay - so I'd start by trying each of the h11 and httptools implementations, and see if it replicates in both cases. You could also check whether you can replicate it with Hypercorn or Daphne.

That'll give us a first pass at what's going on.

Uvicorn will respond like that if it gets a malformed request. So another route into figuring out the issue would be to change the source code so that it logs out the raw byte streams that are being sent, and examine how they look in the broken cases.
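For anyone who wants to try that, here's a rough sketch of how one might dump the raw bytes before they reach the parser. It assumes the httptools protocol class is uvicorn.protocols.http.httptools_impl.HttpToolsProtocol with a data_received method (true for recent uvicorn versions); adjust the names to your installed version.

```python
# debug_raw.py -- sketch only: log the raw request bytes uvicorn receives.
import logging

from uvicorn.protocols.http import httptools_impl

logger = logging.getLogger("uvicorn.error")

_orig_data_received = httptools_impl.HttpToolsProtocol.data_received

def data_received(self, data: bytes) -> None:
    # Dump each raw chunk handed to the protocol before it is parsed.
    logger.warning("raw bytes from %s: %r", getattr(self, "client", None), data)
    _orig_data_received(self, data)

httptools_impl.HttpToolsProtocol.data_received = data_received
```

Importing this module before starting the server should be enough to see exactly which bytes arrive in the broken cases.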

nhumrich commented 5 years ago

Great idea. How do I swap between h11 and httptools?

tomchristie commented 5 years ago

So. --http h11 or --http httptools.

I'd definitely also recommend trying to replicate it on Daphne or Hypercorn after that.

Let's narrow it down as much as possible as a first step.
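For reference, the same switch is available when starting the server programmatically: the http argument of uvicorn.run() accepts "auto", "h11" or "httptools". A sketch, reusing the app and port from the earlier snippet:

```python
import uvicorn

# Same effect as `--http h11` on the command line.
uvicorn.run(app, host="0.0.0.0", port=8445, http="h11", proxy_headers=True)
```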

nhumrich commented 5 years ago

The issue only happens with httptools. I could not reproduce with h11 on uvicorn. I could not reproduce using hypercorn. I wasn't able to get daphne working.

Hope that helps.

leobuskin commented 5 years ago

I can confirm this issue, can see such behavior from uvicorn==0.4.3 (didn't test earlier). Seems related to websockets, going to investigate deeper next week. @nhumrich are you using ws?

nhumrich commented 5 years ago

@leobuskin no, I am not using websockets at all

gvbgduh commented 5 years ago

Yeah, I occasionally see similar behaviour as well, though it's quite rare. In my case it seems to be related to concurrent long-running requests with heavy CPU and memory usage, e.g. CSV file uploads (2-9 MB) with background validation and writing to a DB (yeah, via POST). So, if I make 4 requests at the same time, the 3rd and/or 4th are likely to throw a 502 back. Probably it might affect other requests too, not sure.

I'll try to reliably recreate this one and hunt it down; happy there's an issue already. Running it behind nginx locally was the missing piece of the puzzle to catch it.

gvbgduh commented 5 years ago

I was able to replicate it (sort of reliably), but under some load.

I've tried --worker-class uvicorn.workers.UvicornH11Worker and --worker-class uvicorn.workers.UvicornWorker and it appears in both cases.

On the surface, it seems to happen for the next request after the previous one causes the worker to time out. Not sure if it's a big surprise, but I'll try to dig deeper.

nginx_1  | 172.21.0.1 - - [14/May/2019:15:44:31 +0000] "POST /endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8 HTTP/1.1" 200 805 "-" "python-requests/2.21.0" "-"
nginx_1  | 2019/05/14 15:44:31 [warn] 8#8: *19 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000010, client: 172.21.0.1, server: , request: "POST /endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8 HTTP/1.1", host: "localhost:1337"
web_1    | [2019-05-14 15:44:31 +0000] [37] [DEBUG] ('172.21.0.4', 43134) - Connected
web_1    | [2019-05-14 15:44:46 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:37)
nginx_1  | 172.21.0.1 - - [14/May/2019:15:44:46 +0000] "POST /endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8 HTTP/1.1" 502 158 "-" "python-requests/2.21.0" "-"
nginx_1  | 2019/05/14 15:44:46 [error] 8#8: *19 upstream prematurely closed connection while reading response header from upstream, client: 172.21.0.1, server: , request: "POST /endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8", upstream: "/endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8", host: "localhost:1337"
nginx_1  | 2019/05/14 15:44:46 [warn] 8#8: *21 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000011, client: 172.21.0.1, server: , request: "POST /endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8 HTTP/1.1", host: "localhost:1337"
web_1    | [2019-05-14 15:44:46 +0000] [53] [INFO] Booting worker with pid: 53
web_1    | [2019-05-14 15:44:47 +0000] [53] [INFO] Started server process [53]
web_1    | [2019-05-14 15:44:47 +0000] [53] [INFO] Waiting for application startup.
web_1    | [2019-05-14 15:44:47 +0000] [53] [DEBUG] ('172.21.0.4', 43138) - Connected
web_1    | [2019-05-14 15:44:47 +0000] [53] [INFO] ('172.21.0.4', 43138) - "POST /endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8 HTTP/1.0" 200
gvbgduh commented 5 years ago

All in all, I think there's no issue in uvicorn for me. Adjusting the timeout in gunicorn/uvicorn fixes it. If that timeout is greater than the nginx one, you can hit a 504 from nginx instead.
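To illustrate the kind of alignment meant here (a sketch only; the 600-second values are taken from the compose file shown later in the thread and should match your workload): the gunicorn worker timeout needs to cover your slowest request, and the nginx proxy timeouts need to be at least as large as the gunicorn one, otherwise the 502 simply turns into a 504 from nginx.

```
# gunicorn side: give workers long enough to finish slow requests
gunicorn app.app:app -k uvicorn.workers.UvicornWorker --timeout 600

# nginx side (server/location block): keep these >= the gunicorn timeout
#   proxy_connect_timeout 600;
#   proxy_send_timeout    600;
#   proxy_read_timeout    600;
```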

nhumrich commented 5 years ago

For me, the behavior isn't timeout-related. I didn't have any requests timing out, and it didn't require much load to reproduce.

gvbgduh commented 5 years ago

@nhumrich interesting, how would you reproduce it? Are you running it via gunicorn or just plain uvicorn?

nhumrich commented 5 years ago

@gvbgduh I tried it both ways. Both reproduce the error when I use httptools, but are fine when I use h11. Just a web server behind nginx; basically, I have ALB -> nginx -> myservice. I haven't really spent the time to reproduce it with a minimal example, but I imagine reproducing has more to do with the nginx configuration than the actual code.

matthax commented 5 years ago

I'm encountering the same issue with a similar configuration. However, I also noticed that the deployment instructions mention a --proxy-headers flag that should be set, but it appears I can't pass that while also using Gunicorn?

Are there other steps I can take to address this issue?
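A common way to pass extra uvicorn settings when running under Gunicorn is a small custom worker class. This is only a sketch: the module and class names below are made up, and it assumes UvicornWorker still merges its CONFIG_KWARGS dict into the uvicorn config, as it does in current releases.

```python
# myproject/workers.py (hypothetical module)
from uvicorn.workers import UvicornWorker

class ProxyHeadersWorker(UvicornWorker):
    # Extra keyword arguments forwarded to uvicorn's Config.
    CONFIG_KWARGS = {**UvicornWorker.CONFIG_KWARGS, "proxy_headers": True}
```

Then point gunicorn at it with -k myproject.workers.ProxyHeadersWorker instead of -k uvicorn.workers.UvicornWorker.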


Additional details: GET, DELETE and PUT work correctly. POST and PATCH fail (exactly) half the time.

Headers:

"x-request-id": "f70bbcc408f3a1347dbd6d377e97e13d",
"x-real-ip": "xxx.xxx.xxx",
"x-forwarded-for": "xxx.xxx.xxx",
"x-forwarded-host": "sample.domain",
"x-forwarded-port": "443",
"x-forwarded-proto": "https",
"x-original-uri": "/ping?test=true",
"x-scheme": "https",
"x-request-start": "t=1564505305.680",

Here is a link to the docker image we are using for the deployment as well.

I removed Gunicorn and tried running just uvicorn for testing, and I'm still facing the same issue:

```
INFO: PING
DEBUG: ('147.191.218.178', 0) - ASGI [20] Sent {'type': 'http.request', 'body': '<17 bytes>', 'more_body': False}
DEBUG: ('147.191.218.178', 0) - ASGI [20] Received {'type': 'http.response.start', 'status': 200, 'headers': '<...>'}
INFO: ('147.191.218.178', 0) - "POST /ping?test=true HTTP/1.1" 200
DEBUG: ('147.191.218.178', 0) - ASGI [20] Received {'type': 'http.response.body', 'body': '<552 bytes>'}
DEBUG: ('147.191.218.178', 0) - ASGI [20] Completed
WARNING: Invalid HTTP request received.
DEBUG: ('10.42.4.0', 53902) - Disconnected
DEBUG: ('xxx.xxx.xx.xxx', 36620) - Connected
DEBUG: ('xxx.xxx.xx.xxx', 36620) - ASGI [21] Started
DEBUG: ('xxx.xxx.xx.xxx', 36620) - ASGI [21] Received {'type': 'http.response.start', 'status': 200, 'headers': '<...>'}
INFO: ('xxx.xxx.xx.xxx', 36620) - "GET /about HTTP/1.1" 200
DEBUG: ('xxx.xxx.xx.xxx', 36620) - ASGI [21] Received {'type': 'http.response.body', 'body': '<80 bytes>'}
DEBUG: ('xxx.xxx.xx.xxx', 36620) - ASGI [21] Completed
DEBUG: ('xxx.xxx.xx.xxx', 36620) - Disconnected
DEBUG: ('xxx.xxx.xx.xxx', 36626) - Connected
DEBUG: ('xxx.xxx.xx.xxx', 36626) - ASGI [22] Started
DEBUG: ('xxx.xxx.xx.xxx', 36626) - ASGI [22] Received {'type': 'http.response.start', 'status': 200, 'headers': '<...>'}
INFO: ('xxx.xxx.xx.xxx', 36626) - "GET /about HTTP/1.1" 200
DEBUG: ('xxx.xxx.xx.xxx', 36626) - ASGI [22] Received {'type': 'http.response.body', 'body': '<80 bytes>'}
DEBUG: ('xxx.xxx.xx.xxx', 36626) - ASGI [22] Completed
DEBUG: ('xxx.xxx.xx.xxx', 36626) - Disconnected
```
gvbgduh commented 5 years ago

@matthax I can't really see any occurrence of 502s in that log, only the WARNING: Invalid HTTP request received. line, which does make me a bit curious how an invalid request is handled, though. It was mentioned before, but have you tried different worker classes, or different servers such as Daphne or Hypercorn? My last hunt for this issue revealed a misconfigured timeout for long-running requests in gunicorn, and then an issue in the proxying service in front of everything. After that I wasn't able to replicate this issue.

matthax commented 5 years ago

@gvbgduh 502s occur on NGINX's side, accompanied by a message about the worker prematurely terminating a connection.

Requests are not long running and there's no load on the server. We've increased the timeout to well over 30 seconds just to confirm. Quick glance at metrics puts the average response time under 5ms between nginx and the worker.

Removing gunicorn resulted in the same 502s, same result after removing https from the equation.

hypercorn is working flawlessly.

gvbgduh commented 5 years ago

interesting @matthax, would it be possible to have a look at nginx logs and, ideally, at uvicorn ones for the same timeline?

matthax commented 5 years ago

I’ll get this for you tomorrow

matthax commented 5 years ago

@gvbgduh Here are the nginx logs associated with the error

```
xxx.xxx.xxx.xxx - [xxx.xxx.xxx.xxx] - - [02/Aug/2019:19:43:29 +0000] "POST /ping?test=true HTTP/1.1" 404 22 "-" "insomnia/6.5.4" 183 0.003 [notification-notification-80] 10.42.4.238:80 22 0.003 404 da977f29bc1798af801d3327b316cd0d
2019/08/02 19:43:30 [error] 4098#4098: *37449 upstream prematurely closed connection while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: uvicorn.test, request: "POST /ping?test=true HTTP/1.1", upstream: "http://10.42.4.238:80/ping?test=true", host: "uvicorn.test"
xxx.xxx.xxx.xxx - [xxx.xxx.xxx.xxx] - - [02/Aug/2019:19:43:30 +0000] "POST /ping?test=true HTTP/1.1" 502 174 "-" "insomnia/6.5.4" 183 0.002 [notification-notification-80] 10.42.4.238:80 0 0.002 502 7e232c6f6575170a314b4465d6a891bd
xxx.xxx.xxx.xxx - [] - - [02/Aug/2019:19:43:32 +0000] "POST /ping?test=true HTTP/1.1" 404 22 "-" "insomnia/6.5.4" 183 0.002 [notification-notification-80] 10.42.4.238:80 22 0.003 404 4b1602b75aa902dde44b9e713c1a3389
2019/08/02 19:43:33 [error] 4098#4098: *37449 upstream prematurely closed connection while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: uvicorn.test, request: "POST /ping?test=true HTTP/1.1", upstream: "http://10.42.4.238:80/ping?test=true", host: "uvicorn.test"
xxx.xxx.xxx.xxx - [xxx.xxx.xxx.xxx] - - [02/Aug/2019:19:43:33 +0000] "POST /ping?test=true HTTP/1.1" 502 174 "-" "insomnia/6.5.4" 183 0.001 [notification-notification-80] 10.42.4.238:80 0 0.001 502 745466b5781fd1309c61bb913e1e965c
xxx.xxx.xxx.xxx - [xxx.xxx.xxx.xxx] - - [02/Aug/2019:19:43:34 +0000] "POST /ping?test=true HTTP/1.1" 404 22 "-" "insomnia/6.5.4" 183 0.004 [notification-notification-80] 10.42.4.238:80 22 0.004 404 454920caa93418da601016be01a9a50c
2019/08/02 19:43:35 [error] 4098#4098: *37449 upstream prematurely closed connection while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: uvicorn.test, request: "POST /ping?test=true HTTP/1.1", upstream: "http://10.42.4.238:80/ping?test=true", host: "uvicorn.test"
xxx.xxx.xxx.xxx - [xxx.xxx.xxx.xxx] - - [02/Aug/2019:19:43:35 +0000] "POST /ping?test=true HTTP/1.1" 502 174 "-" "insomnia/6.5.4" 183 0.001 [notification-notification-80] 10.42.4.238:80 0 0.002 502 707758941a0d5fa7d9137a627d4b00db
xxx.xxx.xxx.xxx - [xxx.xxx.xxx.xxx] - - [02/Aug/2019:19:43:36 +0000] "POST /ping?test=true HTTP/1.1" 404 22 "-" "insomnia/6.5.4" 183 0.003 [notification-notification-80] 10.42.4.238:80 22 0.003 404 6762d54b7c0a309a9875525a9cb24742
2019/08/02 19:43:39 [error] 4098#4098: *37449 upstream prematurely closed connection while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: uvicorn.test, request: "POST /ping?test=true HTTP/1.1", upstream: "http://10.42.4.238:80/ping?test=true", host: "uvicorn.test"
xxx.xxx.xxx.xxx - [xxx.xxx.xxx.xxx] - - [02/Aug/2019:19:43:39 +0000] "POST /ping?test=true HTTP/1.1" 502 174 "-" "insomnia/6.5.4" 183 0.002 [notification-notification-80] 10.42.4.238:80 0 0.002 502 2b6c9a210804c1fcd771b5a21e8eae8f
```

Note that in this instance I wasn't hitting an actual endpoint, just posting to a route we disabled and even then we get a consistent 502 every other request.

gvbgduh commented 5 years ago

Thanks a lot @matthax, I'll try to re-create this one; it gives a rough picture.

MilovdZee commented 5 years ago

I did some tcpdump'ing and found that uvicorn does not send a TCP FIN packet to nginx, so nginx assumes the connection is still usable. nginx then sends the next HTTP request on the open socket, but uvicorn replies with a TCP FIN packet and logs the invalid-request error message. So uvicorn apparently no longer had an open connection, even though it never sent the FIN...
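For anyone wanting to repeat that check, a capture along these lines (a sketch; the interface and port are placeholders for wherever nginx talks to uvicorn) makes the FIN/RST exchange visible:

```
# Watch the nginx <-> uvicorn traffic; -ttt prints inter-packet deltas.
tcpdump -i any -nn -ttt 'tcp port 8080'
```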

Workarounds:

But I prefer to have reuse of connection available and working. Still investigating how to get that working.

ademoverflow commented 4 years ago

Hi all! Any update on this topic?

tomchristie commented 4 years ago

Can anyone reduce this down to the simplest possible way to reproduce it? I'm happy to dig into this myself, but it'd be super helpful to have a trivial set of instructions I can work through, knowing they're likely to reliably reproduce the issue.

Is it sufficient simply to set up nginx and uvicorn and issue some POST requests, or is there more to it?

matthax commented 4 years ago

I think you quite literally just need an NGINX instance set up with a reverse proxy to reproduce. Send POST/PUT/PATCH to get the error.

If you're not able to reproduce it like that let me know and I'll set you up a Dockerfile with a minimal example.

gvbgduh commented 4 years ago

I think that would be great, @matthax. I've been trying to reproduce it from time to time for quite a while without luck (except for the case above).

I used a simple example of the whole stack (please see the details below).

I tested it with plain Postman and with wrk to give it some load:

$ wrk -t10 -c100 -d10s -s ./wrk.post.lua http://127.0.0.1/api
Running 10s test @ http://127.0.0.1/api
  10 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    75.79ms   21.80ms 198.54ms   74.54%
    Req/Sec   132.32     37.30   260.00     69.40%
  13299 requests in 10.11s, 2.35MB read
Requests/sec:   1316.05
Transfer/sec:    237.71KB

app/app.py

from starlette.applications import Starlette
from starlette.endpoints import HTTPEndpoint
from starlette.responses import JSONResponse

class Endpoint(HTTPEndpoint):
    async def get(self, request):
        return JSONResponse({"success": True, "method": "GET"})

    async def post(self, request):
        return JSONResponse({"success": True, "method": "POST"})

app = Starlette()
app.debug = True
app.mount("/api", app=Endpoint)

docker-compose.yml

version: '3.7'

services:
  web:
    stdin_open: true
    tty: true
    build:
      context: ./
      dockerfile: ./Dockerfile
    command: gunicorn --log-level debug --bind :8080 --reuse-port --workers 1 --worker-class uvicorn.workers.UvicornWorker app.app:app --timeout 600 --graceful-timeout 500 --capture-output
    expose:
      - 8080
    environment:
      - DEBUG=1
      - PORT=8080

  nginx:
    build:
      context: ./nginx
      dockerfile: ./Dockerfile
    ports:
      - 80:80
    depends_on:
      - web

Dockerfile

FROM gcr.io/google-appengine/python
RUN virtualenv /env -p python3.7
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
ADD requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
ADD . /app
EXPOSE 8080

nginx/Dockerfile

FROM nginx:1.15.12-alpine

RUN rm /etc/nginx/conf.d/default.conf
COPY nginx.conf /etc/nginx/conf.d

nginx/nginx.conf

upstream ds-mgmt-local {
    server web:8080;
}

server {

    listen 80;

    location / {
        proxy_pass http://ds-mgmt-local;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_redirect off;

    }

    client_max_body_size 200M;

    proxy_connect_timeout       600;
    proxy_send_timeout          600;
    proxy_read_timeout          600;
    send_timeout                600;
}

wrk.post.lua

wrk.method = "POST"
wrk.body   = '{"firstKey": "somedata", "secondKey": "somedata"}'
wrk.headers["Content-Type"] = "application/json"
wrk.headers["Accept"] = "*/*"
wrk.headers["Accept-Encoding"] = "gzip, deflate"

requirements.txt

Click==7.0
gunicorn==20.0.0
h11==0.8.1
httptools==0.0.13
starlette==0.13.0
uvicorn==0.10.8
uvloop==0.14.0
websockets==8.1
gvbgduh commented 4 years ago

I have a sneaky suspicion it might be related to how errors are handled and propagated, so it crashes the uvicorn worker while nginx is still trying to communicate. But without a solid example, it's tough to be sure and say what can be done.

tomchristie commented 4 years ago

I am not able to reproduce this at the moment.

The application

example.py:

async def app(scope, receive, send):
    assert scope['type'] == 'http'
    await send({
        'type': 'http.response.start',
        'status': 200,
        'headers': [
            [b'content-type', b'text/plain'],
        ]
    })
    await send({
        'type': 'http.response.body',
        'body': b'Hello, world!',
    })

Running with uvicorn...

$ uvicorn example:app --uds /tmp/uvicorn.sock

The proxy

An nginx configuration, in ~/nginx.conf...

http {
  server {
    listen 4000;
    client_max_body_size 4G;

    server_name example.com;

    location / {
      proxy_set_header Host $http_host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_redirect off;
      proxy_buffering off;
      proxy_pass http://uvicorn;
    }

    location /static {
      # path for static files
      root /path/to/app/static;
    }
  }

  upstream uvicorn {
    server unix:/tmp/uvicorn.sock;
  }

}

events {
}

Running nginx...

$ nginx -c ~/nginx.conf

The client requests...

wrk.post.lua:

wrk.method = "POST"
wrk.body   = '{"firstKey": "somedata", "secondKey": "somedata"}'
wrk.headers["Content-Type"] = "application/json"
wrk.headers["Accept"] = "*/*"
wrk.headers["Accept-Encoding"] = "gzip, deflate"

Making the requests:

$ wrk -t10 -c100 -d10s -s ./wrk.post.lua http://127.0.0.1:4000/

Anyone able to provide a configuration setup that does replicate the issue?

I'd prefer example cases not to include Docker, but to be strictly narrowed down to the pieces above: the application, the proxy configuration, and the client requests.

jordic commented 4 years ago

Not sure if this makes sense... but following both implementations, I found that h11 sends this event: https://github.com/encode/uvicorn/blob/master/uvicorn/protocols/http/h11_impl.py#L339

which I don't see in the httptools one... perhaps the bug is here?

ChristianHopper commented 4 years ago

I have the same issue - nginx sometimes throws a 502 even under light load. I'm running uvicorn behind gunicorn and have fixed it by setting the flag --keep-alive=0 when running gunicorn. Using the UvicornH11Worker does not prevent the issue for me.

ChristianHopper commented 4 years ago

I fixed the issue by specifying "http": "h11" in the uvicorn config using the UvicornWorker class
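For anyone wanting to reproduce that fix, the usual pattern is the same CONFIG_KWARGS trick sketched earlier: subclass UvicornWorker and override only the HTTP implementation, keeping the default event loop. A sketch with a hypothetical module name, assuming CONFIG_KWARGS is merged into the uvicorn config as in current releases:

```python
# myproject/workers.py (hypothetical)
from uvicorn.workers import UvicornWorker

class H11UvicornWorker(UvicornWorker):
    # Swap only the HTTP protocol implementation from httptools to h11.
    CONFIG_KWARGS = {**UvicornWorker.CONFIG_KWARGS, "http": "h11"}
```

Run it with gunicorn -k myproject.workers.H11UvicornWorker app:app.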

gone commented 4 years ago

I hit this issue and managed to track it down to a cached HSTS header. So make sure you're actually making the request over http and not https!

afilipovich commented 4 years ago

I hit this issue with 10 concurrent GET requests to an app that takes ~10 ms to process a single request. Switching to the h11 HTTP backend solved this.

oflannabhra commented 4 years ago

Just a heads up that my team ran into this exact issue when deploying to AWS, with the exact same configuration as some others in this thread:

Network Load Balancer -> Nginx -> Gunicorn/Uvicorn

Probably 50% of POST and DELETE requests don't even make it to uvicorn; they get 504'd by the nginx reverse proxy.

We are using nginx to facilitate HTTPS cert-based authentication.

euri10 commented 4 years ago

Wild random guess, but could it be that nginx uses HTTP/1.0 (see https://stackoverflow.com/questions/41255979/why-between-nginx-nginx-upstream-use-http-1-0), which, if I'm correct, would set keep_alive to False in the httptools implementation: https://github.com/encode/uvicorn/blob/f6239169708adcf27433cb8d133e48f8835f0fa2/uvicorn/protocols/http/httptools_impl.py#L265

If that's the case, could you test with proxy_http_version set to 1.1?

http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_http_version
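For anyone testing this, note that upstream keep-alive in nginx needs a little more than proxy_http_version alone: the Connection header has to be cleared and the upstream block needs a keepalive pool. A sketch of the relevant directives (names and values are illustrative, not taken from anyone's config in this thread):

```
upstream uvicorn {
    server unix:/tmp/uvicorn.sock;
    keepalive 32;                    # pool of idle upstream connections
}

server {
    listen 80;
    location / {
        proxy_pass http://uvicorn;
        proxy_http_version 1.1;          # nginx defaults to 1.0 upstream
        proxy_set_header Connection "";  # don't forward "Connection: close"
    }
}
```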

oflannabhra commented 4 years ago

Yes, I will definitely test that. I can confirm that our setup has HTTP1.1 in front of nginx and HTTP1.0 between nginx and uvicorn.

I thought that was weird when I saw it.

FaisalHBTU commented 4 years ago

I tried everything: timeout tuning of gunicorn, using --keep-alive=0 for uvicorn.workers.UvicornWorker, using UvicornH11Worker, but still 2-3 percent of requests get a 502 on nginx.

euri10 commented 4 years ago

> I tried everything: timeout tuning of gunicorn, using --keep-alive=0 for uvicorn.workers.UvicornWorker, using UvicornH11Worker, but still 2-3 percent of requests get a 502 on nginx.

Nobody reported back on explicitly setting proxy_http_version to 1.1.

FaisalHBTU commented 4 years ago

> I tried everything: timeout tuning of gunicorn, using --keep-alive=0 for uvicorn.workers.UvicornWorker, using UvicornH11Worker, but still 2-3 percent of requests get a 502 on nginx.
>
> Nobody reported back on explicitly setting proxy_http_version to 1.1.

nginx is already using 1.1 (screenshot omitted).

oflannabhra commented 4 years ago

> I tried everything: timeout tuning of gunicorn, using --keep-alive=0 for uvicorn.workers.UvicornWorker, using UvicornH11Worker, but still 2-3 percent of requests get a 502 on nginx.
>
> Nobody reported back on explicitly setting proxy_http_version to 1.1.

I updated our setup to use HTTP/1.1, and it had no effect.

I figured out the root cause was actually genuine timeouts in our container; subsequent requests weren't making it to the uvicorn logs because the container had actually halted execution. So my situation was not actually related to this issue.

euri10 commented 4 years ago

> I tried everything: timeout tuning of gunicorn, using --keep-alive=0 for uvicorn.workers.UvicornWorker, using UvicornH11Worker, but still 2-3 percent of requests get a 502 on nginx.
>
> Nobody reported back on explicitly setting proxy_http_version to 1.1.
>
> nginx is already using 1.1 (screenshot omitted).

I'm no nginx pro, but I thought there were logs that give an explanation for the 502, @FaisalHBTU?

Glad you solved your issue @oflannabhra, and thanks for reporting back.

FaisalHBTU commented 4 years ago

> I tried everything: timeout tuning of gunicorn, using --keep-alive=0 for uvicorn.workers.UvicornWorker, using UvicornH11Worker, but still 2-3 percent of requests get a 502 on nginx.
>
> Nobody reported back on explicitly setting proxy_http_version to 1.1.
>
> nginx is already using 1.1 (screenshot omitted).
>
> I'm no nginx pro, but I thought there were logs that give an explanation for the 502, @FaisalHBTU?
>
> Glad you solved your issue @oflannabhra, and thanks for reporting back.

@euri10 error for reference (screenshot omitted).

malthejorgensen commented 4 years ago

We have been having some of the errors described in this thread – an intermittent WARNING: Invalid HTTP request received. from uvicorn immediately followed by an H13 "Connection closed without response" from Heroku.

Disabling Keep-Alive or switching to h11 both seems to mitigate the issue.

The solution we came up with is in PR https://github.com/encode/uvicorn/pull/778.

You can try this out with pip install -e 'git+git@github.com:eduflow/uvicorn.git@on_url-called-multiple-times#egg=uvicorn'

jordic commented 4 years ago

@malthejorgensen I'm pretty sure that if you provide a test case reproducing the error, @tomchristie will merge this PR.

malthejorgensen commented 4 years ago

Believe me, I've tried 😄 It's one of those pesky only-in-production bugs.

It also seems like it's very hard or impossible to get TCP fragmentation on localhost-only traffic – see the scapy docs and this Stackoverflow answer. To be honest, I really don't know anything about TCP fragmentation. It's just my guess from what I see in Wireshark, plus not being able to reproduce when doing it localhost-only, and that it happens for fairly large requests in our case at least.

For what it's worth, the patch (PR #778) solved our issue – but it could be that the underlying issue is something else, and I don't wanna send anybody on a wild goose chase.

I have a Wireshark packet capture from one of the times I was able to reproduce the issue. Note that I'm relaying the traffic through PageKite in order to work around the localhost "limitation" of effectively infinite packet size/no fragmentation. In it, you can see the connection being reset by Uvicorn at the end – corresponding to the WARNING: Invalid HTTP request received: minimal-example.pcapng.zip

malthejorgensen commented 3 years ago

I now have a small repository that is able to reproduce the issue: https://github.com/malthejorgensen/uvicorn-issue-344

It deploys to Heroku, and will show the issue every 3-5 requests (let's say about every fourth submission of the form in the small app). The code is currently deployed at: https://uvicorn-issue-344.herokuapp.com/

prockwin commented 3 years ago

Hi @malthejorgensen, I applied your patch (PR #778) to my project, but it didn't help me. My app is behind nginx and accepts ~130 POST requests per second 24/7, and the 502 error occurs about once every 4 days.

malthejorgensen commented 3 years ago

@Prockwin In that case your issue is probably a different one than what I’m seeing.

Is your uvicorn service a public API? Requests with an incorrect Content-Length header also cause this behavior, but that's not so much a uvicorn issue as perhaps a slightly broken third-party client connecting to your API. I believe there are some other cases where slightly malformed requests cause this behavior.

The patch I have doesn't impact those other causes of 502s.
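As a concrete illustration of that Content-Length case (a sketch only, against a made-up local address): if the declared length is shorter than the body actually sent, the leftover bytes are parsed as the start of the next request, which is exactly the kind of thing that shows up as Invalid HTTP request received.

```
# Content-Length says 5, but 8 bytes of body follow; the trailing ' 1}' is
# then treated as the start of a new request and fails to parse.
printf 'POST /ping HTTP/1.1\r\nHost: example\r\nContent-Length: 5\r\n\r\n{"a": 1}' \
  | nc 127.0.0.1 8000
```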

florimondmanca commented 3 years ago

Hey everyone,

I think what might help move this forward would be having Uvicorn log more information than the rather unhelpful Invalid HTTP request received message.

I filed #886, which would turn those messages into much more helpful full traceback dumps:

WARNING:  Invalid HTTP request received.
Traceback (most recent call last):
  File "/Users/florimond/Developer/python-projects/uvicorn/uvicorn/protocols/http/httptools_impl.py", line 165, in data_received
    self.parser.feed_data(data)
  File "httptools/parser/parser.pyx", line 193, in httptools.parser.parser.HttpParser.feed_data
httptools.parser.errors.HttpParserInvalidMethodError: invalid HTTP method

Upon approval we can merge and release in a patch release, then you'd be able to try that version out in production and see what the actual error / cause is. Maybe that'll be that "fragmented URL TCP packets" error from #778, maybe that'll be something else, or a combination… We'll see. :-)

florimondmanca commented 3 years ago

Hey all! Uvicorn 0.13.2 has been released. Feel free to try it out and report any enhanced tracebacks you can see from these errors. :-) Thank you!

euri10 commented 3 years ago

> Hey all! Uvicorn 0.13.2 has been released. Feel free to try it out and report any enhanced tracebacks you can see from these errors. :-) Thank you!

I was looking at old PRs tonight, @florimondmanca... it seems like https://github.com/encode/uvicorn/pull/205 is relevant here: we don't answer with a 400 in the case of an invalid request, like gunicorn or hypercorn do, and apparently at the reverse-proxy level that turns into a 502. Might be worth resurrecting that PR?

euri10 commented 3 years ago

If people impacted could look at their logs (if running 0.13.2 :green_apple:) and tell us whether they see something about a potentially malformed request, that would help.


Note to self: reproduction steps, hopefully showing what we see here.

1. Run nginx with `docker run --rm --name nginx_uvicorn -p 80:80 -v /tmp/uvicorn.sock:/tmp/uvicorn.sock -v $PWD/nginx.conf:/etc/nginx/nginx.conf:ro nginx` and quite a minimal nginx.conf:

```
events {
}

http {
    server {
        listen 80;
        location / {
            proxy_pass http://uvicorn;
        }
    }
    upstream uvicorn {
        server unix:/tmp/uvicorn.sock fail_timeout=0;
    }
}
```

2. Run uvicorn: `uvicorn app:app --uds /tmp/uvicorn.sock`

3. Send an incorrect request with `curl -i 'http://localhost/?x=y z'`

Client result:

```
502 Bad Gateway
nginx/1.19.1
```

nginx result:

```
2020/12/28 19:27:46 [error] 29#29: *4 upstream prematurely closed connection while reading response header from upstream, client: 172.17.0.1, server: , request: "GET /?x=y z HTTP/1.1", upstream: "http://unix:/tmp/uvicorn.sock:/?x=y z", host: "localhost"
172.17.0.1 - - [28/Dec/2020:19:27:46 +0000] "GET /?x=y z HTTP/1.1" 502 157 "-" "curl/7.64.0"
```

uvicorn result:

```
WARNING: Invalid HTTP request received.
Traceback (most recent call last):
  File "/home/lotso/PycharmProjects/uvicorn/uvicorn/protocols/http/httptools_impl.py", line 165, in data_received
    self.parser.feed_data(data)
  File "httptools/parser/parser.pyx", line 193, in httptools.parser.parser.HttpParser.feed_data
httptools.parser.errors.HttpParserError: invalid constant string
```