Closed issue, opened by nhumrich; closed 2 years ago.
Okay - so I'd start by trying each of the h11 and httptools implementations, and see if it replicates in both cases. You could also dig into whether you can replicate it with Hypercorn or Daphne.
That'll give us a first pass at what's going on.
Uvicorn will respond like that if it gets a malformed request. So another route into figuring out the issue would be to change the source code so that it logs out the raw byte streams that are being sent, and examine how they look in the broken cases.
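If anyone wants to try that raw-byte logging without editing uvicorn's source, here is a rough sketch of one way to do it. It assumes the httptools implementation, where HttpToolsProtocol.data_received(data) is what feeds bytes into the parser (as the tracebacks later in this thread show); the wrapper and logger names are made up for illustration.

import logging

from uvicorn.protocols.http import httptools_impl

raw_logger = logging.getLogger("uvicorn.raw")

_original_data_received = httptools_impl.HttpToolsProtocol.data_received

def logging_data_received(self, data: bytes) -> None:
    # Dump every raw chunk the parser is about to see, then defer to the
    # original implementation.
    raw_logger.warning("raw bytes: %r", data)
    _original_data_received(self, data)

# Apply the patch before the server starts handling connections,
# e.g. at the top of the module that defines your app.
httptools_impl.HttpToolsProtocol.data_received = logging_data_received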
Great idea. How do I swap between h11 and httptools?
So: --http h11 or --http httptools.
I'd definitely also recommend trying to replicate on Daphne or Hypercorn after that.
Let's narrow it down as much as possible as a first step.
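Concretely, assuming the app is importable as app:app (a placeholder), the two invocations look like:

$ uvicorn app:app --http h11
$ uvicorn app:app --http httptools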
The issue only happens with httptools. I could not reproduce with h11 on uvicorn. I could not reproduce using hypercorn. I wasn't able to get daphne working.
Hope that helps.
I can confirm this issue, can see such behavior from uvicorn==0.4.3 (didn't test earlier). Seems related to websockets, going to investigate deeper next week. @nhumrich are you using ws?
@leobuskin no, I am not using websockets at all
Yeah, I occasionally see similar behaviour as well, though quite rarely. In my case it seems to be related to concurrent long-running requests with heavy CPU and memory usage, e.g. CSV file uploads (2-9 MB) with background validation and writing to a DB (yes, via POST). So if I make 4 requests at the same time, the 3rd and/or 4th are likely to get a 502 back. It might affect other requests too; not sure.
I'll try to reliably recreate this one and hunt it down; happy there's an issue already. Running it behind nginx locally was the missing piece of the puzzle to catch it.
I was able to replicate it (sort of reliably), but only under some load.
I've tried --worker-class uvicorn.workers.UvicornH11Worker and --worker-class uvicorn.workers.UvicornWorker, and it appears in both cases.
On the surface, it seems to happen on the subsequent request when the previous one causes the worker to time out. Not sure if that's a big surprise, but I'll try to dig deeper.
nginx_1 | 172.21.0.1 - - [14/May/2019:15:44:31 +0000] "POST /endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8 HTTP/1.1" 200 805 "-" "python-requests/2.21.0" "-"
nginx_1 | 2019/05/14 15:44:31 [warn] 8#8: *19 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000010, client: 172.21.0.1, server: , request: "POST /endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8 HTTP/1.1", host: "localhost:1337"
web_1 | [2019-05-14 15:44:31 +0000] [37] [DEBUG] ('172.21.0.4', 43134) - Connected
web_1 | [2019-05-14 15:44:46 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:37)
nginx_1 | 172.21.0.1 - - [14/May/2019:15:44:46 +0000] "POST /endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8 HTTP/1.1" 502 158 "-" "python-requests/2.21.0" "-"
nginx_1 | 2019/05/14 15:44:46 [error] 8#8: *19 upstream prematurely closed connection while reading response header from upstream, client: 172.21.0.1, server: , request: "POST /endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8", upstream: "/endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8", host: "localhost:1337"
nginx_1 | 2019/05/14 15:44:46 [warn] 8#8: *21 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000011, client: 172.21.0.1, server: , request: "POST /endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8 HTTP/1.1", host: "localhost:1337"
web_1 | [2019-05-14 15:44:46 +0000] [53] [INFO] Booting worker with pid: 53
web_1 | [2019-05-14 15:44:47 +0000] [53] [INFO] Started server process [53]
web_1 | [2019-05-14 15:44:47 +0000] [53] [INFO] Waiting for application startup.
web_1 | [2019-05-14 15:44:47 +0000] [53] [DEBUG] ('172.21.0.4', 43138) - Connected
web_1 | [2019-05-14 15:44:47 +0000] [53] [INFO] ('172.21.0.4', 43138) - "POST /endpoint/bb960881-5f1f-4e5c-a38d-6d893445bea8 HTTP/1.0" 200
All in all, I think there's no issue in uvicorn for me. Adjusting the timeout in uvicorn/gunicorn fixes it. If this timeout is greater than the nginx one, it can hit a 504 from nginx instead.
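For reference, the two settings being compared there are gunicorn's worker timeout and nginx's proxy timeouts; a hedged illustration with arbitrary values (the app path is a placeholder):

$ gunicorn --worker-class uvicorn.workers.UvicornWorker --timeout 120 app.app:app

and on the nginx side, keeping the proxy timeouts at least as large so nginx doesn't give up first:

proxy_read_timeout 120;
proxy_send_timeout 120;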
For me, the behavior isn't timeout-related. I didn't have any requests timing out, and it didn't require much load to reproduce.
@nhumrich interesting, how would you reproduce it? Are you running it via gunicorn or just plain uvicorn?
@gvbgduh I tried it both ways. Both reproduce the error when I use httptools, but are fine when I use h11. It's just a web server behind nginx; basically I have ALB -> nginx -> my service. I haven't really spent the time to reproduce it with a minimal example, but I imagine reproducing it has more to do with the nginx configuration than with the actual code.
Encountering the same issue with a similar configuration. However, I also noticed the deployment instructions mention a --proxy-headers flag that should be set, but it appears I can't do that while also using Gunicorn?
Are there other steps I can take to address this issue?
Additional details:
GET, DELETE and PUT work correctly. POST and PATCH fail (exactly) half the time.
Headers:
"x-request-id": "f70bbcc408f3a1347dbd6d377e97e13d",
"x-real-ip": "xxx.xxx.xxx",
"x-forwarded-for": "xxx.xxx.xxx",
"x-forwarded-host": "sample.domain",
"x-forwarded-port": "443",
"x-forwarded-proto": "https",
"x-original-uri": "/ping?test=true",
"x-scheme": "https",
"x-request-start": "t=1564505305.680",
Here is a link to the docker image we are using for the deployment as well.
Removed GUnicorn and tried running just uvicorn for testing and I'm still facing the same issue:
@matthax I can't really see an occurrence of 502s in the log, only the line WARNING: Invalid HTTP request received. It makes me a bit curious how an invalid request is handled, though.
It was mentioned before, but have you tried different worker classes, or different servers such as Daphne or Hypercorn?
My last hunt for this issue revealed a misconfigured timeout for long-running requests in gunicorn, and then an issue in the proxying service in front of everything. After that I wasn't able to replicate this issue.
@gvbgduh The 502s occur on nginx's side, accompanied by a message about the worker prematurely terminating the connection.
Requests are not long-running and there's no load on the server. We've increased the timeout to well over 30 seconds just to confirm. A quick glance at metrics puts the average response time under 5 ms between nginx and the worker.
Removing gunicorn resulted in the same 502s, and we got the same result after removing https from the equation.
hypercorn is working flawlessly.
interesting @matthax, would it be possible to have a look at nginx logs and, ideally, at uvicorn ones for the same timeline?
I’ll get this for you tomorrow
@gvbgduh Here are the nginx logs associated with the error
Note that in this instance I wasn't hitting an actual endpoint, just posting to a route we disabled and even then we get a consistent 502 every other request.
Thanks a lot @matthax, I'll try to re-create this one; it gives a rough picture.
I did some tcpdump'ing and found that uvicorn does not send a TCP FIN packet to nginx, so nginx assumes the connection is still usable. nginx then sends the next HTTP request on the open socket, but uvicorn replies with a TCP FIN packet and logs the invalid-request error message. So uvicorn did not consider the connection open? Even though it never sent the FIN...
Workarounds:
gunicorn -k uvicorn.workers.UvicornWorker -c /gunicorn_conf.py main:app --keep-alive=0
gunicorn -k uvicorn.workers.UvicornH11Worker -c /gunicorn_conf.py main:app
But I prefer to have connection reuse available and working. Still investigating how to get that working.
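For anyone who wants to check the FIN behaviour described above, a capture filter along these lines (interface and port are assumptions about your setup) shows only the FIN/RST segments on the nginx-to-uvicorn connection:

$ sudo tcpdump -i any -nn 'tcp port 8080 and (tcp[tcpflags] & (tcp-fin|tcp-rst)) != 0'

This only applies when the proxying goes over TCP, not over a unix socket.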
Hi all! Any update on this topic?
Can anyone reduce this down to the simplest possible way to reproduce it? I'm happy to dig into this myself, but it'd be super helpful to have a trivial set of instructions to work through, knowing they're likely to reliably reproduce the issue.
Is it sufficient simply to set up nginx and uvicorn and issue some POST requests, or is there more to it?
I think you quite literally just need an NGINX instance set up with a reverse proxy to reproduce. Send POST/PUT/PATCH to get the error.
If you're not able to reproduce it like that let me know and I'll set you up a Dockerfile with a minimal example.
I think that would be great, @matthax. I've been trying to reproduce it from time to time for quite a while without luck (except the case above).
I used a simple example of the whole stack (see details below). I tested it with plain Postman and with wrk to give it some load:
$ wrk -t10 -c100 -d10s -s ./wrk.post.lua http://127.0.0.1/api
Running 10s test @ http://127.0.0.1/api
10 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 75.79ms 21.80ms 198.54ms 74.54%
Req/Sec 132.32 37.30 260.00 69.40%
13299 requests in 10.11s, 2.35MB read
Requests/sec: 1316.05
Transfer/sec: 237.71KB
app/app.py
from starlette.applications import Starlette
from starlette.endpoints import HTTPEndpoint
from starlette.responses import JSONResponse
class Endpoint(HTTPEndpoint):
async def get(self, request):
return JSONResponse({"success": True, "method": "GET"})
async def post(self, request):
return JSONResponse({"success": True, "method": "POST"})
app = Starlette()
app.debug = True
app.mount("/api", app=Endpoint)
docker-compose.yml
version: '3.7'
services:
  web:
    stdin_open: true
    tty: true
    build:
      context: ./
      dockerfile: ./Dockerfile
    command: gunicorn --log-level debug --bind :8080 --reuse-port --workers 1 --worker-class uvicorn.workers.UvicornWorker app.app:app --timeout 600 --graceful-timeout 500 --capture-output
    expose:
      - 8080
    environment:
      - DEBUG=1
      - PORT=8080
  nginx:
    build:
      context: ./nginx
      dockerfile: ./Dockerfile
    ports:
      - 80:80
    depends_on:
      - web
Dockerfile
FROM gcr.io/google-appengine/python
RUN virtualenv /env -p python3.7
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
ADD requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
ADD . /app
EXPOSE 8080
nginx/Dockerfile
FROM nginx:1.15.12-alpine
RUN rm /etc/nginx/conf.d/default.conf
COPY nginx.conf /etc/nginx/conf.d
nginx/nginx.conf
upstream ds-mgmt-local {
    server web:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://ds-mgmt-local;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_redirect off;
    }

    client_max_body_size 200M;
    proxy_connect_timeout 600;
    proxy_send_timeout 600;
    proxy_read_timeout 600;
    send_timeout 600;
}
wrk.post.lua
wrk.method = "POST"
wrk.body = '{"firstKey": "somedata", "secondKey": "somedata"}'
wrk.headers["Content-Type"] = "application/json"
wrk.headers["Accept"] = "*/*"
wrk.headers["Accept-Encoding"] = "gzip, deflate"
requirements.txt
Click==7.0
gunicorn==20.0.0
h11==0.8.1
httptools==0.0.13
starlette==0.13.0
uvicorn==0.10.8
uvloop==0.14.0
websockets==8.1
I have a sneaking suspicion it might be related to how errors are handled and propagated, so it crashes the uvicorn worker while nginx is still trying to communicate. But without a solid example, it's tough to be sure and to say what can be done.
I am not able to reproduce this at the moment.
example.py:
async def app(scope, receive, send):
    assert scope['type'] == 'http'
    await send({
        'type': 'http.response.start',
        'status': 200,
        'headers': [
            [b'content-type', b'text/plain'],
        ],
    })
    await send({
        'type': 'http.response.body',
        'body': b'Hello, world!',
    })
Running with uvicorn...
$ uvicorn example:app --uds /tmp/uvicorn.sock
An nginx configuration, in ~/nginx.conf
...
http {
    server {
        listen 4000;
        client_max_body_size 4G;
        server_name example.com;

        location / {
            proxy_set_header Host $http_host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_redirect off;
            proxy_buffering off;
            proxy_pass http://uvicorn;
        }

        location /static {
            # path for static files
            root /path/to/app/static;
        }
    }

    upstream uvicorn {
        server unix:/tmp/uvicorn.sock;
    }
}

events {
}
Running nginx...
$ nginx -c ~/nginx.conf
wrk.post.lua:
wrk.method = "POST"
wrk.body = '{"firstKey": "somedata", "secondKey": "somedata"}'
wrk.headers["Content-Type"] = "application/json"
wrk.headers["Accept"] = "*/*"
wrk.headers["Accept-Encoding"] = "gzip, deflate"
Making the requests:
$ wrk -t10 -c100 -d10s -s ./wrk.post.lua http://127.0.0.1:4000/
Anyone able to provide a configuration setup that does replicate the issue?
I'd prefer example cases not to include Docker, but to be strictly narrowed down to the bare nginx + uvicorn setup.
Not sure if this makes sense, but following both implementations I found that h11 sends this event: https://github.com/encode/uvicorn/blob/master/uvicorn/protocols/http/h11_impl.py#L339, which I don't see in the httptools one. Perhaps the bug is here?
I have the same issue - Nginx sometimes throw a 502 even under simple loads. I'm running uvicorn behind gunicorn and have fixed it by setting the flag --keep-alive=0 when running gunicorn. Using the UvicornH11Worker does not prevent the issue for me.
I fixed the issue by specifying "http": "h11" in the uvicorn config using the UvicornWorker class
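In case it saves someone a search, a minimal sketch of that approach is to subclass the worker and override CONFIG_KWARGS, which UvicornWorker passes into uvicorn's config (the module path in the usage line is a placeholder):

from uvicorn.workers import UvicornWorker

class H11Worker(UvicornWorker):
    # Force the h11 HTTP implementation instead of httptools;
    # other options fall back to uvicorn's defaults.
    CONFIG_KWARGS = {"http": "h11"}

Then run it with something like gunicorn -k myproject.workers.H11Worker app.app:app.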
I hit this issue and managed to track it down to a cached HSTS header. So make sure you're actually requesting over http and not https!
I hit this issue with 10 concurrent GET requests to an app that takes ~10 ms to process a single request.
Switching to the h11 HTTP implementation solved this.
Just a heads up that my team ran into this exact issue when deploying to AWS, with the exact same configuration as some others in this thread:
Network Load Balancer -> Nginx -> Gunicorn/Uvicorn
Probably 50% of POST and DELETE requests don't even make it to uvicorn, but get 504'd by the nginx reverse proxy.
We are using nginx to facilitate HTTPS cert-based authentication.
A wild random guess, but could it be that nginx uses HTTP/1.0 (see https://stackoverflow.com/questions/41255979/why-between-nginx-nginx-upstream-use-http-1-0), which, if I'm correct, would set keep_alive to False in the httptools implementation: https://github.com/encode/uvicorn/blob/f6239169708adcf27433cb8d133e48f8835f0fa2/uvicorn/protocols/http/httptools_impl.py#L265
If that's the case, could you test with proxy_http_version set to 1.1?
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_http_version
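A hedged illustration of that suggestion (socket path and values are placeholders): switching the upstream connection to HTTP/1.1 is usually paired with an empty Connection header and an upstream keepalive pool:

upstream uvicorn {
    server unix:/tmp/uvicorn.sock;
    keepalive 32;
}

server {
    listen 80;

    location / {
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://uvicorn;
    }
}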
Yes, I will definitely test that. I can confirm that our setup has HTTP/1.1 in front of nginx and HTTP/1.0 between nginx and uvicorn. I thought that was weird when I saw it.
I tried everything: timeout tuning of gunicorn, using --keep-alive=0 for uvicorn.workers.UvicornWorker, using UvicornH11Worker, but still 2-3 percent of requests get a 502 from nginx.
Nobody reported back on explicitly setting proxy_http_version to 1.1.
nginx is already using 1.1.
I updated our setup to use HTTP/1.1, and it had no effect.
I figured out the root cause was actually genuine timeouts in our container; subsequent requests weren't making it to the uvicorn logs because the container had actually halted execution. So my situation was not actually related to this issue.
I'm no nginx pro, but I thought there were logs that give an explanation for the 502, @FaisalHBTU?
Glad you solved your issue, @oflannabhra, and thanks for reporting back.
@euri10 Error for reference:
We have been having some of the errors described in this thread: intermittent WARNING: Invalid HTTP request received. from uvicorn, immediately followed by an H13 Connection closed without response from Heroku.
Disabling keep-alive or switching to h11 both seem to mitigate the issue.
The solution we came up with is in PR https://github.com/encode/uvicorn/pull/778.
You can try this out with pip install -e 'git+git@github.com:eduflow/uvicorn.git@on_url-called-multiple-times#egg=uvicorn'
@malthejorgensen I'm pretty sure that if you provide a test case reproducing the error, @tomchristie will merge this PR.
Believe me, I've tried 😄 It's one of those pesky only-in-production bugs.
It also seems like it's very hard or impossible to get TCP fragmentation on localhost-only traffic – see the scapy docs and this Stackoverflow answer. To be honest, I really don't know anything about TCP fragmentation. It's just my guess from what I see in Wireshark, plus not being able to reproduce when doing it localhost-only, and that it happens for fairly large requests in our case at least.
For what it's worth, the patch (PR #778) solved our issue – but it could be that the underlying issue is something else, and I don't wanna send anybody on a wild goose chase.
I have a Wireshark packet capture from one of the times I was able to reproduce the issue. Note that I'm relaying the traffic through PageKite in order to work around the localhost "limitation" of infinite packet size/no fragmentation. In it, you can see the connection being reset by Uvicorn at the end, corresponding to the WARNING: Invalid HTTP request received:
minimal-example.pcapng.zip
I now have a small repository that is able to reproduce the issue: https://github.com/malthejorgensen/uvicorn-issue-344
It deploys to Heroku, and will show the issue every 3-5 requests (let's say about every fourth submission of the form in the small app). The code is currently deployed at: https://uvicorn-issue-344.herokuapp.com/
Hi @malthejorgensen, I applied your patch (PR #778) to my project, but it didn't help me. My app is behind nginx and accepts ~130 POST requests per second 24/7, and the 502 error occurs about once every 4 days.
@Prockwin In that case your issue is probably a different one than what I’m seeing.
Is your uvicorn service a public API? Requests with an incorrect Content-Length header also cause this behavior, but that's not so much a uvicorn issue as perhaps a slightly broken third-party client connecting to your API. I believe there are some other cases where slightly malformed requests cause this behavior.
The patch I have doesn’t impact those other causes of 502’s.
Hey everyone,
I think what might help move this forward would be having Uvicorn log some more information other than the rather unhelpful Invalid HTTP request received message.
I filed #886 that would turn those messages into much more helpful full traceback dumps:
WARNING: Invalid HTTP request received.
Traceback (most recent call last):
  File "/Users/florimond/Developer/python-projects/uvicorn/uvicorn/protocols/http/httptools_impl.py", line 165, in data_received
    self.parser.feed_data(data)
  File "httptools/parser/parser.pyx", line 193, in httptools.parser.parser.HttpParser.feed_data
httptools.parser.errors.HttpParserInvalidMethodError: invalid HTTP method
Upon approval we can merge and release in a patch release, then you'd be able to try that version out in production and see what the actual error / cause is. Maybe that'll be that "fragmented URL TCP packets" error from #778, maybe that'll be something else, or a combination… We'll see. :-)
Hey all! Uvicorn 0.13.2 has been released. Feel free to try it out and report any enhanced tracebacks you can see from these errors. :-) Thank you!
I was looking at an old PR tonight, @florimondmanca... it seems like https://github.com/encode/uvicorn/pull/205 is relevant here. We don't answer with a 400 in case of an invalid request, like gunicorn or hypercorn do, and apparently at the reverse-proxy level that turns into a 502. Might be worth resurrecting that PR?
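For context, a rough, illustrative sketch (not the actual PR) of what "reply with a 400 instead of silently closing" could look like; the proxy then sees a response rather than the abrupt close it currently reports as a 502:

# Illustrative only: roughly where uvicorn's httptools protocol catches
# httptools parser errors in data_received(), one could write a bare 400
# before closing the asyncio transport.
def send_400_and_close(transport) -> None:
    body = b"Bad Request"
    transport.write(
        b"HTTP/1.1 400 Bad Request\r\n"
        b"content-type: text/plain; charset=utf-8\r\n"
        b"content-length: " + str(len(body)).encode("ascii") + b"\r\n"
        b"connection: close\r\n"
        b"\r\n" + body
    )
    transport.close()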
If people impacted could look at their logs (if running 0.13.2 :green_apple:) and tell us whether they see something about a potentially malformed request, that would help.
Note to self, reproduction steps, hopefully what we see here:

1. run nginx in docker:

docker run --rm --name nginx_uvicorn -p 80:80 -v /tmp/uvicorn.sock:/tmp/uvicorn.sock -v $PWD/nginx.conf:/etc/nginx/nginx.conf:ro nginx

with quite a minimal nginx.conf:

events {
}
http {
    server {
        listen 80;
        location / {
            proxy_pass http://uvicorn;
        }
    }
    upstream uvicorn {
        server unix:/tmp/uvicorn.sock fail_timeout=0;
    }
}

2. run uvicorn: `uvicorn app:app --uds /tmp/uvicorn.sock`
3. send an incorrect request with `curl -i 'http://localhost/?x=y z'`
-----
client result:
nginx result:
2020/12/28 19:27:46 [error] 29#29: *4 upstream prematurely closed connection while reading response header from upstream, client: 172.17.0.1, server: , request: "GET /?x=y z HTTP/1.1", upstream: "http://unix:/tmp/uvicorn.sock:/?x=y z", host: "localhost" 172.17.0.1 - - [28/Dec/2020:19:27:46 +0000] "GET /?x=y z HTTP/1.1" 502 157 "-" "curl/7.64.0"
uvicorn result:
WARNING: Invalid HTTP request received.
Traceback (most recent call last):
  File "/home/lotso/PycharmProjects/uvicorn/uvicorn/protocols/http/httptools_impl.py", line 165, in data_received
    self.parser.feed_data(data)
  File "httptools/parser/parser.pyx", line 193, in httptools.parser.parser.HttpParser.feed_data
httptools.parser.errors.HttpParserError: invalid constant string
There seems to be some weird issue with pipelining and/or connection reuse with nginx. I have nginx sitting in front of my uvicorn/starlette app, and on PATCH/POST requests I occasionally get 502s from nginx (about 30-40% of the time). The nginx logs show the upstream error all over the place, and my app logs give not much more information.
I forced connection reuse not to work by adding the default header Connection: close, which forces nginx to kill the connection. Performance drops significantly, but at least I don't get 502s. Potentially relevant: the 502 doesn't ever seem to happen on a GET.
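In case it helps anyone replicate that workaround at the application layer, here is a minimal ASGI wrapper sketch that adds Connection: close to every response (the class name is made up; whether the proxy honours the header depends on your setup):

class ConnectionCloseMiddleware:
    """Append "connection: close" to every HTTP response."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_with_close(message):
            if message["type"] == "http.response.start":
                # Copy the headers and add the connection: close header.
                headers = list(message.get("headers", []))
                headers.append((b"connection", b"close"))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_close)

Usage would be something like app = ConnectionCloseMiddleware(app) around the Starlette app.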