cloudflare / quiche

🥧 Savoury implementation of the QUIC transport protocol and HTTP/3
https://docs.quic.tech/quiche/
BSD 2-Clause "Simplified" License
9.3k stars 700 forks source link

HTTP/3 Nginx hanging requests #813

Closed qjavax closed 3 years ago

qjavax commented 3 years ago

Hi, I'm having some issues with HTTP/3 nginx + quiche. After n-times requests to server it will hang and then timeout. Problem is replicatible with curl: for x in {1..1000}; do echo " ===> $x"; curl3 -vvI --http3 https://<domain>/ -X GET; done or with browser (in my case Firefox) with http3 enabled and refreshing page many times. Number of successful request may vary in range from ~5 to ~350. I'm sure thats not case of curl (curling to https://cloudflare-quic.com/ will not cause any problems) or problem of nginx or network configuration (http/2 and http/1 will not cause problems as well). I also tried changing nginx configuration of keepalive or http3_ configuration directives. Nginx works as a reverse proxy to Apache (but I belive Apache is not causing any issues here)

All build on arch linux with pkgbuild from quiche commit: 9d0c677ef1411b24d720b5c8b73bcc94b5535c29 (latest right now) and nginx version: nginx/1.16.1.

Log from successful curl request:


*   Trying 89.161.255.207:443...
* Sent QUIC client Initial, ALPN: h3-29,h3-28,h3-27
* Connected to <domain> (<ip>) port 443 (#0)
* h3 [:method: GET]
* h3 [:path: /]
* h3 [:scheme: https]
* h3 [:authority: <domain>]
* h3 [user-agent: curl/7.74.0-DEV]
* h3 [accept: */*]
* Using HTTP/3 Stream ID: 0 (easy handle 0x557a6fbc1b10)
> GET / HTTP/3
> Host: <domain>
> user-agent: curl/7.74.0-DEV
> accept: */*
> 
< HTTP/3 200
HTTP/3 200
< date: Tue, 19 Jan 2021 10:21:30 GMT
date: Tue, 19 Jan 2021 10:21:30 GMT
< content-type: text/html; charset=UTF-8
content-type: text/html; charset=UTF-8
< vary: Accept-Encoding
vary: Accept-Encoding
< server: Apache
server: Apache
< alt-svc: h3-29=":443"; ma=86400
alt-svc: h3-29=":443"; ma=86400
* Connection #0 to host <domain> left intact

With curl in for loop request will hang to timeout then proceed next request and aften n-times hang again.

 ===> 441
* Trying <ip>:443...
* Sent QUIC client Initial, ALPN: h3-29,h3-28,h3-27

* Connection timed out after 300000 milliseconds
* Closing connection 0
curl: (28) Connection timed out after 300000 milliseconds
===> 442
* Trying <ip>:443...

Nginx debug log:

2021/01/19 12:06:17 [debug] 19620#0: *98 quic done reading
2021/01/19 12:06:17 [debug] 19620#0: *98 update posted event 000055C5E49479E0
2021/01/19 12:06:17 [debug] 19620#0: *98 delete posted event 000055C5E49479E0
2021/01/19 12:06:17 [debug] 19620#0: *98 quic write handler
2021/01/19 12:06:17 [debug] 19620#0: *98 quic alarm fired
2021/01/19 12:06:17 [debug] 19620#0: *98 quic done writing
2021/01/19 12:06:17 [debug] 19620#0: *98 event timer del: 14: 3060360
2021/01/19 12:06:17 [debug] 19620#0: *98 event timer add: 14: 50881:3060359
2021/01/19 12:06:21 [debug] 19620#0: *98 recvmsg: fd:14 n:1200
2021/01/19 12:06:21 [debug] 19620#0: *98 quic read handler
2021/01/19 12:06:21 [debug] 19620#0: *98 http3 idle handler
2021/01/19 12:06:21 [debug] 19620#0: *98 quic done reading
2021/01/19 12:06:21 [debug] 19620#0: *98 post event 000055C5E49479E0
2021/01/19 12:06:21 [debug] 19620#0: *98 recvmsg: fd:14 n:1200
2021/01/19 12:06:21 [debug] 19620#0: *98 quic read handler
2021/01/19 12:06:21 [debug] 19620#0: *98 http3 idle handler
2021/01/19 12:06:21 [debug] 19620#0: *98 quic done reading
2021/01/19 12:06:21 [debug] 19620#0: *98 update posted event 000055C5E49479E0
2021/01/19 12:06:21 [debug] 19620#0: *98 delete posted event 000055C5E49479E0
2021/01/19 12:06:21 [debug] 19620#0: *98 quic write handler
2021/01/19 12:06:21 [debug] 19620#0: *98 quic alarm fired
2021/01/19 12:06:21 [debug] 19620#0: *98 quic done writing
2021/01/19 12:06:21 [debug] 19620#0: *98 event timer del: 14: 3060359
2021/01/19 12:06:21 [debug] 19620#0: *98 event timer add: 14: 46885:3060359

--- from now it hangs ---

2021/01/19 12:06:29 [debug] 19620#0: *98 recvmsg: fd:14 n:1200
2021/01/19 12:06:29 [debug] 19620#0: *98 quic read handler
2021/01/19 12:06:29 [debug] 19620#0: *98 http3 idle handler
2021/01/19 12:06:29 [debug] 19620#0: *98 quic done reading
2021/01/19 12:06:29 [debug] 19620#0: *98 post event 000055C5E49479E0
2021/01/19 12:06:29 [debug] 19620#0: *98 recvmsg: fd:14 n:1200
2021/01/19 12:06:29 [debug] 19620#0: *98 quic read handler
2021/01/19 12:06:29 [debug] 19620#0: *98 http3 idle handler
2021/01/19 12:06:29 [debug] 19620#0: *98 quic done reading
2021/01/19 12:06:29 [debug] 19620#0: *98 update posted event 000055C5E49479E0
2021/01/19 12:06:29 [debug] 19620#0: *98 delete posted event 000055C5E49479E0
2021/01/19 12:06:29 [debug] 19620#0: *98 quic write handler
2021/01/19 12:06:29 [debug] 19620#0: *98 quic alarm fired
2021/01/19 12:06:29 [debug] 19620#0: *98 quic done writing
2021/01/19 12:06:29 [debug] 19620#0: *98 event timer del: 14: 3060359
2021/01/19 12:06:29 [debug] 19620#0: *98 event timer add: 14: 38892:3060356
2021/01/19 12:06:45 [debug] 19620#0: *98 recvmsg: fd:14 n:1200
2021/01/19 12:06:45 [debug] 19620#0: *98 quic read handler
2021/01/19 12:06:45 [debug] 19620#0: *98 http3 idle handler
2021/01/19 12:06:45 [debug] 19620#0: *98 quic done reading
2021/01/19 12:06:45 [debug] 19620#0: *98 post event 000055C5E49479E0
2021/01/19 12:06:45 [debug] 19620#0: *98 recvmsg: fd:14 n:1200
2021/01/19 12:06:45 [debug] 19620#0: *98 quic read handler
2021/01/19 12:06:45 [debug] 19620#0: *98 http3 idle handler
2021/01/19 12:06:45 [debug] 19620#0: *98 quic done reading
2021/01/19 12:06:45 [debug] 19620#0: *98 update posted event 000055C5E49479E0
2021/01/19 12:06:45 [debug] 19620#0: *98 delete posted event 000055C5E49479E0
2021/01/19 12:06:45 [debug] 19620#0: *98 quic write handler
2021/01/19 12:06:45 [debug] 19620#0: *98 quic alarm fired
2021/01/19 12:06:45 [debug] 19620#0: *98 quic done writing
2021/01/19 12:06:45 [debug] 19620#0: *98 event timer del: 14: 3060356
2021/01/19 12:06:45 [debug] 19620#0: *98 event timer add: 14: 22908:3060359
2021/01/19 12:07:08 [debug] 19620#0: *98 event timer del: 14: 3060359
2021/01/19 12:07:08 [debug] 19620#0: *98 quic write handler
2021/01/19 12:07:08 [debug] 19620#0: *98 quic alarm fired
2021/01/19 12:07:08 [debug] 19620#0: *98 quic connection is closed
2021/01/19 12:07:08 [debug] 19620#0: *98 finalize quic connection: 14
2021/01/19 12:07:08 [debug] 19620#0: *98 http3 idle handler
2021/01/19 12:07:08 [debug] 19620#0: *98 http3 finalize connection
2021/01/19 12:07:08 [debug] 19620#0: *98 close http connection: 14
2021/01/19 12:07:08 [debug] 19620#0: *98 free quic connection
2021/01/19 12:07:08 [debug] 19620#0: *98 event timer del: 14: 90003000358
2021/01/19 12:07:08 [debug] 19620#0: *98 reusable connection: 0
2021/01/19 12:07:08 [debug] 19620#0: *98 run cleanup: 000055C5E46EE618
2021/01/19 12:07:08 [debug] 19620#0: *98 run cleanup: 000055C5E46EE588
2021/01/19 12:07:08 [debug] 19620#0: *98 free: 000055C5E4788220
2021/01/19 12:07:08 [debug] 19620#0: *98 free: 000055C5E46EE430, unused: 0
2021/01/19 12:07:08 [debug] 19620#0: *98 free: 000055C5E47C6790, unused: 0
2021/01/19 12:07:08 [debug] 19620#0: *98 free: 000055C5E47623F0, unused: 120

And after other connections cleanup waits for curl timeout then process other requests.

Strace from nginx process:

write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 quic write handler\n", 62) = 62
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 quic alarm fired\n", 60) = 60
sendmsg(12, {msg_name={sa_family=AF_INET, sin_port=htons(56250), sin_addr=inet_addr("<ip>")}, msg_namelen=16, msg_iov=[{iov_base="B\200\317\346\367\236\200\275\370\277\321\322t\355b\216\17V.\230M\242\321\16\ri`\231dn\247\220\300\260\34\252\177W\"\371\327\302\r#\316P\270\363=\277\ff\325\336\244\220K\371\360\334\314$2\206\2238\36\330\220\265\212\346\370w\23S\363\310\22wRM\3237T_\225\256n\310\305`,\330\36\246\23\313\25{"..., iov_len=175}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 175
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 sendmsg: 175 of 175\n", 63) = 63
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 quic done writing\n", 61) = 61
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 event timer del: 12: 3606530\n", 72) = 72
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 event timer add: 12: 29:3546568\n", 75) = 75
epoll_wait(22, [{EPOLLIN, {u32=3834213856, u64=94308431132128}}], 512, 29) = 1
recvmsg(12, {msg_name={sa_family=AF_INET, sin_port=htons(56250), sin_addr=inet_addr("<ip>")}, msg_namelen=112->16, msg_iov=[{iov_base="I\360\336KP\203p\262\325\22\332C{\361,pm\331+\245\272\330\325\315u%\10\300\37\33\364\17\202\211\245P\314\324=u\257!\243", iov_len=65535}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 43
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 recvmsg: fd:12 n:43\n", 63) = 63
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 quic read handler\n", 61) = 61
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 http3 idle handler\n", 62) = 62
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 quic done reading\n", 61) = 61
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 post event 000055C5E4949180\n", 71) = 71
recvmsg(12, {msg_namelen=112}, 0)       = -1 EAGAIN (Resource temporarily unavailable)
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 delete posted event 000055C5E4949180\n", 80) = 80
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 quic write handler\n", 62) = 62
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 quic alarm fired\n", 60) = 60
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 quic done writing\n", 61) = 61
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 event timer del: 12: 3546568\n", 72) = 72
write(3, "2021/01/19 12:15:14 [debug] 19619#0: *3162 event timer add: 12: 59999:3606541\n", 78) = 78
epoll_wait(22, [{EPOLLIN|EPOLLOUT, {u32=3834215648, u64=94308431133920}}], 512, 5541) = 1

I would like to point out this (from other strace of nginx proccess):

12:56:05.431851 recvmsg(26, {msg_namelen=112}, 0) = -1 EAGAIN (Resource temporarily unavailable)
12:56:05.434142 recvmsg(26, {msg_namelen=112}, 0) = -1 EAGAIN (Resource temporarily unavailable)
12:56:05.435023 recvmsg(26, {msg_namelen=112}, 0) = -1 EAGAIN (Resource temporarily unavailable)
12:56:05.436736 recvmsg(26, {msg_namelen=112}, 0) = -1 EAGAIN (Resource temporarily unavailable)
12:56:05.499619 recvmsg(26, {msg_namelen=112}, 0) = -1 EAGAIN (Resource temporarily unavailable)

lsof -> nginx 32673 nobody 26u IPv4 253272330 0t0 UDP <hostname>:https

Parts of server configuration (http3 section): (changes in keepalive and http3 won't fix issue)

server {
        listen <ip>:443 ssl default_server http2;
        listen <ip>:443 quic reuseport;
        add_header alt-svc 'h3-29=":443"; ma=86400';

This type of flooding does not block nginx server (other connection work well). I have an idea of quiche socket getting close too early in some cases (but I'm not sure)

bachue commented 3 years ago

I can reproduce this issue, and it's definitely a bug. I tracked it and found if ngx_lookup_udp_connection() could find an existed udp_connection and reuse it, then this bug would happen.

junhochoi commented 3 years ago

Which curl version are you using? It fixed a similar issue weeks ago https://github.com/curl/curl/issues/6213

qjavax commented 3 years ago

Im using curl 7.74.0-DEV (x86_64-pc-linux-gnu) libcurl/7.74.0-DEV BoringSSL zlib/1.2.11 brotli/1.0.7 zstd/1.4.3 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) nghttp2/1.39.2 quiche/0.6.0 Release-Date: [unreleased] Protocols: http https mqtt Features: alt-svc AsynchDNS brotli HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Largefile libz NTLM NTLM_WB PSL SSL UnixSockets zstd As I mentioned while opening issue I belive it is not case of curl because this problem is not reproducable by curling to https://cloudflare-quic.com/ . This issue is also possible to reproduce with browser.

qjavax commented 3 years ago

@junhochoi Just to provide evidence it's not issue of curl here's screenshot of this issue replicated with Firefox. How you can achieve that: just refresh page several times to get request without any response

junhochoi commented 3 years ago

@qjavax we recently updated our nginx patch. Can you try with latest? Thanks.

qjavax commented 3 years ago

@junhochoi yes I did rebuild my nginx + quiche yesterday with commit: 1e439773dcb96c64071d6b85496bd33cc306aad7 . Unfortunately problem didn't change. I also tried your docker file with nginx+quiche

docker run -d -v $(pwd):/usr/local/nginx/html -p 127.0.0.1:8443:8443/udp -p 127.0.0.1:8443:8443/tcp -p 127.0.0.1:8080:8080/tcp -it ef6cc77f47b0
for x in {1..1000}; do echo " ===> $x"; curl3 -k -vvI --http3 https://127.0.0.1:8443; done

You can also replicate this issue like that.

qjavax commented 3 years ago

Hi, i got focused too much on nginx side of this issue. But after rebuilding nginx AND curl with newest quiche 0.7.0 this issue do not occur anymore so I'm closing this issue. Thank you for your time.