caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0

HTTP/3 handshake timeout over IPv4 but not IPv6 #6073

Closed: thijsvandien closed this issue 6 months ago

thijsvandien commented 8 months ago

1. Environment

1a. Operating system and version

FreeBSD 14.0-RELEASE-p4 amd64

1b. Caddy version

e1b9a9d7b08f6f0c21feb8edf122585891aa7099
v2.7.6 h1:w0NymbG2m9PcvKWsrXO6EEkY9Ru4FJK8uQbYcev1p3A=
v2.6.0 h1:lHDynvM+sTOi9Aq4Y15b4FtkqzPB36WbUrZvVdwzTCA=

2. Description

2a. What happens

Using v2.7.6 or e1b9a9d, I can't connect with curl --http3-only --ipv4 (ERR_HANDSHAKE_TIMEOUT), whereas --ipv6 works as expected. HTTP/3 Check likewise works over IPv6 but not over IPv4.

With v2.6.0, IPv4 works with both curl and HTTP/3 Check. Switching back to the newer builds brings the failure back.

Running tcpdump on the server, with no firewalls active, I see the incoming QUIC Initial packets but nothing after them. Entries in Caddy's console output show that these packets do reach Caddy.

2b. Why it's a bug

2c. Log output

In case of failure, repetitions (each with a unique id) of:

2024/01/30 05:10:15.515 DEBUG   events  event   {"name": "tls_get_certificate", "id": "bdc8f332-bd5b-4618-b020-1b4ef0329ee2", "origin": "tls", "data": {"client_hello":{"CipherSuites":[4865,4866,4867,4868,255],"ServerName":"example.com","SupportedCurves":[23,29,24,25],"SupportedPoints":"AAEC","SignatureSchemes":[1027,1283,1539,2055,2056,2057,2058,2059,2052,2053,2054,1025,1281,1537],"SupportedProtos":["h3","h3-29"],"SupportedVersions":[772],"RemoteAddr":{"IP":"<client_ipv4>","Port":57654,"Zone":""},"LocalAddr":{"IP":"<server_ipv4>","Port":443,"Zone":""}}}}
2024/01/30 05:10:15.515 DEBUG   tls.handshake   choosing certificate    {"identifier": "example.com", "num_choices": 1}
2024/01/30 05:10:15.515 DEBUG   tls.handshake   default certificate selection results   {"identifier": "example.com", "subjects": ["example.com"], "managed": true, "issuer_key": "acme-v02.api.letsencrypt.org-directory", "hash": "6a9f361921bc7399b6d327dd0377dab3ba549c30c057384bcbcf4403ad326ef2"}
2024/01/30 05:10:15.515 DEBUG   tls.handshake   matched certificate in cache    {"remote_ip": "<client_ipv4>", "remote_port": "57654", "subjects": ["example.com"], "managed": true, "expiration": "2024/04/28 02:54:34.000", "hash": "6a9f361921bc7399b6d327dd0377dab3ba549c30c057384bcbcf4403ad326ef2"}

In case of success:

2024/01/30 05:10:50.107 DEBUG   events  event   {"name": "tls_get_certificate", "id": "e7b9a95f-b3ab-4c2b-82b3-72dbb1aca88e", "origin": "tls", "data": {"client_hello":{"CipherSuites":[4865,4866,4867,4868,255],"ServerName":"example.com","SupportedCurves":[23,29,24,25],"SupportedPoints":"AAEC","SignatureSchemes":[1027,1283,1539,2055,2056,2057,2058,2059,2052,2053,2054,1025,1281,1537],"SupportedProtos":["h3","h3-29"],"SupportedVersions":[772],"RemoteAddr":{"IP":"<client_ipv6>","Port":52258,"Zone":""},"LocalAddr":{"IP":"<server_ipv6>","Port":443,"Zone":""}}}}
2024/01/30 05:10:50.107 DEBUG   tls.handshake   choosing certificate    {"identifier": "example.com", "num_choices": 1}
2024/01/30 05:10:50.107 DEBUG   tls.handshake   default certificate selection results   {"identifier": "example.com", "subjects": ["example.com"], "managed": true, "issuer_key": "acme-v02.api.letsencrypt.org-directory", "hash": "6a9f361921bc7399b6d327dd0377dab3ba549c30c057384bcbcf4403ad326ef2"}
2024/01/30 05:10:50.107 DEBUG   tls.handshake   matched certificate in cache    {"remote_ip": "<client_ipv6>", "remote_port": "52258", "subjects": ["example.com"], "managed": true, "expiration": "2024/04/28 02:54:34.000", "hash": "6a9f361921bc7399b6d327dd0377dab3ba549c30c057384bcbcf4403ad326ef2"}
2024/01/30 05:10:50.125 DEBUG   http.handlers.reverse_proxy selected upstream   {"dial": "localhost:8889", "total_upstreams": 1}
2024/01/30 05:10:50.128 DEBUG   http.handlers.reverse_proxy upstream roundtrip  {"upstream": "localhost:8889", "duration": 0.002274807, "request": {"remote_ip": "<client_ipv6>", "remote_port": "52258", "client_ip": "<client_ipv6>", "proto": "HTTP/3.0", "method": "HEAD", "host": "example.com", "uri": "/", "headers": {"User-Agent": ["curl/8.6.0-DEV"], "Accept": ["*/*"], "X-Forwarded-For": ["<client_ipv6>"], "X-Forwarded-Proto": ["https"], "X-Forwarded-Host": ["example.com"]}, "tls": {"resumed": false, "version": 772, "cipher_suite": 4865, "proto": "h3", "server_name": "example.com"}}, "headers": {"Server": ["Backend"], "Content-Type": ["text/html; charset=UTF-8"], "Date": ["Mon, 30 Jan 2024 05:10:50 GMT"], "Content-Length": ["87"]}, "status": 405}
thijsvandien commented 8 months ago

Update: I bisected it down to commit 710824c3ce9f8084517e8ab099d57f9060f62061.

francislavoie commented 8 months ago

/cc @WeidiDeng

mholt commented 8 months ago

Thanks for narrowing that down!

thijsvandien commented 8 months ago

Well, thanks for the easy build process. 👌

thijsvandien commented 8 months ago

I was able to replicate this on a clean Parallels VM (now aarch64). Here are two interesting finds:

Hence my workaround in production is adding default_bind 127.0.0.1 [::1] <server_ipv4> <server_ipv6>.

thijsvandien commented 8 months ago

Also confirmed on FreeBSD 13.2, so we can't blame it on 14.0 (which is fairly new).

WeidiDeng commented 8 months ago

Actually, that patch is no longer present in the latest version (UDP sockets are now reused via SO_REUSEADDR on Unix).

That patch does enable more aggressive optimizations in quic-go, so I suspect the bug comes from there.

Can you try using quic-go directly? Try passing a *net.UDPConn, wrapped as a generic net.PacketConn that exposes no additional interfaces. I don't own a FreeBSD machine, so I can't debug it further.

thijsvandien commented 8 months ago

Since I have zero experience with either Go or (implementing) QUIC, building a custom server isn't a trivial task. Perhaps quic-go's example server could be adapted with minimal changes to test this.

On the other hand, if you need access to a FreeBSD box, that would be less of a problem to provide – it's just a question of where to send the credentials...? It doesn't need a dedicated box however; it's easy to run in a VM.

WeidiDeng commented 8 months ago

The problem is, I don't have access to VMs right now. You can send temporary credentials to my email, or run ttyd with -t enableTrzsz=true under a limited-privilege user so I can upload test quic-go files, if you don't mind.

thijsvandien commented 8 months ago

OK, I sent you an email to work out a testing environment.

mholt commented 8 months ago

You two are awesome -- thank you for looking into this 🙏 😊

thijsvandien commented 8 months ago

I am now experiencing an unexpected consequence of my workaround. Connections to port 80 are refused rather than answered with HTTP 308. The same happens for both IPv4 and IPv6. Is this a misunderstanding of default_bind on my part, or should this be considered a separate issue?

nixigaj commented 8 months ago

Just adding that I'm experiencing this issue as well.

thijsvandien commented 6 months ago

Fixed by https://github.com/caddyserver/caddy/pull/6176.