Closed arimitx closed 2 months ago
This issue may be related to the following items:
Unlike a firewall, which only needs to decide whether to block traffic (so it can block at the second packet), a proxy needs to decide which outbound to route traffic to based on the first packet alone (if the protocol itself is 0-RTT). Having the proxy "hold" the first packet and wait for the second (or even more) packets to arrive will cause issues. So I don't think this is fixable.
I think there might be some recent changes in Chromium that modify how browsers (e.g., Chrome 124) send the ClientHello in QUIC. From my perspective, domain-based routing suddenly failed without any change to the configuration or to sing-box itself. Sadly, I think I am not the only one affected (even though no other user has reported similar issues).
The sticky part is that domain sniffing is an important feature for QUIC traffic. However, new RFCs make QUIC work like a stream, so we can’t decide the routing policy from the first packet alone. Even though spreading the ClientHello over multiple fragments in multiple packets helps avoid protocol ossification, it makes it much harder for a “middlebox” to handle the traffic properly.
Sniffing itself is actually censorship. It makes sense for new standards to add some anti-censorship features (like making traffic more difficult to sniff). And how can a proxy hold UDP packets and wait for a full SNI that may not even exist? The only way is to set a timeout while waiting for the second packet; if the timeout expires and/or no SNI is sniffed, route the first packet as is.
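The "hold with a timeout, then route as is" idea could be sketched roughly as below. This is only an illustration under stated assumptions: holdAndSniff, sniffFunc, and defaultWait are hypothetical names, not part of sing-box, and the sniffer in main is a toy that pretends the SNI becomes readable once two packets are buffered.

```go
package main

import (
	"fmt"
	"time"
)

// defaultWait is an arbitrary illustration value, not a recommended setting.
const defaultWait = 50 * time.Millisecond

// sniffFunc is a hypothetical callback: it inspects every packet buffered so
// far and reports a domain once enough data has arrived.
type sniffFunc func(buffered [][]byte) (domain string, ok bool)

// holdAndSniff keeps the first packet, then waits up to `wait` for further
// packets. It returns all held packets plus the sniffed domain, or an empty
// domain if the wait expires or the source closes, so the caller can still
// route the held packets as is.
func holdAndSniff(first []byte, more <-chan []byte, wait time.Duration, sniff sniffFunc) ([][]byte, string) {
	buffered := [][]byte{first}
	if domain, ok := sniff(buffered); ok {
		return buffered, domain
	}
	deadline := time.NewTimer(wait)
	defer deadline.Stop()
	for {
		select {
		case pkt, open := <-more:
			if !open {
				return buffered, ""
			}
			buffered = append(buffered, pkt)
			if domain, ok := sniff(buffered); ok {
				return buffered, domain
			}
		case <-deadline.C:
			return buffered, ""
		}
	}
}

func main() {
	// Toy sniffer (assumption): succeeds only once two packets are held.
	needsTwo := func(buffered [][]byte) (string, bool) {
		if len(buffered) >= 2 {
			return "example.com", true
		}
		return "", false
	}
	more := make(chan []byte, 1)
	more <- []byte("second packet")
	packets, domain := holdAndSniff([]byte("first packet"), more, defaultWait, needsTwo)
	fmt.Println(len(packets), domain)
}
```

The trade-off is the one raised above: every connection whose SNI never shows up pays the full wait before its first packet is forwarded.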
I agree. From an end user's perspective, simply blocking all QUIC traffic in sing-box is also an option. Network performance may even improve if UDP connection quality is poor (a common case).
Fake DNS should still work as long as no encrypted DNS is used.
But if ECH is promoted one day, no domain name can be sniffed.
Thanks for the suggestion. Unfortunately, fake-ip is not an option in my case. I run sing-box on my router, and fake-ip would break the functionality of policy-based routing of other applications.
Then let’s go back to the good old days when people used browser plugins like SwitchyOmega for traffic splitting : )
Sorry to interrupt you guys. I think maybe daeuniverse/dae#301 can help us solve this problem.
https://github.com/SagerNet/sing-box/compare/dev-next...dyhkwong:sing-box:feature/fix-quic-sniffer This is a very preliminary PoC for testing purposes only. I don't know if bad things will happen.
Many thanks! I will try it later.
@dyhkwong Thanks again for your kind help!
I've created a fork of the dev-next branch of sing-box with your pull request merged. Then I compiled and tested sing-box on a Windows 11 machine with the following minimal config:
{
  "log": {
    "level": "debug"
  },
  "dns": {
    "servers": [
      {
        "tag": "google",
        "address": "8.8.8.8",
        "strategy": "prefer_ipv4"
      }
    ],
    "final": "google"
  },
  "inbounds": [
    {
      "type": "tun",
      "tag": "tun-in",
      "interface_name": "tun0",
      "inet4_address": "172.19.0.1/30",
      "mtu": 1280,
      "gso": false,
      "auto_route": true,
      "strict_route": false,
      "endpoint_independent_nat": false,
      "udp_timeout": "5m",
      "stack": "system",
      "sniff": true
    }
  ],
  "outbounds": [
    {
      "type": "direct",
      "tag": "direct-out"
    },
    {
      "type": "dns",
      "tag": "dns-out"
    }
  ],
  "route": {
    "auto_detect_interface": true,
    "rules": [
      {
        "protocol": "dns",
        "outbound": "dns-out"
      }
    ],
    "final": "direct-out"
  }
}
I can see from the logs that sing-box has successfully sniffed various domain names in quic sessions. For example:
INFO[0034] [4046528588 0ms] inbound/tun[tun-in]: inbound packet connection from 172.19.0.1:62020
INFO[0034] [4046528588 0ms] inbound/tun[tun-in]: inbound packet connection to 142.251.220.68:443
DEBUG[0034] [4046528588 0ms] router: sniffed packet protocol: quic
DEBUG[0034] [4046528588 0ms] router: sniffed packet protocol: quic, domain: www.google.com
INFO[0025] [3854301593 0ms] inbound/tun[tun-in]: inbound packet connection from 172.19.0.1:56775
INFO[0025] [3854301593 0ms] inbound/tun[tun-in]: inbound packet connection to 142.250.204.142:443
DEBUG[0025] [3854301593 0ms] router: sniffed packet protocol: quic
DEBUG[0025] [3854301593 1ms] router: sniffed packet protocol: quic, domain: www.youtube.com
INFO[0033] [3563755895 0ms] inbound/tun[tun-in]: inbound packet connection from 172.19.0.1:59726
INFO[0033] [3563755895 0ms] inbound/tun[tun-in]: inbound packet connection to 34.117.186.192:443
DEBUG[0033] [3563755895 0ms] router: sniffed packet protocol: quic
DEBUG[0033] [3563755895 0ms] router: sniffed packet protocol: quic, domain: ipinfo.io
INFO[0033] [3213668466 0ms] inbound/tun[tun-in]: inbound packet connection from 172.19.0.1:51200
INFO[0033] [3213668466 0ms] inbound/tun[tun-in]: inbound packet connection to 104.22.31.153:443
DEBUG[0033] [3213668466 0ms] router: sniffed packet protocol: quic
DEBUG[0033] [3213668466 0ms] router: sniffed packet protocol: quic, domain: myip.ipip.net
Maybe more tests are required to inspect the quic sniffing feature, but from my perspective it seems to work fine now.
Unfortunately, the application crashes when I add domain-based routing rules to the configuration.
The call stack information is attached here.
v1.10.0-alpha.23 should solve this.
Operating system
Others
System version
NA
Installation type
Others
If you are using a graphical client, please provide the version of the client.
NA
Version
Description
TL;DR
sing-box fails to parse the server_name correctly if the fragments of CRYPTO that contain ClientHello are spread across multiple packets. (See RFC 9312, Section 3.4.1, paragraph 7.)
Bug Detail
Recently, I noticed that sing-box failed to sniff the domain names in QUIC traffic. The failure of the sniffing also leads to the failure of the subsequent domain-based routing.
In the corresponding logs, sing-box only sniffed QUIC protocol without the domain name:
After some study of the problem, I think the reason is that the existing sniffing mechanism for QUIC traffic fails to handle the case where the fragments of CRYPTO that contain ClientHello are spread across multiple packets.
Problem Reproduction
In my case, the fragments of CRYPTO that contain ClientHello are spread across two packets (the original pcapng file can be downloaded here):
(1 fragment in the first packet)
(4 fragments in the second packet)
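For orientation, both captured payloads in this report begin with the same unprotected QUIC long-header bytes. The sketch below is a minimal illustration, not sing-box code: parseLongHeader, longHeader, and readVarint are hypothetical names. It decodes the fields that are readable without removing header protection, using the first 17 bytes of the Packet 1 payload given in this report.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
)

// longHeader holds the long-header fields that are readable without
// removing header protection (hypothetical type for illustration).
type longHeader struct {
	PacketType byte // 0 means Initial in QUIC version 1
	Version    uint32
	DCID, SCID []byte
	TokenLen   uint64 // only meaningful for Initial packets
}

var errShort = errors.New("packet too short")

// readVarint decodes a QUIC variable-length integer: the top two bits of
// the first byte select a total length of 1, 2, 4, or 8 bytes.
func readVarint(b []byte) (uint64, int, error) {
	if len(b) == 0 {
		return 0, 0, errShort
	}
	n := 1 << (b[0] >> 6)
	if len(b) < n {
		return 0, 0, errShort
	}
	v := uint64(b[0] & 0x3f)
	for _, c := range b[1:n] {
		v = v<<8 | uint64(c)
	}
	return v, n, nil
}

// parseLongHeader reads the version, connection IDs, and (for Initial
// packets) the token length from the start of a long-header packet.
func parseLongHeader(b []byte) (longHeader, error) {
	var h longHeader
	if len(b) < 6 {
		return h, errShort
	}
	if b[0]&0x80 == 0 {
		return h, errors.New("not a long header")
	}
	h.PacketType = (b[0] >> 4) & 0x03
	h.Version = binary.BigEndian.Uint32(b[1:5])
	pos := 5
	for _, id := range []*[]byte{&h.DCID, &h.SCID} {
		if len(b) <= pos {
			return h, errShort
		}
		n := int(b[pos]) // one-byte connection ID length
		pos++
		if len(b) < pos+n {
			return h, errShort
		}
		*id = b[pos : pos+n]
		pos += n
	}
	if h.PacketType != 0 {
		return h, nil // no token field outside Initial packets
	}
	tokenLen, _, err := readVarint(b[pos:])
	if err != nil {
		return h, err
	}
	h.TokenLen = tokenLen
	return h, nil
}

func main() {
	// First 17 bytes of the Packet 1 payload from this report.
	prefix, _ := hex.DecodeString("ce0000000108f65caf297b7fdf26004046")
	h, err := parseLongHeader(prefix)
	fmt.Println(h.PacketType, h.Version, hex.EncodeToString(h.DCID), len(h.SCID), h.TokenLen, err)
}
```

Everything after the token and length fields is protected, so reading the CRYPTO fragments themselves additionally requires deriving the Initial keys from the destination connection ID, which this sketch does not attempt.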
To find out what happened, I extracted the payloads from the two packets and wrote additional test functions modeled on TestSniffQUICFragment in common/sniff/quic_test.go:
Packet 1 Payload
ce0000000108f65caf297b7fdf2600404600d6b901e3b8cab2485ecf3fa3b25b6037d89673312e8835618c60a1d0729eb30c4c15d0ba53d1c520d7bf42c8c7394420317c33eb2950a3a867ec59e99aed8fe4186b14be1b44890247a5da8562947b11989eed198a380ecfd7fac8932a6728384395d362a3408f79bb89eba84ee73d75a965f1862ec67d89290e54ba22a03114b6739bd10f12dc4f24f7f66f33e76a06946a3f01e3c87646e6bdd6e9c396c9d4fa5814d9a41e79ce752bac41e9888f69f380188fd49b0d64624700c84fcdf4d91616cd2fd17bf40b37942bc692d270fb712d457626bc418e1eec88f610447516853f5646241ad119c4b5920f290e6644cd79245c0e52b54ac081b6a131eb46890fbdffbd937981fc266ece92511ae1ef2c4c03cc4a92828182498b4fc653fd8eb4aa6b142fcb23c0911e0ad49275c6b405023870d379f7d2edccc6ffd801927572a5798cc974289fadda5d1eadf57feac392bc1b5cfb6abe09c63fed96ebfa3b5183251a3d06574ef9c20898a52df23e0b85943ef6f2498b16ca237a7e387f222aa39373557a08b2acff35922a4b852ac12ce63be507df3dc7a4696d701cf85407f80627e082fccdb144c4ad702817e70e4f57deb46d78c3ae49b3b28455ae6fe0b9f204622a9e450aaa44b3122890f7b26b7833ba5576e178dd62d55d94e7328f60205a9230ac8bbd251dab29c156ea05535bec1ec24c95d2983fe529c0b1db16430e0650ea874ac85b5fafafada3f0640c501afcd120459a9434799cf9e3be64d97baa4207fac5a559ec54f721c14a2872c4a5438102399481caf3b537feb9daadaf3a5f9822471e871bae7b2dcd6cbaa006150182eeb18c8130585442e66bfe8f62b811ba5a5b0f855250199b6f0d74bf1ba275b5a2a51abe6d7456bc6ec317197ff597afcbb9881a76d4220bffca6001ef90a28e48b93aefcc4409fe85c4fb476d741f0dca24981de86e6ce0002eb2aceec7876016b6b0484984a05f8e05e20669858226e2030f475d099d1b6ecc2cb1a89764cd7ae1e5a4834ab259de5c2af8d91f512cb0035e3e5c696baab46686ab985be4c3f765b1482a75fe8b66bdbdcf6e4e8b82fb099889b79410a21de44a70967d7858e2faffba7b765a856a8a7be14915c6f8270736e1412d6ab4f177cb89fec3087bf0a576c80068d114ba1d42549d6a1c08de498127ebbff8f5ca721bba275650fb7f71bd4dde2d1d712c4cd3c6f9a3d09892a2edcebfab5d8d0ba0e16fd0c6ca4f686d5f65b3061c7d5d269ecf297e9a6c645fa6ace6faca68ad8372e0591da7e31bdb79f68041b590b5f844a73d2c12de456fc19e417a46571997710dcc27e533a5b7e00b3d7083aadcfabf1f9b8602f83569f6f021b2b199a83a885c9b6cfd2c
2bbf24d107a2154842a6e7fef8c7fc4e6c2ff1f83a6ac186ce374574072bbc723da3807097eda508190a7c8208c71111f43980b4923b51d6229ac290729e959093dfeeb4f38e9ae0216ccf74a0071003288479dc87a6a9c44742714868c43181f3373e8c19b0586f635caa2ce64628f2b90d3620c950ed14be118ec87f4c470c60be0cff97555be3e845a7f12b91071c1d2cf09c12c34c190ac8f7c4359b8c22590883b44fad0f23bacda4fe31107a72f8cfb1184c5c312331b03f2a3a6e6ac44cba188ab1cab6513c362e1436d2d19337344832344e577cc1710754ba3cb1004106bd958df0ef53f9103745a7774895891b2da5fb90693096a3
Packet 2 Payload
c30000000108f65caf297b7fdf2600404600d6b901e3b8cab2485ecf3fa3b25b6037d89673312e8835618c60a1d0729eb30c4c15d0ba53d1c520d7bf42c8c7394420317c33eb2950a3a867ec59e99aed8fe4186b14be1b4489e2b4ec270295367801d969e0f806b612a13b34edf8412c6c056d993c33e5d04a97b70a6243edad3e5a0948b88a66be38110145c7ab69af25298c1886bd68ba03fe476ba2fda0b3d42e8a93827dbaf064226403041fb1ab2ac328b160114f2a99ccc216152e1393339f2e96ffa615316dd85c1626928b55ea09b8bf4b64dcf6573cecd6fd04645a2d61c55f913796bd165fa60dd9742611d2cee37c712667efece55ff1135a428837b2fc3657b6e5c49f5adeddcdd1c0829a3165728953bfdd028872ed814d80d0825f9a8e7fd73aa90e8584869376052ed51efcd4d112d99c4c30362da4cd0f68cdedf6003646b409eac756c8be55a81edb0dca6d1e875a41cc9033ae00eb43345fc5020731108871234234fe8ab2dff2264744c7b2e2e991031321b70c6165eba736455b142755a8612f76a88d6fce6c090418709c350e3ca1ac536707526e171ea1a6a69daa6b887d589fa9f08f26dfa99d90c9ca8427e66acc94c36e8c08cc252062efaf4b6e1f5aadde68649329675873b145260fd961bef66a64733dc914c1d90501474a0d061ef8c09372f57bcaca9db6493308c4555c569c62ee30c6181fd3736c1c19bcf4d6c5e6b21e1d869c765de668d3c7f5e4b5c04a31c696907b5f459bdbd2666c33b1bba01fdfa1ef6418348c23201b7ad0a97ead75577c09963691a55357b8587523f40355f30669ce95ad2d24637fcbf9fb02e3e4134c5ed8eefb6b2d2e456bfc42a4be697af4b50c4719d55c09029ab284bc8be098bab030c1cfea142e506fb566dcc525a3c72b8cb5b561fec1ca9fe9e78318c38ad18e3b105423dd8330f9d5962cf169c7f12b8ac1ffeca5be1df5071423e17cca0e1b3bef3c422ee91bc3aefc8ce840b0d8e4c478806dd985a17618982490d7087248b7a82ebd84aa97074cfeb21f88a3204797c465eb74f0efae3c7be990d667dedf5f2b9ebc82fe9690b261d512f0ce6dd09bc9118b484526de52ac55cf715f1a2f426b3daaa8d00dd7adb534ba22860e80544b7d050f616ce8b2b14929c2559b82404de9fb13c8b5f53a0c6e560a7d346e7d1db096400f61dc7bbbb19d67c1e581ef8529d6fc29eb8586b777348ba220692ff6be90aabf3ad6831cd9b4ce6d5de6992dedc7743dbd28605e186c34f2db6407b9f3811ab7f5fe3964f58418357183e073a7cbab7ab2bdb3fe99fadadb43a2448618150f5f415e8d61b00220949093b9ff65ec694836183239fe3d932eae16984e5b508ba213882b49b5f8e152541b0f9d88d1c00d872893
783cee0f53827d894202345237adf635335095c9d1d650b3e267fe5b581f687423060387dd597aa10edc51e42e330010af244416e6659a044e3f00aa700e1a2115760b19deced9fed8d773334a0bfb32c406683b698b4aa536c0947df4b35f19430929e8a115eba92f8f797ca7e37e0c902f77c1fbdc80ad8fc6df411dcb2e28d331ef53be038796f4fd0def0d686d581d995fc6baf420492552eb11f5a191d429175d6f0fad7a0d13fdc913bfd2ce99f8b51f69c009e2a01742f677c3f6e296e34f55dda559d6024f68bceb765fc4a8a1e271574c64d2495ab3840f37cbb0ab8325c6578530d8eea51cf8704306774f92331f7d1085e7d45f30
For Payload 1, there is an EOF error (quic.go line 299):
For Payload 2, there is a bad fragments error (quic.go line 295):
Conclusion
Obviously, the current sniffing mechanism for QUIC traffic is insufficient to cope with the case where the CRYPTO fragments containing ClientHello are spread across multiple packets.
A temporary measure could be adding an extra rule to block all QUIC traffic, and I am also investigating the code to find a possible solution. However, I am not familiar with either Go or the code base of sing-box. I would appreciate it if you could fix the problem or provide me with some suggestions.
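One possible direction is to collect CRYPTO fragments across packets and only parse once the ClientHello is complete. The sketch below is a rough illustration under stated assumptions, not the fix that went into sing-box: cryptoFragment, reassemble, and clientHelloComplete are hypothetical names, the fragments are assumed to be already decrypted, and the data in main is toy bytes rather than a real ClientHello.

```go
package main

import (
	"fmt"
	"sort"
)

// cryptoFragment is a hypothetical, already-decrypted CRYPTO frame: Offset is
// its position in the TLS handshake stream and Data is its payload.
type cryptoFragment struct {
	Offset uint64
	Data   []byte
}

// reassemble merges fragments gathered from one or more packets and returns
// the contiguous part of the handshake stream that starts at offset 0.
func reassemble(frags []cryptoFragment) []byte {
	sorted := append([]cryptoFragment(nil), frags...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Offset < sorted[j].Offset })
	var stream []byte
	for _, f := range sorted {
		have := uint64(len(stream))
		end := f.Offset + uint64(len(f.Data))
		switch {
		case f.Offset > have:
			return stream // gap: the missing bytes may arrive in a later packet
		case end <= have:
			continue // fragment only repeats data already present
		default:
			stream = append(stream, f.Data[have-f.Offset:]...)
		}
	}
	return stream
}

// clientHelloComplete reports whether stream holds a whole TLS ClientHello:
// a 1-byte handshake type (0x01), a 3-byte length, then that many body bytes.
func clientHelloComplete(stream []byte) bool {
	if len(stream) < 4 || stream[0] != 0x01 {
		return false
	}
	bodyLen := int(stream[1])<<16 | int(stream[2])<<8 | int(stream[3])
	return len(stream) >= 4+bodyLen
}

func main() {
	// Toy data: a 6-byte "ClientHello" body split across two packets,
	// with the second packet's fragments arriving out of order.
	packet1 := []cryptoFragment{{Offset: 0, Data: []byte{0x01, 0x00, 0x00, 0x06, 'a', 'b'}}}
	packet2 := []cryptoFragment{{Offset: 8, Data: []byte("ef")}, {Offset: 6, Data: []byte("cd")}}

	fmt.Println(clientHelloComplete(reassemble(packet1)))
	fmt.Println(clientHelloComplete(reassemble(append(packet1, packet2...))))
}
```

With this shape, a single packet that carries only part of the handshake stream is treated as "not complete yet" rather than as a parse failure, and the caller decides how long to keep waiting.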
Reproduction
The bug can be reproduced using the payloads in the bug description section.
Logs