SagerNet / sing-box

The universal proxy platform
https://sing-box.sagernet.org/
Other
20.34k stars 2.43k forks source link

singbox 发送了错误的 maxDatagramFrameSize,导致 hysteria2 服务端错误的进行分包 #1650

Closed Zxneric closed 1 month ago

Zxneric commented 7 months ago

操作系统

Windows

系统版本

10.0.22631 Build 22631

安装类型

sing-box 原始命令行程序

如果您使用图形客户端程序,请提供该程序版本。

No response

版本

sing-box version 1.9.0-rc.4

Environment: go1.22.1 windows/amd64
Tags: with_gvisor,with_quic,with_dhcp,with_wireguard,with_ech,with_utls,with_reality_server,with_acme,with_clash_api
Revision: dce6053c3f37e2e578de4af106f3db0edf24e73d
CGO: disabled

描述

使用 singbox 连接到 hysteria2 outbound 时,singbox 内部无法正确对 udp 进行分包,udp 包大小超过 quic datagram 容量,导致数据丢失。

此错误可能由

https://github.com/SagerNet/sing-quic/blob/6be1f3c03ae882d544677f1cc0f43297e4afa6b1/hysteria2/packet.go#L239-L253

导致

重现方式

按照以下配置搭建服务以后

curl --http3-only -vvv https://quic.nginx.org/test/

输出

* Host quic.nginx.org:443 was resolved.
* IPv6: (none)
* IPv4: 35.214.218.230
*   Trying 35.214.218.230:443...
* QUIC cipher selection: TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_CCM_SHA256
*  CAfile: curl-ca-bundle.crt
*  CApath: none
* ngtcp2_conn_handle_expiry returned error: ERR_HANDSHAKE_TIMEOUT
* Failed to connect to quic.nginx.org port 443 after 10045 ms: Failed sending data to the peer
* Closing connection
curl: (55) ngtcp2_conn_handle_expiry returned error: ERR_HANDSHAKE_TIMEOUT

sing-box client 配置文件

{
  "dns": {
    "servers": [
      {
        "tag": "google",
        "detour": "select",
        "address": "tls://8.8.8.8"
      },
      {
        "tag": "local",
        "address": "114.114.114.114",
        "detour": "direct"
      }
    ],
    "rules": [
      {
        "outbound": "any",
        "server": "local"
      },
    ],
    "independent_cache": true,
    "final": "google",
    "strategy": "ipv4_only"
  },
  "inbounds": [
    {
      "auto_route": true,
      "interface_name": "singbox",
      "inet4_address": "172.19.0.1/30",
      "type": "tun",
      "stack": "gvisor",
      "endpoint_independent_nat": true,
      "mtu": 1400,
      "strict_route": true,
      "sniff": true,
      "sniff_timeout": "500ms",
      "udp_disable_domain_unmapping": true
    }
  ],
  "outbounds": [
    // any hysteria2 outbound
  ],
  "route": {
    "auto_detect_interface": true,
    "final": "select"
  }
}

hysteria 服务端配置文件

listen: :443

tls:
  cert: /etc/hysteria/cert.crt
  key: /etc/hysteria/private.key

quic:
  initStreamReceiveWindow: 16777216
  maxStreamReceiveWindow: 16777216
  initConnReceiveWindow: 33554432
  maxConnReceiveWindow: 33554432

auth:
  type: password
  password: 

masquerade:
  type: proxy
  proxy:
    url: https://example.com
    rewriteHost: true

日志

DEBUG[0184] [764637597 0ms] inbound/tun[0]: connection closed: upload: DATAGRAM frame too large (maximum: 1200 bytes) | download: io: read/write on closed pipe
DEBUG[0187] [1280231848 10.2s] inbound/tun[0]: connection closed: io: read/write on closed pipe | upstream: context canceled
DEBUG[0194] [2325380955 16.55s] inbound/tun[0]: connection closed: download: use of closed network connection

支持我们

完整性要求

Zxneric commented 7 months ago

有两个主要错误

第一个是

https://github.com/SagerNet/sing-quic/blob/619122b4c233d46dd1e2c3262df303e8a9adf6d1/hysteria2/packet.go#L87

fragUDPMessage 的 maxPacketSize 只传入过 udpMtu 或者 quic-go 抛出的 max frame size,因此比较时,应该减去 headerSize

第二个是

https://github.com/SagerNet/sing-quic/blob/619122b4c233d46dd1e2c3262df303e8a9adf6d1/hysteria2/packet.go#L210

tooLargeErr.PeerMaxDatagramFrameSize 不能正确反映可用的空间,正确的值应该是 *DatagramFrame.MaxDataLen 返回的值。在当前版本下,可以直接 -3

Zxneric commented 7 months ago

应用了以上两个修复后,确认hysteria2下的udp工作正常

Zxneric commented 7 months ago

关于第二点,hysteria 官方实现选择在fork中暴露内部的可用 buf 大小

https://github.com/apernet/quic-go/blob/fef64abc6b78d476489718256f19e3d871f0707f/connection.go#L2368-L2373

Zxneric commented 7 months ago

https://github.com/SagerNet/sing-quic/blob/6be1f3c03ae882d544677f1cc0f43297e4afa6b1/hysteria2/packet.go#L364-L370

如果有两个分段数相同,会导致缓存污染

nekohasekai commented 7 months ago

https://github.com/SagerNet/sing-quic/blob/6be1f3c03ae882d544677f1cc0f43297e4afa6b1/hysteria2/packet.go#L364-L370

如果有两个分段数相同,会导致缓存污染

udpDefragger is unique by sessionID

nekohasekai commented 7 months ago

关于第二点,hysteria 官方实现选择在fork中暴露内部的可用 buf 大小

https://github.com/apernet/quic-go/blob/fef64abc6b78d476489718256f19e3d871f0707f/connection.go#L2368-L2373

This seems to come from quic-go rather than hysteria, which is a fixed value.

Zxneric commented 7 months ago

关于第二点,hysteria 官方实现选择在fork中暴露内部的可用 buf 大小 https://github.com/apernet/quic-go/blob/fef64abc6b78d476489718256f19e3d871f0707f/connection.go#L2368-L2373

This seems to come from quic-go rather than hysteria, which is a fixed value.

it did came from quic-go, thus need a modification in the fork, or just minus 3 for the current version

Zxneric commented 7 months ago

something still confused me in the hysteria2 outbound is dtls hanshake cannot be finished. Client Hello -> Hello Verify -> Client Hello, then the Server Hello always be dropped. It didn't happened in the official hysteria implementation.

nekohasekai commented 7 months ago

it did came from quic-go, thus need a modification in the fork

hysteria fixes the maximum mtu of quic-go to 1200, so this judgment is needed, while sing-box and clash.meta cannot

Zxneric commented 7 months ago

After apply the patch in SagerNet/sing-quic#8, I confirmed the udp works corretly if the packet length less than 1390. Still needs the further work on this.

server.py

import socket
import argparse

def main(server_ip, server_port):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server_socket:
        server_socket.bind((server_ip, server_port))
        print(f"Server is listening on {server_ip}:{server_port}")

        while True:
            data, client_address = server_socket.recvfrom(10000)
            print(f"Received data from {client_address}: {len(data)}")

            server_socket.sendto(data, client_address)
            print(f"Sent data back to {client_address}: {len(data)}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="UDP server with command line arguments")
    parser.add_argument("--server_ip", default="0.0.0.0", help="Server IP address")
    parser.add_argument("--port", default=45793, type=int, help="Server port number")
    args = parser.parse_args()

    main(args.server_ip, args.port)

client.py

import os
import socket

server = None
port = None

def send_and_receive(n):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect((server, port))
    s.settimeout(1)

    data = os.urandom(n)

    s.sendall(data)

    rdata = s.recv(n)

    assert data == rdata
    return data

if __name__ == "__main__":
    for i in range(1190, 3000):
        print("sending and receiving", i, "bytes")
        send_and_receive(i)
        print(i, "bytes received")
Zxneric commented 7 months ago

SagerNet/sing-quic#8 is wrong and not related to the real bug

Zxneric commented 7 months ago

SagerNet/sing-quic#9 fixes this. But the same bug may also happened to hysteria1 and tuic. I have no idea about those protocol. Please also update them. @nekohasekai @dyhkwong plz review

Zxneric commented 7 months ago

And it seems that mtu discovery never worked properly

Zxneric commented 7 months ago

@nekohasekai 此错误在 1.9.0-rc11 中被修复,但是在 rc12 中再次出现 force push sing-quic 时,丢弃了 https://github.com/SagerNet/sing-quic/pull/9/commits/6a7930915e8150bbe732d85fe9cb8a1cd174017a 中的补丁。 设置此参数可以保证服务端工作正常,尤其是与其他实现的服务端互操作时,仅仅设置 udpMtu 只能保证客户端行为

Zxneric commented 7 months ago

@nekohasekai I would like to know how you plan to fix this issue? Are there any planned patches? Please let me know if you have any comments on my fix.

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days

Zxneric commented 5 months ago

此问题在最新版本仍然存在

dyhkwong commented 5 months ago

https://github.com/SagerNet/sing-quic/blob/fe8564f7e0b2b0d184aea0368fbdba8a83781b89/hysteria2/packet.go#L138

udpMTU 已经是 1200 - 3

fragUDPMessage 的 maxPacketSize 只传入过 udpMtu 或者 quic-go 抛出的 max frame size,因此比较时,应该减去 headerSize

https://github.com/SagerNet/sing-quic/blob/fe8564f7e0b2b0d184aea0368fbdba8a83781b89/hysteria2/packet.go#L87

tooLargeErr.PeerMaxDatagramFrameSize 不能正确反映可用的空间,正确的值应该是 *DatagramFrame.MaxDataLen 返回的值。在当前版本下,可以直接 -3

https://github.com/SagerNet/sing-quic/blob/fe8564f7e0b2b0d184aea0368fbdba8a83781b89/hysteria2/packet.go#L206

以及:https://github.com/SagerNet/sing-box/issues/1650#issuecomment-2041857226

此问题在最新版本仍然存在

所以现在的问题是什么呢?

Zxneric commented 5 months ago

问题出在 https://github.com/SagerNet/quic-go/blob/cbbf722db9215d5ebf2f53ed43c804e71a71346c/connection.go#L316-L320, 如果不设置 config.MaxDatagramFrameSize, 会导致连接的 MaxDatagramFrameSize 被默认设置为16384,在这种情况下,对端如果发送了 加上hysteria header以后的包大小 大于 路径MTU 的包,会导致直接被丢弃。 如果服务端也是sing-box的话是没有这个问题的,因为sing-box在发送之前写死了检查大小的逻辑,但是如果和官方实现通信的话这里就bug了。

Zxneric commented 5 months ago

MaxDatagramFrameSize 在原本的 quic-go 中是不存在的,是被一系列patch加上的。 hysteria官方实现没有设置这个值是因为 wire.MaxDatagramSize 被改成了 1200,所以不会有问题。但是 sing-box 回滚了这个patch,因此需要手动设置这个值来通知对端正确的大小。 https://github.com/SagerNet/sing-quic/pull/10 我之前提的 PR 被拒了

nekohasekai commented 5 months ago

在这种情况下,对端如果发送了 加上hysteria header以后的包大小 大于 路径MTU 的包,会导致直接被丢弃。

请重新组织语言描述你的问题,hy2 自己限制 1200,不可能发送大于此的包。

如果你只知道存在问题与掩盖问题的方法,请不要添油加醋。

Zxneric commented 5 months ago

1200是默认的设置,这个数值会在运行时被改变,我在hysteria的官方实现打log确认的。应该是和路径发现有关。 我确实没有弄清楚完整的整个逻辑链,但是这里肯定有个bug是确定的

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days

Zxneric commented 2 months ago

update