jschwinger233 closed this PR 3 months ago.
Tests passed.
Working normally.
It's working fine
❌ Your branch is currently out-of-sync to main. No worry, I will fix it for you.
Tested in the following environment, works very well.
A router: Linux ImmortalWrt 6.1.78 #0 SMP PREEMPT Mon Feb 19 15:48:41 2024 aarch64 GNU/Linux
A workstation: Linux Manjaro 6.7.7-1-MANJARO #1 SMP PREEMPT_DYNAMIC Fri Mar 1 18:26:06 UTC 2024 x86_64 GNU/Linux
Thanks to all the folks who keep testing this PR. https://github.com/daeuniverse/dae/pull/466/commits/5badabfc8a21d5f2accc49329e8e8da58d415049 is the last low-hanging fruit whose temptation I couldn't resist. Hope this small patch doesn't break anything :crossed_fingers:
LPC 2020 had a talk introducing bpf_redirect_peer, which allows ingress-to-ingress redirection without going through the CPU's backlog queue. Cilium saw a +1.3 Gbit/sec perf boost by using it.
After binding docker0 to the LAN and testing https://github.com/daeuniverse/dae/commit/5badabfc8a21d5f2accc49329e8e8da58d415049, everything works perfectly. There are no issues with direct connection diversion. Well done.
Tested successfully with the latest CI build in the following environments:
Linux GracPC 6.7.5-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Sat, 17 Feb 2024 14:02:21 +0000 x86_64 GNU/Linux
Linux NAS 6.7.4-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 05 Feb 2024 22:07:49 +0000 x86_64 GNU/Linux
Run two Docker containers: one runs dae, the other runs v2ray. This is almost the same as dae's GitHub Actions test: just treat the two containers as two nodes.
I am using sockperf:
Run the sockperf server on the v2ray side (for the UDP test, drop --tcp):
nsenter -t $(pidof v2ray) -n sockperf server -i 172.18.0.3 --tcp --daemonize
Run the sockperf client inside the "pod" container to emulate a LAN proxy (again, drop --tcp for UDP):
nsenter -t $(pidof pod) -n sockperf ping-pong -i 172.18.0.3 --tcp --time 10
dae-0.4.0: avg-latency=37.310 (std-dev=7.352)
this PR: avg-latency=36.792 (std-dev=7.437)
avg-latency improves by 1.3%.
This may not seem like much, because the test environment is clean and free of netfilter hooks.
After adding a simple iptables rule on the dae node:
iptables -t raw -A PREROUTING -p tcp -m tcp --dport 11111 -j ACCEPT
dae-0.4.0 performs worse: avg-latency sometimes climbs as high as 38+, while dae-next (this PR) isn't affected at all thanks to its stack-bypass implementation. In that case, the improvement is 3.1%.
The normal UDP test result is:
dae-0.4.0: avg-latency=58.275 (std-dev=50.721)
dae-next: avg-latency=55.927 (std-dev=48.332)
That's a 4% boost.
However, it is also known that dae-0.4.0 uses encapsulation to avoid port conflicts when a process is already listening on port 53, which hurts performance badly. When that fallback kicks in, dae-0.4.0's avg-latency rises to 60.412 (std-dev=47.764), and dae-next comes out more than 7% better.
Background
This PR introduces three performance optimizations for the LAN path. First, let's review the datapath:
Optimization 1: The BPF programs at both points a and b parse the layer-2/3/4 packet headers. Parsing twice is unnecessary: after parsing at point a, the information point b needs can be stashed in skb->cb and carried along with the packet.
Optimization 2: The peer_ingress BPF program at point b doesn't need to look up the socket for established TCP connections with bpf_skc_lookup, because the kernel can do the socket lookup itself. With tcp_early_demux enabled, the kernel can even skip the routing decision and perform local delivery directly.
Optimization 3: The lan_ingress program at point a redirects the skb from lan0 to dae0, which then traverses the netns boundary to reach the peer. This step can be simplified with bpf_redirect_peer: redirect the skb directly from lan0 to the peer inside the netns, avoiding the performance impact of enqueue_to_backlog.
Recommendation: Review by commit.
Checklist
Full Changelogs
Issue Reference
Closes #[issue number]
Test Result