daeuniverse / dae

eBPF-based Linux high-performance transparent proxy solution.
GNU Affero General Public License v3.0

optimize(bpf): Use direct packet access #562

Open jschwinger233 opened 5 days ago

jschwinger233 commented 5 days ago

Background

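Previously we used bpf_skb_load_bytes to read the L3/L4 headers from the skb. This PR switches to the more efficient direct packet access, so the headers no longer have to be copied onto the BPF stack. That saves roughly 200 instructions (which makes the BPF verifier happier, and makes future extensions less likely to hit verifier limits) and also gives a small performance gain.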
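To make the difference concrete, here is a minimal user-space sketch, not dae's actual code. It assumes a hypothetical `struct pkt` standing in for the `skb->data` / `skb->data_end` window, and the helpers `load_bytes` and `direct_ptr` are illustrative names only. The copy style mirrors the spirit of bpf_skb_load_bytes, while the direct style returns a pointer into the packet after a single bounds check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the packet window a tc program sees
 * (skb->data .. skb->data_end). Not dae's real types. */
struct pkt {
    const uint8_t *data;
    const uint8_t *data_end;
};

/* Copy style, in the spirit of bpf_skb_load_bytes(): the bytes are
 * copied into a caller-provided buffer (the BPF stack in a real program). */
static int load_bytes(const struct pkt *p, size_t off, void *to, size_t len)
{
    size_t avail = (size_t)(p->data_end - p->data);

    if (off > avail || len > avail - off)
        return -1; /* requested range is outside the packet */
    memcpy(to, p->data + off, len);
    return 0;
}

/* Direct style: after one bounds check, hand back a pointer into the
 * packet itself. Nothing is copied. */
static const uint8_t *direct_ptr(const struct pkt *p, size_t off, size_t len)
{
    size_t avail = (size_t)(p->data_end - p->data);

    if (off > avail || len > avail - off)
        return NULL; /* requested range is outside the packet */
    return p->data + off;
}
```

In a real eBPF program the bounds check against `data_end` is what lets the verifier accept the pointer dereference; the sketch only models that check in ordinary C.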

Implementation FAQ

1. Why replace the previous iph, ipv6h, icmph, tcph, udph with l3hdr, l4hdr?

Because the bytecode clang emits for the old layout has a hard time passing the BPF verifier.

Consider the following code:

SEC("tc/ingress")
int tc_ingress(struct __sk_buff *skb)
{
    struct iphdr *ip;
    struct ipv6hdr *ip6;
[...]
    // tag1
    if (eth->h_proto == bpf_htons(ETH_P_IP)) {
        ip = (struct iphdr *)(data + offset);
        [...]
    } else if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
        ip6 = (struct ipv6hdr *)(data + offset);
        [...]
    }
[...]
    // tag2
    if (eth->h_proto == bpf_htons(ETH_P_IP)) {
        x = ip->daddr;
    } else if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
        __builtin_memcpy(&x, &ip6->daddr, 16);
    }
}

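clang emits one branch that jumps from the tag1 IPv4 path straight to tag2. On that path the tag1 IPv6 branch never ran, so the ip6 pointer is uninitialized, and clang constant-folds every later use of ip6 on that path. The BPF verifier, however, walks every branch combination blindly. When it checks the tag1 IPv4 + tag2 IPv6 combination, it rejects the program because the ip6 pointer has been optimized away.

There are several ways to work around this kind of problem; I found that unified l3hdr, l4hdr abstract headers are a fairly simple one.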
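The idea of a unified header pointer can be sketched in user-space C as follows. This is an illustration under assumptions, not dae's actual l3hdr definition: `struct parsed`, `parse_l3`, and `get_daddr` are hypothetical names, and the destination-address offsets (byte 16 for IPv4, byte 24 for IPv6) come from the standard fixed header layouts. The point is that one pointer is assigned on every successful path, so no branch combination leaves a pointer uninitialized.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_P_IP   0x0800
#define ETH_P_IPV6 0x86DD

/* Hypothetical parse result: one pointer for either IP version. */
struct parsed {
    uint16_t l3proto;     /* host-order EtherType, set once */
    const uint8_t *l3hdr; /* always initialized when parse_l3() succeeds */
};

/* Bounds-check the fixed header (20 bytes for IPv4, 40 for IPv6)
 * and record a single pointer to it. */
static int parse_l3(const uint8_t *data, const uint8_t *data_end,
                    uint16_t ethertype, struct parsed *out)
{
    size_t need;

    if (ethertype == ETH_P_IP)
        need = 20;
    else if (ethertype == ETH_P_IPV6)
        need = 40;
    else
        return -1; /* unsupported protocol */

    if ((size_t)(data_end - data) < need)
        return -1; /* truncated header */

    out->l3proto = ethertype;
    out->l3hdr = data;
    return 0;
}

/* Copy the destination address into a 16-byte buffer; an IPv4 address
 * fills the first 4 bytes and the rest is zeroed. */
static void get_daddr(const struct parsed *p, uint8_t daddr[16])
{
    memset(daddr, 0, 16);
    if (p->l3proto == ETH_P_IP)
        memcpy(daddr, p->l3hdr + 16, 4);
    else
        memcpy(daddr, p->l3hdr + 24, 16);
}
```

Both the parse step and the later read go through the same `l3hdr` field, so there is no second, version-specific pointer whose initialization depends on which earlier branch was taken.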

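2. Why was the bpf_skb_store_bytes() call in prep_redirect_to_control_plane() moved to the end?

Because bpf_skb_store_bytes and bpf_skb_change_head may change skb->data, which would leave the previously parsed l3hdr, l4hdr pointers pointing at the wrong location. With the call moved to the end, a change to skb->data no longer matters.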
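The ordering concern has a close user-space analogy, sketched below under assumptions: `struct buf`, `grow_head`, and `read_then_grow` are hypothetical names, and reallocation stands in for a helper that may change skb->data. Every read through the derived pointer happens first, and the mutating call comes last.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for the skb data area. */
struct buf {
    uint8_t *data;
    size_t len;
};

/* Prepend n zero bytes. The storage is reallocated, so any pointer
 * previously derived from b->data is invalid afterwards. */
static int grow_head(struct buf *b, size_t n)
{
    uint8_t *nd = malloc(b->len + n);

    if (!nd)
        return -1;
    memset(nd, 0, n);
    memcpy(nd + n, b->data, b->len);
    free(b->data);
    b->data = nd;
    b->len += n;
    return 0;
}

/* Read through the header pointer first, mutate last. */
static int read_then_grow(struct buf *b, size_t hdr_off, size_t n,
                          uint8_t *field)
{
    const uint8_t *hdr;

    if (hdr_off >= b->len)
        return -1;
    hdr = b->data + hdr_off; /* valid only until grow_head() runs */
    *field = hdr[0];
    return grow_head(b, n);  /* hdr must not be used past this point */
}
```

If the mutating call came before the read, `hdr` would have to be re-derived from `b->data` afterwards; placing the mutation last avoids that extra step.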

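3. Benchmark?

In my simple test, direct packet access is about twice as fast as bpf_skb_load_bytes: https://github.com/jschwinger233/skb_access_bench . Hooks that do not involve route show a clear improvement. For example, lan_egress is roughly twice as fast (the time for 999999 runs dropped from 25.010155ms to 13.489028ms). The gain on wan_egress / lan_ingress is small because their bottleneck is route(); I will go after that once I have written BPF unit tests for route().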

Checklist

Full Changelogs

Issue Reference

Closes #[issue number]

Test Result

douglarek commented 5 days ago

On my Manjaro Linux machine (kernel 6.9.6), with dae bound to both LAN and WAN, proxied and direct connections both passed testing. ✅

dae version unstable-20240624.r711.24fe0db
go runtime go1.22.4 linux/amd64
Copyright (c) 2022-2024 @daeuniverse
License GNU AGPLv3 <https://github.com/daeuniverse/dae/blob/main/LICENSE>

umlka commented 4 days ago

Testing failed on Debian 12. The error output is below:

debian@debian-12:~$ journalctl -u dae -o cat -f
Starting dae.service - dae Service...
level=info msg="Include config files: [/usr/local/etc/dae/config.dae]"
level=info msg="Loading eBPF programs and maps into the kernel..."
level=info msg="The loading process takes about 120MB free memory, which will be released after loading. Insufficient memory will cause loading failure."
level=fatal msg="load eBPF objects: field TproxyWanEgress: program tproxy_wan_egress: load program: argument list too long: BPF program is too large. Processed 1000001 insn (4113 line(s) omitted)"
dae.service: Main process exited, code=exited, status=1/FAILURE
dae.service: Failed with result 'exit-code'.
Failed to start dae.service - dae Service.
dae.service: Consumed 17.838s CPU time.

System:

debian@debian-12:~$ uname -a
Linux debian-12 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux

MarksonHon commented 4 days ago

@umlka Does Debian have kernel 6.6 in the backports repository? If so, please test it.

umlka commented 4 days ago

> @umlka Does Debian have kernel 6.6 in the backports repository? If so, please test it.

Kernel 6.6 works fine:

debian@debian-12:~$ uname -a
Linux debian-12 6.6.13+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.13-1~bpo12+1 (2024-02-15) x86_64 GNU/Linux
debian@debian-12:~$ dae -v
dae version unstable-20240618.r710.3fd282
go runtime go1.22.4 linux/amd64
Copyright (c) 2022-2024 @daeuniverse
License GNU AGPLv3 <https://github.com/daeuniverse/dae/blob/main/LICENSE>
debian@debian-12:~$ sudo systemctl status dae
● dae.service - dae Service
     Loaded: loaded (/etc/systemd/system/dae.service; enabled; preset: enabled)
     Active: active (running) since Mon 2024-06-24 20:52:34 CST; 1min 29s ago
       Docs: https://github.com/daeuniverse/dae
    Process: 1061 ExecStartPre=/usr/local/bin/dae validate -c /usr/local/etc/dae/config.dae (code=exited, status=0/SUCCESS)
   Main PID: 1075 (dae)
      Tasks: 13 (limit: 2278)
     Memory: 215.7M
        CPU: 9.179s
     CGroup: /system.slice/dae.service
             └─1075 /usr/local/bin/dae run --disable-timestamp -c /usr/local/etc/dae/config.dae

jschwinger233 commented 4 days ago

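It turns out the 6.1.0 kernel is not to blame. The cause is a difference between clang 14 and clang 15: my local machine and the kernel-test CI both use clang 14, while the PR build uses clang 15.

I will look into the exact cause tomorrow. A newer clang optimizing better and failing the verifier as a result is nothing new - -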

mzz2017 commented 4 days ago

@jschwinger233 For me it used to be the other way around: clang 14 failed and clang 15 passed...

dae-prow[bot] commented 1 day ago

❌ Your branch is currently out-of-sync to main. No worry, I will fix it for you.