Jamesits / linux-gre-keepalive

High-performance passive (a.k.a. reply-only) GRE keepalive support for Linux, written in eBPF/XDP.
GNU General Public License v2.0

Why is the XDP_TX packet dropped by the kernel? #5

Open someonebw opened 1 year ago

someonebw commented 1 year ago

Hi Jamesits, it's me again :)

Test environment: Debian 11

iproute2/oldstable,now 5.10.0-4 amd64 [installed,automatic]

    root@debian11:~# cat /boot/config-5.10.0-25-cloud-amd64 | grep BTF
    CONFIG_DEBUG_INFO_BTF=y

    root@debian11:~# ip -V
    ip utility, iproute2-5.9.0, libbpf 0.3.0

GRE tunnel config: local IP 172.19.92.248, remote IP 172.19.100.254

1. Output of bpftool prog tracelog (tested with libbpf 0.0.8 and 1.2.0; same behavior in both). The program processes the incoming packet, trims the head, and then does XDP_TX.

      <idle>-0       [001] d.s. 69839.759732: bpf_trace_printk: New packet

      <idle>-0       [001] dNs. 69839.759767: bpf_trace_printk: Packet header dump:

      <idle>-0       [001] dNs. 69839.759770: bpf_trace_printk: #0: 45

      <idle>-0       [001] dNs. 69839.759771: bpf_trace_printk: #1: c0

      <idle>-0       [001] dNs. 69839.759772: bpf_trace_printk: #2: 0

      <idle>-0       [001] dNs. 69839.759773: bpf_trace_printk: #3: 30

      <idle>-0       [001] dNs. 69839.759774: bpf_trace_printk: #4: a

      <idle>-0       [001] dNs. 69839.759775: bpf_trace_printk: #5: 1f

      <idle>-0       [001] dNs. 69839.759775: bpf_trace_printk: #6: 0

      <idle>-0       [001] dNs. 69839.759776: bpf_trace_printk: #7: 0

      <idle>-0       [001] dNs. 69839.759777: bpf_trace_printk: #8: ff

      <idle>-0       [001] dNs. 69839.759778: bpf_trace_printk: #9: 2f

      <idle>-0       [001] dNs. 69839.759779: bpf_trace_printk: #10: 96

      <idle>-0       [001] dNs. 69839.759780: bpf_trace_printk: #11: a2

      <idle>-0       [001] dNs. 69839.759781: bpf_trace_printk: #12: ac

      <idle>-0       [001] dNs. 69839.759782: bpf_trace_printk: #13: 13

      <idle>-0       [001] dNs. 69839.759782: bpf_trace_printk: #14: 64

      <idle>-0       [001] dNs. 69839.759783: bpf_trace_printk: #15: fe

      <idle>-0       [001] dNs. 69839.759784: bpf_trace_printk: #16: ac

      <idle>-0       [001] dNs. 69839.759785: bpf_trace_printk: #17: 13

      <idle>-0       [001] dNs. 69839.759786: bpf_trace_printk: #18: 5c

      <idle>-0       [001] dNs. 69839.759786: bpf_trace_printk: #19: f8

      <idle>-0       [001] dNs. 69839.759787: bpf_trace_printk: #20: 0

      <idle>-0       [001] dNs. 69839.759788: bpf_trace_printk: #21: 0

      <idle>-0       [001] dNs. 69839.759789: bpf_trace_printk: #22: 8

      <idle>-0       [001] dNs. 69839.759789: bpf_trace_printk: #23: 0

      <idle>-0       [001] dNs. 69839.759790: bpf_trace_printk: #24: 45

      <idle>-0       [001] dNs. 69839.759791: bpf_trace_printk: #25: c0

      <idle>-0       [001] dNs. 69839.759792: bpf_trace_printk: #26: 0

      <idle>-0       [001] dNs. 69839.759793: bpf_trace_printk: #27: 18

      <idle>-0       [001] dNs. 69839.759793: bpf_trace_printk: #28: a

      <idle>-0       [001] dNs. 69839.759794: bpf_trace_printk: #29: 1e

      <idle>-0       [001] dNs. 69839.759795: bpf_trace_printk: #30: 0

      <idle>-0       [001] dNs. 69839.759796: bpf_trace_printk: #31: 0

      <idle>-0       [001] dNs. 69839.759797: bpf_trace_printk: Outer GRE flags=0x0 proto=8

      <idle>-0       [001] dNs. 69839.759799: bpf_trace_printk: IPv4 packet_size=0x14, proto=0x2f

      <idle>-0       [001] dNs. 69839.759800: bpf_trace_printk: Inner is GRE4, proto=0

      <idle>-0       [001] dNs. 69839.759801: bpf_trace_printk: GRE4 keepalive received!

2. However:

The packet emitted by XDP_TX has a problem, causing the dropped counter on the gre2 interface's TX queue to increase.

    gre2: flags=209<UP,POINTOPOINT,RUNNING,NOARP>  mtu 1476
        inet 6.6.6.2  netmask 255.255.255.0  destination 6.6.6.2
        inet6 fe80::5efe:ac13:5cf8  prefixlen 64  scopeid 0x20
        unspec AC-13-5C-F8-00-00-00-55-00-00-00-00-00-00-00-00  txqueuelen 1000  (UNSPEC)
        RX packets 13839  bytes 332136 (324.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 36  bytes 2240 (2.1 KiB)
        TX errors 0  dropped 9457  overruns 0  carrier 0  collisions 0

3. Analysis of the kernel drop with nettrace is shown below (note the lines reporting ether protocol: 44051, ending in kfree_skb):

    ** ffff88fec0aeb900
    [69535.293595] [__netif_receive_skb_core] GRE: 172.19.100.254 -> 172.19.92.248
    [69535.293618] [ip_rcv_core ] GRE: 172.19.100.254 -> 172.19.92.248
    [69535.293631] [ip_route_input_slow ] GRE: 172.19.100.254 -> 172.19.92.248
    [69535.293661] [fib_validate_source ] GRE: 172.19.100.254 -> 172.19.92.248
    [69535.293674] [ip_local_deliver ] GRE: 172.19.100.254 -> 172.19.92.248
    [69535.293684] [ip_local_deliver_finish] GRE: 172.19.100.254 -> 172.19.92.248
    [69535.293726] [enqueue_to_backlog ] ether protocol: 44051
    [69535.293740] [__netif_receive_skb_core] ether protocol: 44051
    [69535.293754] [netif_receive_generic_xdp] ether protocol: 44051
    [69535.293904] [kfree_skb ] ether protocol: 44051
    ** ffff88fec0aeb900

4. The data captured by xdpdump is as follows.

Is the packet emitted by XDP_TX malformed?

The "type" here is ac 13 (44051 in decimal); those happen to be bytes 12-13 of the dump below, the first two octets of the source address 172.19.x.x, exactly where an EtherType would sit if the buffer were read as an Ethernet frame.

    0000   45 c0 00 30 aa f4 00 00 ff 2f f5 cc ac 13 64 fe
    0010   ac 13 5c f8 00 00 08 00 45 c0 00 18 aa f3 00 00
    0020   ff 2f f5 e5 ac 13 5c f8 ac 13 64 fe 00 00 00 00

5. Question: why does

    if (bpf_xdp_adjust_head(ctx, (int)(cutoff_pos - data_start)))
        return -1;
    action = XDP_TX;

produce a malformed packet after the head is trimmed, so that the kernel treats it as abnormal and drops it outright?
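For context, here is a minimal sketch of the strip-and-reflect pattern that the snippet above comes from. It is my own illustration under simplifying assumptions (a fixed 20-byte outer IPv4 header plus a bare 4-byte GRE header, and no Ethernet header because the program sits on an ipgre device); the real keepalive_gre.c presumably derives cutoff_pos while parsing the headers, as the snippet suggests.

    #include <linux/bpf.h>
    #include <linux/in.h>
    #include <linux/ip.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int reflect_gre_keepalive(struct xdp_md *ctx)
    {
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        /* On an ipgre (layer-3) device the frame starts directly with the
         * outer IPv4 header; there is no Ethernet header to skip. */
        struct iphdr *outer = data;
        if ((void *)(outer + 1) > data_end || outer->protocol != IPPROTO_GRE)
            return XDP_PASS;

        /* Assumed fixed cutoff: 20-byte outer IP header + 4-byte GRE header. */
        void *cutoff_pos = (void *)outer + sizeof(*outer) + 4;
        if (cutoff_pos > data_end)
            return XDP_PASS;

        /* Move the packet start forward so the encapsulated keepalive reply
         * becomes the new frame, then bounce it out the same interface. */
        if (bpf_xdp_adjust_head(ctx, (int)(cutoff_pos - data)))
            return XDP_PASS;

        return XDP_TX;
    }

    char _license[] SEC("license") = "GPL";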

Jamesits commented 1 year ago

Thanks for the patient testing! This is a strange one, let me look into it~

someonebw commented 1 year ago

Thanks for your reply, [Jamesits]. Actually, when I looked into this problem before, I hit some snags partway through, could not follow it any further, and had to put the investigation on hold.

Back then it also looked like XDP_TX was sending a packet, but I lacked debugging tools at the time (I didn't know how to use dropwatch, xdpdump, or nettrace). Now that I've learned these debugging techniques, I can analyze the problem more thoroughly and completely.

I also ran into another rather odd problem (cause unknown): on some kernel versions, even though a perfectly normal keepalive packet arrives from the remote IP, the keepalive_gre.c program reports "Packet size too small, dump failed".

someonebw commented 1 year ago

Kernels where the packet is not parsed correctly (tested; all report "Packet size too small, dump failed"):

    6.4.11-1.el7.elrepo.x86_64
    5.14.0-352.el9.x86_64

Kernels where the packet is parsed correctly:

    5.4.253-1.el7.elrepo.x86_64
    5.10.0-25-cloud-amd64 #1 SMP Debian 5.10.191-1
    Debian 4.19.249-2

Jamesits commented 1 year ago

Thanks for recommending the debugging tools; I wasn't familiar with them either and will have to read their docs~

"Packet size too small" means the packet length is smaller than DEBUG_PRINT_HEADER_SIZE. You can try making DEBUG_PRINT_HEADER_SIZE smaller and see whether the contents get printed. Alternatively, add a dynamic length check there (somewhat awkward to write, because BPF loops have to be unrolled).
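A rough sketch of that dynamic check, just to illustrate the idea (my own code, not the project's; bpf_printk and the helper name dump_header are stand-ins for whatever keepalive_gre.c actually uses): the loop is still unrolled up to DEBUG_PRINT_HEADER_SIZE, but it stops at data_end instead of refusing to dump short packets.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #ifndef DEBUG_PRINT_HEADER_SIZE
    #define DEBUG_PRINT_HEADER_SIZE 32   /* upper bound on bytes to dump */
    #endif

    /* Dump up to DEBUG_PRINT_HEADER_SIZE bytes of the packet, but bail out
     * per byte once data_end is reached, so a short packet prints whatever
     * it has instead of failing the whole dump. */
    static __always_inline void dump_header(void *data, void *data_end)
    {
        bpf_printk("Packet header dump:");
    #pragma unroll
        for (int i = 0; i < DEBUG_PRINT_HEADER_SIZE; i++) {
            unsigned char *byte = (unsigned char *)data + i;
            if ((void *)(byte + 1) > data_end)
                return;              /* reached the real end of the packet */
            bpf_printk("#%d: %x", i, *byte);
        }
    }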

someonebw commented 1 year ago

(dropwatch, xdpdump, and nettrace are all very pleasant to use.) Some suggestions, for reference: xdpdump makes capturing easy, and dwdump inside dropwatch can also capture packets (when the kernel supports BTF). Download links:

xdpdump (https://github.com/xdp-project/xdp-tools)

dropwatch (https://github.com/nhorman/dropwatch)

nettrace (https://github.com/OpenCloudOS/nettrace) (open-sourced by Tencent, quite well made)

The most useful one is nettrace, which can print the call stack of a dropped packet directly (it also prefers a kernel with BTF support). There are prebuilt packages, so you can install it and use it out of the box.

someonebw commented 1 year ago

A normal keepalive packet coming from the remote IP has the following layout:

Ethernet header: 14 bytes

Payload after the Ethernet header (20 + 4 + 20 + 4 = 48 bytes):

outer IP header 20 bytes, GRE header 4 bytes, inner IP header 20 bytes, GRE header 4 bytes
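As a quick sanity check of that arithmetic, here are the same numbers written out against the kernel's header definitions (a throwaway userspace sketch; the bare 4-byte GRE header assumes the keepalive carries no checksum, key, or sequence fields):

    #include <stdio.h>
    #include <linux/ip.h>            /* struct iphdr: 20 bytes without options */

    int main(void)
    {
        const unsigned gre_hdr = 4;  /* 2 bytes flags/version + 2 bytes protocol */
        unsigned long keepalive_len = sizeof(struct iphdr) + gre_hdr    /* outer IP + GRE */
                                    + sizeof(struct iphdr) + gre_hdr;   /* inner IP + GRE */
        printf("GRE keepalive length after the Ethernet header: %lu bytes\n",
               keepalive_len);       /* prints 48 */
        return 0;
    }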

Under the Debian 11 kernel 5.10.0-25-cloud-amd64, parsing works fine: I can set DEBUG_PRINT_HEADER_SIZE as high as 48 and everything is still normal.

However, under 5.14.0-352.el9.x86_64, with a packet of exactly the same length, parsing fails: the largest value I can set DEBUG_PRINT_HEADER_SIZE to is 24.

With it set to 24, "Packet size too small, dump failed" no longer appears, but the packet is still parsed incorrectly:

    bpf_trace_printk: Outer GRE flags=0x0 proto=0
    bpf_trace_printk: Unknown proto 0 inside


Where in the kernel is this limit on DEBUG_PRINT_HEADER_SIZE enforced? That is the open question!!!

someonebw commented 1 year ago

New experiment result: it looks like the Linux kernel I'm testing already implements the GRE keepalive reply logic itself, so there is no need for us to attach an XDP program to the GRE interface. I don't know whether this is implemented in ip_gre or somewhere else.

Test environment:

Debian 11, kernel 5.10.0-25-cloud-amd64
ip utility, iproute2-5.9.0, libbpf 0.3.0
remote GRE endpoint: Huawei firewall

Verified result (the following settings are required; loading an XDP program is not needed). Presumably this works because the decapsulated inner keepalive arrives on gre2 with our own address as its source and the remote as its destination, so it only passes fib_validate_source with accept_local, must not be discarded by the reverse-path filter, and is then routed back out through the tunnel, which requires forwarding:

    net.ipv4.conf.gre2.accept_local = 1
    net.ipv4.conf.gre2.forwarding = 1
    net.ipv4.conf.gre2.rp_filter = 0

With the following configuration:

    sysctl -w net.ipv4.conf.gre2.accept_local=1
    sysctl -w net.ipv4.conf.gre2.forwarding=1
    sysctl -w net.ipv4.conf.gre2.rp_filter=0

there is a normal GRE keepalive reply.

The nettrace output is as follows:

    [5490.409110] [napi_gro_receive_entry] GRE: 172.19.100.254 -> 172.19.92.248
    [5490.409165] [dev_gro_receive ] GRE: 172.19.100.254 -> 172.19.92.248
    [5490.409182] [netif_receive_skb_core] GRE: 172.19.100.254 -> 172.19.92.248
    [5490.409194] [tpacket_rcv ] GRE: 172.19.100.254 -> 172.19.92.248
    [5490.409209] [ip_rcv_core ] GRE: 172.19.100.254 -> 172.19.92.248
    [5490.409224] [ip_route_input_slow ] GRE: 172.19.100.254 -> 172.19.92.248
    [5490.409242] [fib_validate_source ] GRE: 172.19.100.254 -> 172.19.92.248
    [5490.409256] [ip_local_deliver ] GRE: 172.19.100.254 -> 172.19.92.248
    [5490.409267] [ip_local_deliver_finish] GRE: 172.19.100.254 -> 172.19.92.248
    [5490.409288] [napi_gro_receive_entry] ether protocol: 44051
    [5490.409298] [dev_gro_receive ] ether protocol: 44051
    [5490.409309] [__netif_receive_skb_core] ether protocol: 44051
    [5490.409325] [ip_rcv_core ] ether protocol: 44051
    [5490.409334] [ip_route_input_slow ] ether protocol: 44051
    [5490.409343] [fib_validate_source ] ether protocol: 44051
    [5490.409358] [ip_forward ] ether protocol: 44051
    [5490.409376] [ip_output ] ether protocol: 44051
    [5490.409386] [nf_hook_slow ] ether protocol: 44051 ipv4 in chain: POST_ROUTING
    [5490.409403] [ip_finish_output ] ether protocol: 44051
    [5490.409412] [ip_finish_output2 ] ether protocol: 44051
    [5490.409422] [dev_queue_xmit ] ether protocol: 23155
    [5490.409436] [sch_direct_xmit ] GRE: 172.19.92.248 -> 172.19.100.254 queue state: 0, flags: 174, last update: 212ms, len: 0
    [5490.409450] [dev_hard_start_xmit ] GRE: 172.19.92.248 -> 172.19.100.254 skb is successfully sent to the NIC driver
    [5490.409461] [skb_clone ] GRE: 172.19.92.248 -> 172.19.100.254
    [5490.409506] [tpacket_rcv ] GRE: 172.19.92.248 -> 172.19.100.254
    [5490.409518] [consume_skb ] GRE: 172.19.92.248 -> 172.19.100.254
    [5490.409676] [consume_skb ] GRE: 172.19.92.248 -> 172.19.100.254

tcpdump output:

    07:38:45.083884 IP (tos 0xc0, ttl 255, id 37900, offset 0, flags [none], proto GRE (47), length 48)
        172.19.100.254 > 172.19.92.248: GREv0, Flags [none], length 28
        IP (tos 0xc0, ttl 255, id 37899, offset 0, flags [none], proto GRE (47), length 24)
        172.19.92.248 > 172.19.100.254: GREv0, Flags [none], length 4 gre-proto-0x0
            0x0000:  45c0 0030 940c 0000 ff2f 0cb5 ac13 64fe
            0x0010:  ac13 5cf8 0000 0800 45c0 0018 940b 0000
            0x0020:  ff2f 0cce ac13 5cf8 ac13 64fe 0000 0000
    07:38:45.084174 IP (tos 0xc0, ttl 254, id 37899, offset 0, flags [none], proto GRE (47), length 24)
        172.19.92.248 > 172.19.100.254: GREv0, Flags [none], length 4 gre-proto-0x0
            0x0000:  45c0 0018 940b 0000 fe2f 0dce ac13 5cf8
            0x0010:  ac13 64fe 0000 0000

With the following configuration:

    sysctl -w net.ipv4.conf.gre2.accept_local=0
    sysctl -w net.ipv4.conf.gre2.forwarding=1
    sysctl -w net.ipv4.conf.gre2.rp_filter=0

there is no GRE keepalive reply.

The nettrace output is as follows:

    [492.939232] [napi_gro_receive_entry] GRE: 172.19.100.254 -> 172.19.92.248
    [492.939272] [dev_gro_receive ] GRE: 172.19.100.254 -> 172.19.92.248
    [492.939288] [__netif_receive_skb_core] GRE: 172.19.100.254 -> 172.19.92.248
    [492.939299] [tpacket_rcv ] GRE: 172.19.100.254 -> 172.19.92.248
    [492.939313] [ip_rcv_core ] GRE: 172.19.100.254 -> 172.19.92.248
    [492.939325] [ip_route_input_slow ] GRE: 172.19.100.254 -> 172.19.92.248
    [492.939338] [fib_validate_source ] GRE: 172.19.100.254 -> 172.19.92.248
    [492.939350] [ip_local_deliver ] GRE: 172.19.100.254 -> 172.19.92.248
    [492.939360] [ip_local_deliver_finish] GRE: 172.19.100.254 -> 172.19.92.248
    [492.939456] [napi_gro_receive_entry] ether protocol: 44051
    [492.939471] [dev_gro_receive ] ether protocol: 44051
    [492.939482] [__netif_receive_skb_core] ether protocol: 44051
    [492.939492] [ip_rcv_core ] ether protocol: 44051
    [492.939502] [ip_route_input_slow ] ether protocol: 44051
    [492.939513] [fib_validate_source ] ether protocol: 44051
    [492.943448] [kfree_skb ] ether protocol: 44051

With the following configuration:

    sysctl -w net.ipv4.conf.gre2.accept_local=1
    sysctl -w net.ipv4.conf.gre2.forwarding=0
    sysctl -w net.ipv4.conf.gre2.rp_filter=0

there is no GRE keepalive reply.

The nettrace output is as follows:

    [573.021961] [napi_gro_receive_entry] GRE: 172.19.100.254 -> 172.19.92.248
    [573.022001] [dev_gro_receive ] GRE: 172.19.100.254 -> 172.19.92.248
    [573.022018] [__netif_receive_skb_core] GRE: 172.19.100.254 -> 172.19.92.248
    [573.022029] [tpacket_rcv ] GRE: 172.19.100.254 -> 172.19.92.248
    [573.022044] [ip_rcv_core ] GRE: 172.19.100.254 -> 172.19.92.248
    [573.022057] [ip_route_input_slow ] GRE: 172.19.100.254 -> 172.19.92.248
    [573.022073] [fib_validate_source ] GRE: 172.19.100.254 -> 172.19.92.248
    [573.022086] [ip_local_deliver ] GRE: 172.19.100.254 -> 172.19.92.248
    [573.022096] [ip_local_deliver_finish] GRE: 172.19.100.254 -> 172.19.92.248
    [573.022142] [napi_gro_receive_entry] ether protocol: 44051
    [573.022153] [dev_gro_receive ] ether protocol: 44051
    [573.022164] [__netif_receive_skb_core] ether protocol: 44051
    [573.022173] [ip_rcv_core ] ether protocol: 44051
    [573.022182] [ip_route_input_slow ] ether protocol: 44051
    [573.022196] [kfree_skb ] ether protocol: 44051

With the configuration:

    net.ipv4.conf.gre2.accept_local = 1
    net.ipv4.conf.gre2.forwarding = 1
    net.ipv4.conf.gre2.rp_filter = 1

the following error is reported, and there is no GRE keepalive reply.

The nettrace output is as follows:

[285.557748] GRE: 172.19.92.247 -> 172.19.100.254, reason: IP_RPFILTER, ip_rcv_finish_core.constprop.0+0x1d7

someonebw commented 1 year ago

Using the Linux kernel's built-in GRE keepalive reply (required settings, must be configured):

    net.ipv4.conf.gre2.accept_local = 1
    net.ipv4.conf.gre2.forwarding = 1
    net.ipv4.conf.gre2.rp_filter = 0

Environments where this passed testing:

CentOS 9, kernel 6.1.47-1.el9.elrepo.x86_64, ip utility iproute2-6.2.0, libbpf 1.2.0, remote GRE endpoint: Huawei firewall

CentOS 9, kernel 5.14.0-160.el9.x86_64, ip utility iproute2-6.2.0, libbpf 1.2.0, remote GRE endpoint: Huawei firewall

Debian 11, kernel 5.10.0-25-cloud-amd64, ip utility iproute2-5.9.0, libbpf 0.3.0, remote GRE endpoint: Huawei firewall

Debian 10, kernel 4.19.0-21-amd64, ip utility iproute2-6.2.0, libbpf 1.2.0, remote GRE endpoint: Huawei firewall