msune opened 2 days ago
(*) There seems to be a bug in the kernel: disabling all GSO/TSO offloads still leaves egress SKBs marked as TCP_GSO.
This should/will be investigated elsewhere, as it's not strictly related to sfunnel/eBPF.
The main issue is that there is no direct access to skb->gso_type. Some strategies I tried so far:
bpf_skb_adjust_room() with encap flags: none of the flags listed in the doc works for this purpose, as none of them sets SKB_GSO_UDP / SKB_GSO_UDP_L4.
bpf_skb_change_tail(): the bpf_skb_change_tail() doc mentions:
This helper is a slow path utility intended for replies with control messages. And because it is targeted for slow path, the helper itself can afford to be slow: it implicitly linearizes, unclones and drops offloads from the skb.
The skb gso_type is correctly reset to 0x0, but the skb is then much bigger than the MTU. The (big) packet is later dropped by the MTU check (of course, because it is no longer a GSO packet) in __dev_forward_skb2(), which ends up calling __is_skb_forwardable(); here is the check. Note the packet size (2842) vs. the iface MTU (1440):
0xffff902880e2a000 1 ksoftirqd/1:21 4026532600 0 487 0x0800 1440 2842 10.0.0.1:60148->10.0.1.2:8080(tcp) __dev_forward_skb
0xffff902880e2a000 1 ksoftirqd/1:21 4026532600 0 487 0x0800 1440 2842 10.0.0.1:60148->10.0.1.2:8080(tcp) __dev_forward_skb2
0xffff902880e2a000 1 ksoftirqd/1:21 4026532600 0 487 0x0800 1440 2842 10.0.0.1:60148->10.0.1.2:8080(tcp) kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED)
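To make the drop condition concrete, here is a minimal Python model of the length check in __is_skb_forwardable() (include/linux/netdevice.h, as I read it around v6.1), together with the GSO reset that bpf_skb_change_tail() is assumed to perform (clearing gso_size, gso_segs and gso_type, as skb_gso_reset() does). It is a userspace sketch, not kernel code: Dev and Skb are hypothetical stand-ins for struct net_device and struct sk_buff, the 14-byte hard_header_len assumes an Ethernet-type device such as veth, and the gso_size of 1388 is an illustrative value not taken from the trace.

```python
from dataclasses import dataclass

VLAN_HLEN = 4  # the kernel leaves room for one VLAN tag on top of the MTU


@dataclass
class Dev:
    """Hypothetical stand-in for the few struct net_device fields used."""
    mtu: int
    hard_header_len: int = 14  # assumption: Ethernet-type device (e.g. veth)
    up: bool = True


@dataclass
class Skb:
    """Hypothetical stand-in for skb->len and the skb_shinfo() GSO fields."""
    len: int
    gso_size: int = 0
    gso_segs: int = 0
    gso_type: int = 0


def skb_is_gso(skb: Skb) -> bool:
    # skb_is_gso() tests gso_size, not gso_type
    return skb.gso_size != 0


def skb_gso_reset(skb: Skb) -> None:
    # "Drops offloads": clears the GSO metadata but does NOT segment the data
    skb.gso_size = 0
    skb.gso_segs = 0
    skb.gso_type = 0


def is_skb_forwardable(dev: Dev, skb: Skb, check_mtu: bool = True) -> bool:
    """Model of the checks in __is_skb_forwardable()."""
    if not dev.up:
        return False
    if not check_mtu:
        return True
    if skb.len <= dev.mtu + dev.hard_header_len + VLAN_HLEN:
        return True
    # Oversized packets only pass if they are still GSO (segmented later on)
    return skb_is_gso(skb)


if __name__ == "__main__":
    dev = Dev(mtu=1440)
    skb = Skb(len=2842, gso_size=1388, gso_type=0x1)  # 0x1 = SKB_GSO_TCPV4
    print(is_skb_forwardable(dev, skb))  # True: still a GSO skb
    skb_gso_reset(skb)  # what bpf_skb_change_tail() is assumed to do
    print(is_skb_forwardable(dev, skb))  # False: 2842 > 1440 + 14 + 4
```

Under these assumptions, a 2842-byte frame only gets past the check while it is still flagged as GSO, so resetting the GSO metadata without actually segmenting the data leads straight to the drop seen in the trace.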
A simple repro of this issue (without encap/decap) is here: msune/ebpf_gso:change_tail_gso.
I think this is a bug, and bpf_skb_change_tail() should break the beefy packet into segments before it is sent (presumably after all TC BPF hooks have run). This is probably worth a discussion in the Cilium #ebpf Slack channel.
Summary
There is a severe performance degradation when TCP is funneled over UDP on flows within the same host.
I was able to repro here: msune/ebpf_gso:main, using this synthetic scenario.
Env:
Linux XXX 6.1.0-22-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21) x86_64 GNU/Linux
Debian clang version 14.0.6
Root cause analysis
The repro pushes a UDP header in ns1 and pops it in ns2. pwru clearly shows that the skb is marked as SKB_GSO_TCPV4 (0x1) (*), both when the UDP header is pushed and when the kernel attempts to UFO the packet.
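When reading raw gso_type values such as the 0x1 above, a small decoder can help. This is a Python sketch: decode_gso_type() is a hypothetical helper (not part of pwru), and the bit values are my reading of a subset of the SKB_GSO_* enum in include/linux/skbuff.h around v6.1, so verify them against your own kernel tree before relying on them.

```python
from enum import IntFlag
from typing import List


class SkbGso(IntFlag):
    """Subset of SKB_GSO_* bits (assumed values, check include/linux/skbuff.h)."""
    TCPV4 = 1 << 0
    DODGY = 1 << 1
    TCP_ECN = 1 << 2
    TCP_FIXEDID = 1 << 3
    TCPV6 = 1 << 4
    UDP_TUNNEL = 1 << 10
    UDP_TUNNEL_CSUM = 1 << 11
    UDP = 1 << 16
    UDP_L4 = 1 << 17


def decode_gso_type(value: int) -> List[str]:
    """Return the SKB_GSO_* names set in value; bits not modeled are shown in hex."""
    names: List[str] = []
    known = 0
    for flag in SkbGso:  # definition order, single-bit members only
        known |= int(flag)
        if value & int(flag):
            names.append(f"SKB_GSO_{flag.name}")
    unknown = value & ~known  # plain ints, so ~ is the ordinary bitwise NOT
    if unknown:
        names.append(hex(unknown))
    return names


if __name__ == "__main__":
    print(decode_gso_type(0x1))  # ['SKB_GSO_TCPV4']
    print(decode_gso_type(0x3))  # ['SKB_GSO_TCPV4', 'SKB_GSO_DODGY']
```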
Full pwru trace:
```
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 0 0x0000 1440 6992 10.0.0.1:42592->10.0.1.2:8080(tcp) ip_local_out
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 0 0x0000 1440 6992 10.0.0.1:42592->10.0.1.2:8080(tcp) __ip_local_out
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 0 0x0800 1440 6992 10.0.0.1:42592->10.0.1.2:8080(tcp) ip_output
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 6992 10.0.0.1:42592->10.0.1.2:8080(tcp) nf_hook_slow
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 6992 10.0.0.1:42592->10.0.1.2:8080(tcp) apparmor_ip_postroute
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 6992 10.0.0.1:42592->10.0.1.2:8080(tcp) ip_finish_output
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 6992 10.0.0.1:42592->10.0.1.2:8080(tcp) __ip_finish_output
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 6992 10.0.0.1:42592->10.0.1.2:8080(tcp) ip_finish_output2
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 6992 10.0.0.1:42592->10.0.1.2:8080(tcp) neigh_resolve_output
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 6992 10.0.0.1:42592->10.0.1.2:8080(tcp) eth_header
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 6992 10.0.0.1:42592->10.0.1.2:8080(tcp) skb_push
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7006 10.0.0.1:42592->10.0.1.2:8080(tcp) __dev_queue_xmit
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7006 10.0.0.1:42592->10.0.1.2:8080(tcp) tcf_classify
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7006 10.0.0.1:42592->10.0.1.2:8080(tcp) skb_ensure_writable
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7006 10.0.0.1:42592->10.0.1.2:8080(udp) skb_ensure_writable
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7006 10.0.0.1:42592->10.0.1.2:8080(udp) bpf_skb_generic_push
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7006 10.0.0.1:42592->10.0.1.2:8080(udp) skb_push
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7014 10.0.0.1:42592->10.0.1.2:80(udp) netdev_core_pick_tx
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7014 10.0.0.1:42592->10.0.1.2:80(udp) validate_xmit_skb
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7014 10.0.0.1:42592->10.0.1.2:80(udp) netif_skb_features
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7014 10.0.0.1:42592->10.0.1.2:80(udp) passthru_features_check
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7014 10.0.0.1:42592->10.0.1.2:80(udp) skb_network_protocol
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7014 10.0.0.1:42592->10.0.1.2:80(udp) __skb_gso_segment
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7014 10.0.0.1:42592->10.0.1.2:80(udp) skb_mac_gso_segment
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7014 10.0.0.1:42592->10.0.1.2:80(udp) skb_network_protocol
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7000 10.0.0.1:42592->10.0.1.2:80(udp) inet_gso_segment
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 6980 10.0.0.1:42592->10.0.1.2:80(udp) udp4_ufo_fragment
0xffff9027df8a4800 6 ~bin/iperf:69662 4026532606 0 107 0x0800 1440 7014 10.0.0.1:42592->10.0.1.2:80(udp) kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED)
```
It drops the packet, as udp4_ufo_fragment() can't find SKB_GSO_UDP / SKB_GSO_UDP_L4 in the skb's gso_type here: https://github.com/torvalds/linux/blob/de2f378f2b771b39594c04695feee86476743a69/net/ipv4/udp_offload.c#L429.
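The mismatch can be summarized with a toy model: once the TC program has turned the packet into UDP, inet_gso_segment() dispatches on the IP header's protocol (now UDP), but the chosen callback validates gso_type, which still says TCP. This Python sketch is a deliberate simplification: it ignores the UDP tunnel branch taken when skb->encapsulation is set and every other check in these functions, l4_gso_callback() and gso_type_accepted() are hypothetical helpers, and the SKB_GSO_* bit values are assumed from include/linux/skbuff.h (around v6.1).

```python
IPPROTO_TCP = 6
IPPROTO_UDP = 17

# Assumed SKB_GSO_* bit values (include/linux/skbuff.h, ~v6.1)
SKB_GSO_TCPV4 = 1 << 0
SKB_GSO_UDP = 1 << 16
SKB_GSO_UDP_L4 = 1 << 17


def l4_gso_callback(ip_protocol: int) -> str:
    """inet_gso_segment() picks the L4 GSO callback from the IP header's protocol."""
    return {IPPROTO_TCP: "tcp4_gso_segment", IPPROTO_UDP: "udp4_ufo_fragment"}[ip_protocol]


def gso_type_accepted(ip_protocol: int, gso_type: int) -> bool:
    """Simplified gso_type gate of the chosen callback (tunnel paths ignored)."""
    if l4_gso_callback(ip_protocol) == "tcp4_gso_segment":
        return bool(gso_type & SKB_GSO_TCPV4)
    # udp4_ufo_fragment(): bails out unless a UDP GSO type is set
    return bool(gso_type & (SKB_GSO_UDP | SKB_GSO_UDP_L4))


if __name__ == "__main__":
    # Before the TC program: TCP header, TCP GSO type -> segmentable
    print(gso_type_accepted(IPPROTO_TCP, SKB_GSO_TCPV4))  # True
    # After the TC program: IP protocol is UDP, gso_type still SKB_GSO_TCPV4 -> drop
    print(gso_type_accepted(IPPROTO_UDP, SKB_GSO_TCPV4))  # False
```

In this model, the UDP path would only accept the skb if its gso_type carried SKB_GSO_UDP or SKB_GSO_UDP_L4, which a TC program has no direct way to set.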