GoogleCloudPlatform / compute-virtual-ethernet-linux

Compute Engine Virtual Ethernet Linux driver.

XDP TX queues getting stuck when the tx posted packet counters overflow beyond u32 max #59

Open ivpr opened 1 week ago

ivpr commented 1 week ago

We are observing an issue where some of the CPUs handling XDP TX queues are stuck at 100% usage without processing any packets.

Our setup

Instance: GCE n2-standard-32
Configured queues: 4 RX, 4 TX (CPU cores 0-3 handle the RX queues and the XDP program; cores 4-7 handle the XDP_TX work)
Driver version: 1.3.4
Kernel/OS version: Linux 6.1.0-17-cloud-amd64 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux

We are attaching an eBPF/XDP program in native mode which modifies the packets and mostly returns with the XDP_TX action.
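For context, the program follows the usual shape of an XDP_TX forwarder. A minimal self-contained sketch of that traffic pattern (not our production program; the MAC swap stands in for the real packet modification, and the names are only illustrative):

```c
/* Minimal sketch of a native-mode XDP program that rewrites packets and
 * returns XDP_TX, illustrating the traffic pattern described above. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_tx_sketch(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	unsigned char tmp[ETH_ALEN];

	if ((void *)(eth + 1) > data_end)
		return XDP_PASS;

	/* Swap source and destination MACs so the modified frame can be
	 * transmitted back out of the same interface. */
	__builtin_memcpy(tmp, eth->h_dest, ETH_ALEN);
	__builtin_memcpy(eth->h_dest, eth->h_source, ETH_ALEN);
	__builtin_memcpy(eth->h_source, tmp, ETH_ALEN);

	return XDP_TX;
}

char _license[] SEC("license") = "GPL";
```

The program is attached in native/driver mode with something like `ip link set dev ens4 xdpdrv obj xdp_tx_sketch.o sec xdp`.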

Observation

Continuous 100% CPU usage is observed on CPUs 6 and 7, which process XDP_TX packets, while there is little usage on CPUs 4 and 5, which also process XDP_TX packets. The ksoftirqd processes for CPUs 6 and 7 are the ones consuming the 100% CPU.

On checking the CPU flame graph for these cores, we see that most of the time is spent in gve_xdp_poll and gve_clean_xdp_done (attached perf-folded data: bad_cpu_6_perf_next_hop).

On checking the ethtool counters, we see that the tx_posted_desc counter is lower than the tx_completed_desc counter for queues 6 and 7:

# ethtool -S ens4 | grep '\[[4-7]\]' | grep "posted\|completed" | grep tx
     tx_posted_desc[4]: 1622967499
     tx_completed_desc[4]: 1622967499
     tx_posted_desc[5]: 2328007405
     tx_completed_desc[5]: 2328007405
     tx_posted_desc[6]: 154
     tx_completed_desc[6]: 4294967274
     tx_posted_desc[7]: 170
     tx_completed_desc[7]: 4294967292

The tx_completed_desc counters for queues 6 and 7 are very close to the u32 maximum (2^32 - 1 = 4294967295), which indicates that tx_posted_desc has overflowed and wrapped around to a small value, explaining the low readings.
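The numbers are self-consistent with a wrapped u32: subtracting the two counters with unsigned 32-bit arithmetic still yields a sane in-flight descriptor count. A small userspace illustration (assuming the counters are plain u32 values, which is our reading of the ethtool stats):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Counter snapshots for queue 6 from the ethtool output above. */
	uint32_t tx_posted    = 154u;
	uint32_t tx_completed = 4294967274u;

	/* Unsigned subtraction is wrap-safe: even though tx_posted has
	 * gone past UINT32_MAX and restarted from 0, (posted - completed)
	 * still yields the number of posted-but-not-yet-completed
	 * descriptors. */
	uint32_t in_flight = tx_posted - tx_completed;

	printf("in-flight descriptors: %" PRIu32 "\n", in_flight); /* 176 */
	return 0;
}
```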

According to the gve_clean_xdp_done code logic, the for loop body is never entered, since clean_end after the overflow is lower than tx->done, so the function results in a repoll every time. This matches our observation that the counters (tx_posted/tx_completed) are not incrementing even though the CPU flame graph shows time being spent in gve_clean_xdp_done.
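A self-contained sketch of the loop shape as we read it in the 1.3.4 source (paraphrased with userspace types and illustrative names, not the verbatim driver code):

```c
/* Sketch of the gve_clean_xdp_done loop shape we believe is at fault. */
#include <stdint.h>

struct tx_ring_sketch {
	uint32_t done;	/* free-running completion counter */
	uint32_t mask;	/* ring size - 1 */
};

unsigned int clean_xdp_done_sketch(struct tx_ring_sketch *tx, uint32_t to_do)
{
	/* When tx->done is near UINT32_MAX, this addition wraps and
	 * clean_end ends up smaller than tx->done... */
	uint32_t clean_end = tx->done + to_do;
	unsigned int pkts = 0;

	/* ...so the loop condition is false on entry, nothing is ever
	 * cleaned, and the poll routine keeps asking to be rescheduled,
	 * matching the 100% ksoftirqd usage on CPUs 6 and 7. */
	for (; tx->done < clean_end; tx->done++) {
		uint32_t idx = tx->done & tx->mask;

		(void)idx;	/* ...unmap and free the buffer at idx... */
		pkts++;
	}

	return pkts;
}
```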

The equivalent logic for non-XDP TX (gve_clean_tx_done) handles this scenario by running its for loop from 0 up to to_do, which could be why the issue is not seen in non-XDP flows.
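For comparison, a sketch of that count-based loop shape, which is unaffected by the wrap and could presumably be applied to the XDP clean path as well (again paraphrased with the same illustrative tx_ring_sketch struct, not the verbatim driver code):

```c
/* Sketch of the wrap-safe loop shape used by gve_clean_tx_done: iterate
 * a fixed count and advance tx->done inside the loop, so tx->done
 * wrapping past UINT32_MAX is harmless. */
#include <stdint.h>

struct tx_ring_sketch {
	uint32_t done;	/* free-running completion counter */
	uint32_t mask;	/* ring size - 1 */
};

unsigned int clean_tx_done_sketch(struct tx_ring_sketch *tx, uint32_t to_do)
{
	unsigned int pkts = 0;

	for (uint32_t j = 0; j < to_do; j++) {
		uint32_t idx = tx->done & tx->mask;

		(void)idx;	/* ...unmap and free the buffer at idx... */
		tx->done++;	/* wrapping past UINT32_MAX here is fine */
		pkts++;
	}

	return pkts;
}
```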