We are observing an issue where some of the CPUs handling XDP TX queues sit at 100% usage without processing any packets.
**Our setup**
- Instance: GCE n2-standard-32
- Configured queues: 4 RX, 4 TX (CPU cores 0-3 handle the RX queues and the XDP program; CPU cores 4-7 handle the XDP_TX work)
- Driver version: 1.3.4
- Kernel/OS version: Linux 6.1.0-17-cloud-amd64 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux
We attach an eBPF/XDP program in native mode which modifies packets and mostly returns the XDP_TX action; a minimal sketch of that shape follows.
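For reference, this is roughly the shape of the program (a stripped-down illustration only; the program name, the MAC-swap modification, and the section name are placeholders, not our production code):

```c
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_tx_sketch(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	unsigned char tmp[ETH_ALEN];

	/* Verifier-required bounds check before touching the header. */
	if ((void *)(eth + 1) > data_end)
		return XDP_PASS;

	/* Stand-in for our real packet modification: swap the MAC
	 * addresses so the modified frame is bounced back out. */
	__builtin_memcpy(tmp, eth->h_source, ETH_ALEN);
	__builtin_memcpy(eth->h_source, eth->h_dest, ETH_ALEN);
	__builtin_memcpy(eth->h_dest, tmp, ETH_ALEN);

	return XDP_TX;
}

char _license[] SEC("license") = "GPL";
```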
**Observation**
Continuous 100% CPU usage is observed on CPUs 6 and 7, which process XDP_TX packets, while CPUs 4 and 5, which also process XDP_TX packets, show little usage. The ksoftirqd processes for CPUs 6 and 7 are the ones consuming the 100% CPU.
Checking the CPU flame graph for these cores shows that most of the time is spent in `gve_xdp_poll` and `gve_clean_xdp_done`.

![bad_cpu_6_perf_next_hop data perf-folded](https://github.com/GoogleCloudPlatform/compute-virtual-ethernet-linux/assets/14174960/88acb6be-c5c5-4d9f-aa77-8edf1837e488)
Checking the ethtool counters shows that the `tx_posted_desc` counter is lower than the `tx_completed_desc` counter for queues 6 and 7.
`tx_completed_desc` for queues 6 and 7 is very close to the uint32 maximum (2^32 - 1 = 4294967295), which indicates that `tx_posted_desc` may have overflowed and wrapped around, which explains its low value.
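To make the wraparound concrete, here is a standalone demonstration of the u32 arithmetic involved (the specific counter values are made up for illustration):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Hypothetical values chosen for illustration: tx->done just
	 * below the u32 wrap point, with 10 descriptors to clean. */
	uint32_t done = 4294967290u;
	uint32_t to_do = 10u;
	uint32_t clean_end = done + to_do; /* 4294967300 mod 2^32 = 4 */

	printf("clean_end = %u\n", clean_end);   /* prints 4 */
	printf("done < clean_end? %s\n",
	       done < clean_end ? "yes" : "no"); /* prints "no" */
	return 0;
}
```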
According to the `gve_clean_xdp_done` code logic, execution never enters the `for` loop in this state, since `clean_end` after the overflow is lower than `tx->done`, resulting in a repoll every time. This matches our observation of the counters (`tx_posted_desc`/`tx_completed_desc`) not getting incremented even though the CPU flame graph shows time being spent in `gve_clean_xdp_done`.
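A simplified sketch of the loop shape as we read it (field names follow the driver, but this is a stripped-down illustration against a minimal stand-in struct, not the verbatim driver code):

```c
#include <stdint.h>

/* Minimal stand-in for the relevant gve_tx_ring fields; the real
 * driver struct carries much more state. */
struct tx_ring {
	uint32_t done;	/* free-running count of completed descriptors */
	uint32_t mask;	/* ring size - 1 */
};

/* The pattern we believe gve_clean_xdp_done follows: compute an
 * absolute end point and compare free-running u32 counters directly. */
static void clean_xdp_done_sketch(struct tx_ring *tx, uint32_t to_do)
{
	uint32_t clean_end = tx->done + to_do; /* wraps modulo 2^32 */

	/* Once tx->done is near UINT32_MAX, clean_end wraps to a small
	 * value, the condition below is false on entry, and no
	 * descriptor is ever cleaned -- NAPI repolls indefinitely. */
	for (; tx->done < clean_end; tx->done++) {
		uint32_t idx = tx->done & tx->mask;

		/* ... reclaim the buffer at tx->info[idx] ... */
		(void)idx;
	}
}
```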
The corresponding logic for non-XDP TX (`gve_clean_tx_done`) handles this scenario by running its `for` loop from `0` to `to_do`, which could be why the issue is not seen in non-XDP flows.
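For contrast, a sketch of that overflow-safe shape, reusing the `struct tx_ring` stand-in from above; because the loop bound is a relative count, the wraparound of `tx->done` is harmless:

```c
/* Overflow-safe shape, analogous to gve_clean_tx_done: the loop bound
 * is relative, so u32 wraparound of tx->done cannot defeat it. */
static void clean_tx_done_sketch(struct tx_ring *tx, uint32_t to_do)
{
	uint32_t i;

	for (i = 0; i < to_do; i++) {
		uint32_t idx = tx->done & tx->mask;

		/* ... reclaim the buffer at tx->info[idx] ... */
		(void)idx;
		tx->done++; /* wrapping to 0 here is fine */
	}
}
```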