I've been running the test-ovs-vxlan.sh script to understand how Mellanox TC-based HW offloads work, and I've found something a bit odd. The iperf test in the script uses 3 threads, and with HW offloads enabled the total throughput is usually at least double what it is with offloads disabled. However, while trying different thread counts, I found that a single thread produces higher throughput with HW offloads disabled. With two or more threads, throughput is higher with HW offloads enabled.
Do you know why a single thread would be faster with HW offloads disabled?