Closed lano1106 closed 1 month ago
I made some progress... It appears that, as far as busy polling is concerned, the driver works as expected. I just had to choose better values:
busy_poll_period = 50 with
echo 1000 > /sys/class/net/enp39s0/napi_defer_hard_irqs
brings the NIC interrupt count to a halt... but I feel like I am playing whack-a-mole... the busy polling now appears to have woken up new kernel work which forces the kernel to generate local timer interrupts again.
67: 242886291 0 0 0 PCI-MSIX-0000:27:00.0 1-edge enp39s0-Tx-Rx-0
68: 1 217801693 0 0 PCI-MSIX-0000:27:00.0 2-edge enp39s0-Tx-Rx-1
69: 1 0 216506015 0 PCI-MSIX-0000:27:00.0 3-edge enp39s0-Tx-Rx-2
70: 1 0 0 250752987 PCI-MSIX-0000:27:00.0 4-edge enp39s0-Tx-Rx-3
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 35041049 17865096 18656563 25358830 Local timer interrupts
Thanks if you can look into what I am seeing; your knowledge might help me find a fix. OTOH, it seems like the problem might come from io_uring... I have opened an issue on their GitHub page: https://github.com/axboe/liburing/issues/1190
In its most simplified form...
with: ethtool -L enp39s0 combined 4
the 2 sockets managed by the thread running on CPU3 work perfectly
with ethtool -L enp39s0 combined 1
the 2 sockets managed by the thread running on CPU3 are not serviced correctly.
What remains to be determined is whether the driver has something to do with this situation or whether the problem comes exclusively from io_uring.
FYI, I have found my issue. Your driver is working perfectly.
If you perform network operations on an isolated nohz_full processor, the networking softirqs for that processor will never be invoked.
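One way to verify that finding is to watch the NET_RX row of /proc/softirqs: on a CPU whose networking softirqs never run, that column stays frozen even while traffic is pending. A minimal C sketch (the helper name print_net_rx_softirqs is mine, not from the original report):

```c
#include <stdio.h>
#include <string.h>

/* Print the per-CPU NET_RX softirq counters from /proc/softirqs.
 * Sample this before and after a burst of traffic: a column that
 * never moves means the kernel never ran NET_RX on that CPU. */
int print_net_rx_softirqs(void)
{
    FILE *f = fopen("/proc/softirqs", "r");
    if (!f)
        return -1;

    char line[1024];
    while (fgets(line, sizeof(line), f)) {
        /* Rows look like "      NET_RX:   123   456 ..." */
        if (strstr(line, "NET_RX:"))
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}
```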
Preliminary Actions
Driver Type
Linux kernel driver for Elastic Network Adapter (ENA)
Driver Tag/Commit
kernel 6.9.10-1-ec2 (recompiled by myself from kernel.org git)
Custom Code
No
OS Platform and Distribution
Linux 6.9.10-1-ec2 ArchLinux
Support request
My instance type is a c7i.2xlarge with hyperthreading disabled, so 4 CPUs.
kernel cmdline: ipv6.disable=1 hugepages=72 isolcpus=1,2,3 nohz_full=1,2,3 rcu_nocbs=1,2,3 rcu_nocb_poll irqaffinity=0 idle=nomwait processor.max_cstate=1 intel_idle.max_cstate=1 nmi_watchdog=0
My application uses io_uring with NAPI busy polling, and I dedicate CPUs to the threads with the highest latency requirements.
CPU1 is isolated and a single thread is assigned to it. This thread has an io_uring configured with struct io_uring_napi napiSetting{200, 1}; io_uring_register_napi(ring, &napiSetting); The CPU1 thread manages about 20 TCP connections.
The created sockets also have the SO_BUSY_POLL and SO_PREFER_BUSY_POLL options set, but I think that io_uring does not use them.
A SQPOLL kernel thread is created and its CPU affinity is set to 2
CPU3 runs the second low-latency thread, which manages 2 TCP sockets. It also has an io_uring with NAPI busy polling enabled, and its ring is attached to the CPU1 ring's SQPOLL worker.
Its io_uring is set up for NAPI busy polling in the same way as the CPU1 thread's, and its TCP socket options receive the same treatment.
CPU0 is reserved for all the other processes of the system, which amount to less than 1% load.
Other settings:
(the napi_defer_hard_irqs and gro_flush_timeout parameters are following recommendation found in Documentation/networking/napi.rst)
First problem: NAPI busy polling is done by the io_uring SQPOLL worker thread, which runs at close to 100%. Despite the busy polling, the ENA driver is generating plenty of interrupts:
67: 235620161 0 0 0 PCI-MSIX-0000:27:00.0 1-edge enp39s0-Tx-Rx-0
68: 1 211480332 0 0 PCI-MSIX-0000:27:00.0 2-edge enp39s0-Tx-Rx-1
69: 1 0 207685214 0 PCI-MSIX-0000:27:00.0 3-edge enp39s0-Tx-Rx-2
70: 1 0 0 243540978 PCI-MSIX-0000:27:00.0 4-edge enp39s0-Tx-Rx-3
I am doing NAPI busy polling precisely to avoid having my low-latency threads interrupted from what they are doing... Each spurious interrupt induces a 20-50 usec delay in my threads...
This looks like a driver bug: the driver still issues interrupts with these settings and this usage pattern.
Problem 2: I have tried to work around the issue with ethtool -L enp39s0 combined 1
The only remaining queue is the one bound to CPU0... but I get this error message in the kernel log:
Jul 26 04:11:44 ip-172-31-39-89 kernel: ena 0000:27:00.0 enp39s0: Command parameter 46 is not supported
When I restart my application, all is good except that the sockets created on CPU3 end up missing incoming data.
I see the server HTTP reply:
[2024-07-26 03:48:45] INFO WSCTX/log_emit_function 7194: Server reply:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: QpYBhzQ2+wnF5mbBu+NkBHnB0Xk=
Date: Fri, 26 Jul 2024 03:48:44 GMT
but the connection eventually times out.
I have tried
same thing...
The only way for the CPU3 thread to have its TCP sockets serviced adequately is with:
ethtool -L enp39s0 combined 4
If I had been able to make a 1-queue setup that sends all its interrupts to CPU0, that would have been OK for me...
In conclusion:
Contact Details
olivier@trillion01.com