axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

[QUESTION] URing weird CPU utilization with write requests #1260

Open HazyMrf opened 2 weeks ago

HazyMrf commented 2 weeks ago

I've been measuring the performance of io_uring in my application and found some unexpected results:

1) io_uring affinity is set to 4 cores. The application uses io_uring only for writes. The load is not distributed evenly across the cores:

[Image: tg_image_3237405771 (2)]

2) For comparison, another application uses io_uring only for reads. There the load is distributed evenly:

[Image: tg_image_2428212738 (1)]

Could you please explain the reason for this behaviour?

HazyMrf commented 2 weeks ago

The application uses neither IORING_SETUP_IOPOLL nor IORING_SETUP_SQPOLL. It sets the IOSQE_ASYNC flag on every write operation. Submissions are done with io_uring_submit(). Here is a perf trace sample of the application:

$ sudo perf trace --tid 1321722 -e 'io_uring_enter'  -- sleep 1
     0.000 ( 0.010 ms): io_uring_enter(fd: 4, to_submit: 1, argsz: 8)                         = 1
     0.182 ( 0.003 ms): io_uring_enter(fd: 4, to_submit: 1, argsz: 8)                         = 1
     0.362 ( 0.006 ms): io_uring_enter(fd: 4, to_submit: 1, argsz: 8)                         = 1
     1.328 ( 0.005 ms): io_uring_enter(fd: 4, to_submit: 2, argsz: 8)                         = 2
     3.264 ( 0.016 ms): io_uring_enter(fd: 4, to_submit: 1, argsz: 8)                         = 1
     7.107 ( 0.004 ms): io_uring_enter(fd: 4, to_submit: 1, argsz: 8)                         = 1

axboe commented 2 weeks ago

Don't set IOSQE_ASYNC; it'll generally just slow things down. On anything more recent (e.g. 6.x kernels), it'll do more harm than good. Anything that needs to punt to a worker thread will do so internally anyway, so forcing it is usually not a good idea.
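
For readers following along, here is a minimal sketch of what the flag under discussion looks like at the SQE level. It is an illustration only, not code from the application in this issue. The helper name `fill_write_sqe` and its `force_async` parameter are hypothetical. The sketch fills the SQE fields by hand so that it depends only on the kernel's `<linux/io_uring.h>` header; with liburing one would normally call `io_uring_prep_write()` and, only if the flag is really wanted, `io_uring_sqe_set_flags()`.

```c
#include <linux/io_uring.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of liburing): fill one SQE for a plain write.
 *
 * force_async != 0 mimics the setup described in this issue by setting
 * IOSQE_ASYNC, which asks the kernel to hand the request to a worker thread
 * right away. force_async == 0 leaves sqe->flags clear, so the kernel decides
 * on its own whether the request has to be punted.
 */
static void fill_write_sqe(struct io_uring_sqe *sqe, int fd, const void *buf,
                           unsigned len, uint64_t offset, int force_async)
{
    memset(sqe, 0, sizeof(*sqe));          /* start from a clean SQE */
    sqe->opcode = IORING_OP_WRITE;         /* write(2)-style request */
    sqe->fd     = fd;                      /* target file descriptor */
    sqe->addr   = (uint64_t)(uintptr_t)buf; /* user buffer address */
    sqe->len    = len;                     /* number of bytes to write */
    sqe->off    = offset;                  /* file offset */

    if (force_async)
        sqe->flags |= IOSQE_ASYNC;         /* the flag advised against above */
}
```

Calling the helper with `force_async` set to 0 corresponds to the suggestion above: the request is submitted without IOSQE_ASYNC and is only moved to a worker thread if the kernel finds that necessary.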

HazyMrf commented 2 weeks ago

Okay, I will try this and post the measured results here, thanks. Is there any other general advice for making io_uring faster on new kernels? What I'm aiming for is low latency and a balanced load across many CPUs.

HazyMrf commented 2 weeks ago

Hello @axboe, I tested your suggestion of removing IOSQE_ASYNC and sadly it didn't help, so the flag is not the cause of the performance degradation. I still observe an uneven load distribution for write requests after removing the IOSQE_ASYNC flag:

[Image: photo_2024-10-09 11 50 11]