Open jrudolph opened 2 years ago
I would not recommend using provided buffers with IOSQE_ASYNC; as you have noticed, they need to serialize on the ring mutex. This is generally not a concern, but it certainly becomes one if you have a lot of io-wq activity due to marking the SQEs async. You'll be better off setting aside some threads in userspace, each with a ring, and using provided buffers with those.
In general, IOSQE_ASYNC isn't very efficient and should be avoided for most use cases.
Thanks for the quick answer. I agree, there are good alternatives.
I'm trying out io_uring and am testing different ways of submitting requests. My test is a simple webserver-like application that accepts multiple sockets and then alternately reads from and writes to each socket. Everything runs on a single application thread with a single ring.
In general, using IOSQE_ASYNC does not seem to make much sense for network reads, because it often does strictly more work than the default path. On the other hand, for a single-threaded server, much CPU time is spent inside the kernel TCP stack, so using IOSQE_ASYNC could help by freeing the application thread for other work while the kernel threads do the heavy lifting.

Looking into the performance with Linux 5.19.11, I noticed that the flamegraph shows a lot of time spent allocating buffers from the provided buffers:
Zooming in on io_read:

This is with at most 128 concurrent reads. In that scenario the number of concurrent wqe_workers gets quite high (maybe even one per request?), so if there's a mutex in the buffer-selection path, that cannot work well when many or all of the sockets are readable at the same time.
Is this contention expected, and should it be documented?