axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

IORING_FEAT_FAST_POLL : Same behavior for read()/write() as send()/recv() on sockets #274

Status: Closed (keithpl closed this issue 3 weeks ago)

keithpl commented 3 years ago

I'm currently running Linux 5.9.14; sorry if this has already been addressed in 5.10 or later.

With IORING_FEAT_FAST_POLL, recv() requests can be submitted on non-blocking sockets and will not show up in the completion queue until there is a result other than EAGAIN. However, submitting a read() request against a non-blocking socket that has no pending data immediately places a completion in the queue with cqe->res set to -EAGAIN. Would it be possible for read()/write() requests to behave exactly the same as send()/recv() on sockets?

goldsteinn commented 3 years ago

Not a maintainer so take this with a grain of salt.

From a quick glance at the source, it looks like the only effect of a nonblocking read is that it returns -EAGAIN immediately if the data is not ready, and that most control flows will still perform nonblocking operations.

(Standard call): io_uring_enter -> io_submit_sqes: https://github.com/torvalds/linux/blob/master/fs/io_uring.c#L9214

(SQPOLL mode): io_sq_thread -> io_submit_sqes: https://github.com/torvalds/linux/blob/master/fs/io_uring.c#L6914

io_submit_sqes -> io_submit_sqe: https://github.com/torvalds/linux/blob/master/fs/io_uring.c#L6875

io_submit_sqe -> io_queue_sqe: https://github.com/torvalds/linux/blob/master/fs/io_uring.c#L6627 / https://github.com/torvalds/linux/blob/master/fs/io_uring.c#L6612

io_queue_sqe -> io_issue_sqe (with force_nonblock = True): https://github.com/torvalds/linux/blob/master/fs/io_uring.c#L6488

io_issue_sqe -> io_read(..., force_nonblock, ...): https://github.com/torvalds/linux/blob/master/fs/io_uring.c#L6189

io_read will do nonblocking operation: https://github.com/torvalds/linux/blob/master/fs/io_uring.c#L3463

The only thing O_NONBLOCK gets you when submitting read events is an immediate -EAGAIN when data is not ready: https://github.com/torvalds/linux/blob/master/fs/io_uring.c#L3487
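For comparison, here is a small sketch of the same -EAGAIN result outside io_uring, using a plain read() on a non-blocking socket. The helpers `set_nonblocking` and `read_res` are hypothetical names introduced for illustration; `read_res` reports its result the way cqe->res does (byte count on success, negative errno on failure).

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode. Returns 0 or -errno. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -errno;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -errno;
    return 0;
}

/*
 * Hypothetical helper: read() with a cqe->res style result,
 * i.e. the byte count on success or a negative errno on failure.
 */
static ssize_t read_res(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    return n < 0 ? -errno : n;
}
```

With a socketpair() whose read end has been passed through `set_nonblocking`, `read_res` returns -EAGAIN while the socket is empty and the byte count once the peer has written data.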

The only call to io_issue_sqe with force_nonblock = False is from io_wq_submit_work: https://github.com/torvalds/linux/blob/master/fs/io_uring.c#L6326, which is stored as a function pointer in a struct: https://github.com/torvalds/linux/blob/master/fs/io_uring.c#L7990. I don't see that function pointer ever being invoked or used, so I'm not sure what control flow would lead to io_issue_sqe with force_nonblock = False.

I tested this (roughly) by submitting reads on blocking sockets to the submission ring; the second SQE could complete before the first.