axboe / liburing

Library providing helpers for the Linux kernel io_uring support

io_uring questions for a tutorial #737

Closed: espoal closed this 1 month ago

espoal commented 1 year ago

Recently I published an io_uring NVMe example and it proved to be quite popular: I got around 30 messages in my inbox from other engineers asking for more explanations and examples, which tells me that many more noob engineers like me want to use this beautiful interface :smile:

I will try to collect tips and best practices in a tutorial but I see there's a lot of stuff to investigate, hence some noob questions:

Thank you very much for your patience :smile:

isilence commented 1 year ago

I will try to collect tips and best practices in a tutorial but I see there's a lot of stuff to investigate, hence some noob questions:

* Can I use fixed buffers with send/recv? Maybe zero copy is the equivalent for send/recv, or maybe the two concepts are orthogonal?

The concepts have enough in common: without zerocopy, registered buffers usually lose most of their performance benefit. O_DIRECT read/write is already zerocopy (if the backing file supports it), but networking is different and we can't fit zerocopy into the usual recv/send API.

The newly added IORING_OP_SEND_ZC and IORING_OP_SENDMSG_ZC do support registered buffers. There is no io_uring-specific zerocopy receive; rx is a different topic.

https://lore.kernel.org/io-uring/cover.1657194434.git.asml.silence@gmail.com/
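
For the tutorial, a minimal sketch of a zero-copy send from a registered buffer might look like this (assuming liburing >= 2.3 and kernel >= 6.0; `sockfd` is a placeholder for an already-connected socket, and error handling is trimmed). The detail worth calling out is that a zero-copy send completes with two CQEs:

```c
#include <liburing.h>

/* Sketch: IORING_OP_SEND_ZC from a registered buffer, via
 * io_uring_prep_send_zc_fixed(). Assumes `ring` is initialized and
 * `sockfd` is a connected socket. */
static void send_zc_fixed_sketch(struct io_uring *ring, int sockfd)
{
    static char payload[4096];
    struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };

    /* One-time registration; buffer index 0 refers to it from then on. */
    io_uring_register_buffers(ring, &iov, 1);

    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_send_zc_fixed(sqe, sockfd, payload, sizeof(payload),
                                0 /* msg flags */, 0 /* zc flags */,
                                0 /* registered buffer index */);
    io_uring_submit(ring);

    /* A zero-copy send posts two CQEs: the usual result (flagged
     * IORING_CQE_F_MORE), then a notification (IORING_CQE_F_NOTIF)
     * once the kernel no longer references the buffer. */
    for (int i = 0; i < 2; i++) {
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(ring, &cqe);
        io_uring_cqe_seen(ring, cqe);
    }
}
```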

* Does io_uring do any batching behind the scene? Maybe with SQPOLL can I expect requests to be batched?

Yes, quite a lot, in different places, and it's true not only for SQPOLL. The more requests you send per syscall and the more requests you have inflight, the more efficient it is (e.g. in cycles per request), with caveats around caching and so on.
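
To make the batching point concrete, here's a sketch (the fd, the buffers, and the batch size are illustrative): queue a batch of SQEs, then pay for a single io_uring_submit() syscall covering all of them.

```c
#include <liburing.h>

#define BATCH 32

/* Sketch: amortize syscall cost by preparing many SQEs and submitting
 * them in one go. With SQPOLL, even the submit syscall can disappear. */
static void submit_read_batch(struct io_uring *ring, int fd,
                              char (*bufs)[4096])
{
    for (int i = 0; i < BATCH; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        io_uring_prep_read(sqe, fd, bufs[i], 4096, (__u64)i * 4096);
        io_uring_sqe_set_data64(sqe, (__u64)i);
    }
    io_uring_submit(ring);  /* one syscall for all BATCH requests */

    for (int i = 0; i < BATCH; i++) {
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(ring, &cqe);
        io_uring_cqe_seen(ring, cqe);
    }
}
```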

* What do you think it's more important for performance, fixed buffers or batching?

I'd guess it's batching in many cases, but if you shuffle lots of bytes per request, it may be registered buffers. It depends on the request types, the inflight count, payload size and so on.
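
For completeness, the file-side use of registered buffers is a sketch like this (assuming an O_DIRECT fd; actual alignment requirements depend on the device):

```c
#include <liburing.h>
#include <stdlib.h>

/* Sketch: a registered ("fixed") buffer read. Registration pins the
 * pages once, so the per-request mapping cost goes away; the request
 * then names the buffer by index. */
static int read_fixed_sketch(struct io_uring *ring, int fd)
{
    void *buf;
    if (posix_memalign(&buf, 4096, 4096))  /* O_DIRECT wants alignment */
        return -1;
    struct iovec iov = { .iov_base = buf, .iov_len = 4096 };
    io_uring_register_buffers(ring, &iov, 1);

    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_read_fixed(sqe, fd, buf, 4096, 0 /* offset */,
                             0 /* buffer index */);
    io_uring_submit(ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(ring, &cqe);
    int res = cqe->res;
    io_uring_cqe_seen(ring, cqe);
    return res;
}
```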

* It seems that fixed buffers for NVMe commands will be released with the [6.1 kernel](https://lore.kernel.org/io-uring/8bbcb3e9-118c-ea25-a906-24aa28a6c48c@kernel.dk/), right?

* Is there a way to guarantee that my ops will always use [`io_arm_poll_handler`](https://github.com/torvalds/linux/blob/c3eb11fbb826879be773c137f281569efce67aa8/io_uring/poll.c#L631) and hence not spawn worker threads? Like if I only do NVMe ops, or O_DIRECT reads/writes, or socket ops, will there be a guarantee that no worker thread is required?

No guarantees, but you can limit the number of io-wq workers. Note that NVMe devices, block devices, fs files and so on are not pollable; they will never go to io_arm_poll_handler().
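
Capping io-wq looks like this (a sketch; the limits are illustrative, and passing 0 leaves a limit unchanged):

```c
#include <liburing.h>

/* Sketch: cap io-wq worker counts. values[0] limits workers for bounded
 * work (e.g. regular file I/O), values[1] for unbounded work that may
 * block indefinitely. On return the array holds the previous limits. */
static void cap_iowq_workers(struct io_uring *ring)
{
    unsigned int values[2] = { 4, 4 };  /* illustrative caps */
    io_uring_register_iowq_max_workers(ring, values);
}
```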

* If I understood correctly I should register files which I plan to access often for performance reasons, like `/dev/ng0n1` for NVMe commands. Does this apply for sockets too? Is there a limit, or can I register as many files as I want?

There are different trade-offs involved, but yes, long-lived, frequently used sockets are a good target. You create the fixed file table in advance, specifying the maximum number of files in it.
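
A sketch of that flow (the slot count and the recv are illustrative; the key detail is that the SQE carries a slot index plus IOSQE_FIXED_FILE instead of a regular fd):

```c
#include <liburing.h>

/* Sketch: reserve a sparse fixed file table, install a socket into
 * slot 0, then issue a request against the slot. This skips the
 * per-request fd lookup/refcounting of normal file descriptors. */
static void fixed_file_sketch(struct io_uring *ring, int sockfd)
{
    static char buf[512];

    io_uring_register_files_sparse(ring, 64);            /* 64 empty slots */
    io_uring_register_files_update(ring, 0, &sockfd, 1); /* fill slot 0 */

    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_recv(sqe, 0 /* slot index, not an fd */, buf,
                       sizeof(buf), 0);
    sqe->flags |= IOSQE_FIXED_FILE;
    io_uring_submit(ring);
}
```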

* Is there any plan to support AF_XDP sockets? or is there an equivalent in io_uring?

There are plans and work in progress for zerocopy receive.

* What are the plans for eBPF?

It was tried out before, but it didn't bring good enough results. I believe the reason is that io_uring is already too close to userspace for eBPF to make much difference, especially with good submission batching:

https://lore.kernel.org/io-uring/cover.1621424513.git.asml.silence@gmail.com/