axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

A Clarification on IO URING's Usage #270

Closed GuacheSuede closed 2 years ago

GuacheSuede commented 3 years ago

Logically, one would expect io_uring to have a ring buffer that handles only input and output operations.

Theoretically, would it be possible for io_uring to include non-I/O functionality or custom functions?

Regards, Guache

isilence commented 3 years ago

> Logically, one would expect io_uring to have a ring buffer that handles only input and output operations.

It's not only about pure I/O such as communicating with hardware devices. For instance, futex wait is a good candidate because it can sleep, and doing it synchronously is not always great. However, unless there is a really good reason, I would prefer not to add (and maintain) functionality that can be called synchronously without impact, e.g. socket().

> Theoretically, would it be possible for io_uring to include non-I/O functionality or custom functions?

There is work in progress on injecting eBPF programs, but that's more like (smart) flow control composed from operations that we already have. What do you have in mind? Custom modules?

isilence commented 3 years ago

Jens also floated a good idea a long time ago: allowing other parts of the kernel to provide custom operations (i.e. a callback in file_operations).

axboe commented 3 years ago

> Jens also floated a good idea a long time ago: allowing other parts of the kernel to provide custom operations (i.e. a callback in file_operations).

I still think that's a good idea, and I did in fact implement it for doing passthrough-style commands for NVMe. But it could be used for anything, with a private opcode that is meaningful to the file being used. Anyway, the patch is around somewhere, but nobody has cared enough about the feature yet for me to push it any further.
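To make the idea concrete, here is a minimal userspace sketch of a generic core that forwards a private opcode to a per-file callback. It is not the kernel patch: every name in it (`model_cmd`, `model_file_ops`, `uring_cmd`, `model_submit_cmd`, the `DEMO_OP_*` values) is hypothetical and only loosely modeled on `file_operations`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a submission carrying a private opcode. */
struct model_cmd {
    uint32_t private_op;   /* only meaningful to the target file */
    uint64_t arg;
};

struct model_file;

/* Hypothetical per-file operations table, loosely modeled on file_operations. */
struct model_file_ops {
    /* Optional hook; NULL means the file accepts no private commands. */
    int (*uring_cmd)(struct model_file *f, const struct model_cmd *cmd);
};

struct model_file {
    const struct model_file_ops *ops;
    uint64_t state;
};

/* Generic core: it does not interpret the opcode, it only forwards it. */
static int model_submit_cmd(struct model_file *f, const struct model_cmd *cmd)
{
    assert(f != NULL && cmd != NULL);
    if (f->ops == NULL || f->ops->uring_cmd == NULL)
        return -EOPNOTSUPP;
    return f->ops->uring_cmd(f, cmd);
}

/* Example "driver" side: opcodes that only this file type understands. */
enum { DEMO_OP_SET = 1, DEMO_OP_RESET = 2 };

static int demo_uring_cmd(struct model_file *f, const struct model_cmd *cmd)
{
    switch (cmd->private_op) {
    case DEMO_OP_SET:
        f->state = cmd->arg;
        return 0;
    case DEMO_OP_RESET:
        f->state = 0;
        return 0;
    default:
        return -EINVAL;   /* unknown private opcode for this file */
    }
}

static const struct model_file_ops demo_ops = { .uring_cmd = demo_uring_cmd };
static const struct model_file_ops plain_ops = { .uring_cmd = NULL };
```

In this model, a file backed by `demo_ops` accepts `DEMO_OP_SET` and `DEMO_OP_RESET`, while a file backed by `plain_ops` gets `-EOPNOTSUPP` from `model_submit_cmd()`. The point of the design is that the core never needs to learn new opcodes.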

isilence commented 3 years ago

> I still think that's a good idea,

I put that awkwardly. I meant "You suggested a good idea ...".

> and I did in fact implement it for doing passthrough-style commands for NVMe. But it could be used for anything, with a private opcode that is meaningful to the file being used. Anyway, the patch is around somewhere, but nobody has cared enough about the feature yet for me to push it any further.

Same here. As for users, e.g. #98 may be implemented with it.

isilence commented 3 years ago

BTW, I once used your idea for NVMe as well, for ZNS operations like reset in particular. This case may be really interesting.

axboe commented 3 years ago

Let me resurrect it and then we can bounce it around a bit until it becomes something palatable for upstream.

axboe commented 3 years ago

Another use case would be more efficient raw I/O on NVMe; it might be interesting to see how much efficiency we can claw back from that.

axboe commented 3 years ago

Here's a start:

https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fops

axboe commented 3 years ago

Totally untested, not even compiled :-)

isilence commented 3 years ago

Cool! Let me play with zbd/zns/nvme on that idea.

isilence commented 3 years ago

How about making the pdu variable-size, e.g. user-specified via sqe->len? If it is too large, put it into async_data; otherwise the hot path uses the first 48B of the kiocb. That's more flexible and may also save us some copies (e.g. for an 8B pdu).
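The proposal can be sketched in userspace as follows. This is only a model under stated assumptions: `model_req`, `model_req_set_pdu`, `model_req_pdu` and `model_req_free` are hypothetical names, `malloc()` stands in for whatever the kernel would use for `async_data`, and the 48-byte inline budget is the figure quoted in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_INLINE_PDU 48u   /* inline budget quoted in the thread */

/* Hypothetical request: small payloads live inline, large ones are allocated. */
struct model_req {
    uint32_t pdu_len;
    void *async_data;                           /* non-NULL only if pdu_len > 48 */
    unsigned char inline_pdu[MODEL_INLINE_PDU];
};

/* Copy a caller-sized payload, allocating only when it exceeds the inline area. */
static int model_req_set_pdu(struct model_req *req, const void *src, uint32_t len)
{
    assert(req != NULL && (src != NULL || len == 0));
    req->pdu_len = len;
    req->async_data = NULL;

    if (len <= MODEL_INLINE_PDU) {
        /* Hot path: no allocation, and only len bytes are copied. */
        if (len)
            memcpy(req->inline_pdu, src, len);
        return 0;
    }

    /* Slow path: payload too large for the inline area. */
    req->async_data = malloc(len);
    if (req->async_data == NULL)
        return -ENOMEM;
    memcpy(req->async_data, src, len);
    return 0;
}

/* Return whichever buffer currently holds the payload. */
static const void *model_req_pdu(const struct model_req *req)
{
    return req->async_data ? req->async_data : (const void *)req->inline_pdu;
}

static void model_req_free(struct model_req *req)
{
    free(req->async_data);
    req->async_data = NULL;
}
```

With this layout an 8-byte payload copies 8 bytes and never allocates, while a 64-byte payload takes the allocating path.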

isilence commented 3 years ago

P.S. It's too early for details, but IMHO it's better not to mask out personality. Plus, this keeps the check out of the hot (no personality) path.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a678920b1c8d..33e6e19ebdb2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6753,6 +6753,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
    if (id) {
        struct io_identity *iod;

+       if (io_op_defs[req->opcode].no_personality)
+           return -EINVAL;
        iod = idr_find(&ctx->personality_idr, id);
        if (unlikely(!iod))
            return -EINVAL;

axboe commented 3 years ago

> How about making the pdu variable-size, e.g. user-specified via sqe->len?

But that already eats well into how much is available for the hot path. My plan was to have anything that needs more than 48 bytes passed in via pointers; we'd need to copy it anyway for deferral/async. But I'm definitely open to ideas, this is just one approach. I do want the fast path to 1) not grow io_kiocb, which means a max of 48 bytes (unless we just export the done function and kill the ->done pointer), and 2) not allocate.

axboe commented 3 years ago

I like your personality change, let me fold that in right now.
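For readers following along, the behavior of that personality check can be modeled in userspace roughly like this. It is a sketch, not the kernel code: the opcodes, `model_op_defs`, `model_personality_registered` (a stand-in for the `idr_find()` lookup) and `model_init_req` are all hypothetical names chosen for illustration.

```c
#include <assert.h>   /* assert() for callers exercising the model */
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcodes; only the second one opts out of personalities. */
enum { MODEL_OP_READ, MODEL_OP_FILE_CMD, MODEL_OP_LAST };

struct model_op_def {
    bool no_personality;   /* opcode refuses a non-zero personality id */
};

static const struct model_op_def model_op_defs[MODEL_OP_LAST] = {
    [MODEL_OP_READ]     = { .no_personality = false },
    [MODEL_OP_FILE_CMD] = { .no_personality = true },
};

/* Stand-in for the idr lookup: pretend ids 1 and 2 were registered. */
static bool model_personality_registered(uint16_t id)
{
    return id == 1 || id == 2;
}

/* Mirrors the shape of the diff: the flag is tested only when an id is set. */
static int model_init_req(uint8_t opcode, uint16_t personality)
{
    if (opcode >= MODEL_OP_LAST)
        return -EINVAL;

    if (personality) {
        /* Reject instead of silently ignoring the requested personality. */
        if (model_op_defs[opcode].no_personality)
            return -EINVAL;
        if (!model_personality_registered(personality))
            return -EINVAL;
    }
    return 0;   /* requests without a personality never touch the flag */
}
```

Because the flag is consulted only inside the `if (personality)` branch, requests that carry no personality pay nothing for it.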

isilence commented 3 years ago

I'll sleep on it. We may want to ask a file for work_flags it wants to use, or some other controlling

GuacheSuede commented 3 years ago

> > Logically, one would expect io_uring to have a ring buffer that handles only input and output operations.
>
> It's not only about pure I/O such as communicating with hardware devices. For instance, futex wait is a good candidate because it can sleep, and doing it synchronously is not always great. However, unless there is a really good reason, I would prefer not to add (and maintain) functionality that can be called synchronously without impact, e.g. socket().
>
> > Theoretically, would it be possible for io_uring to include non-I/O functionality or custom functions?
>
> There is work in progress on injecting eBPF programs, but that's more like (smart) flow control composed from operations that we already have. What do you have in mind? Custom modules?

Yes, custom modules such as computational tasks. This could possibly remove the need for external thread management.

GuacheSuede commented 3 years ago

> Jens also floated a good idea a long time ago: allowing other parts of the kernel to provide custom operations (i.e. a callback in file_operations).

Could this be extended to support user-defined custom operations, defined within the program where io_uring is used/invoked?