axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

Feature Request: API for Partial Wait (with Optional Timeout) for IO Completions #635

Closed nibanks closed 3 weeks ago

nibanks commented 2 years ago

io_uring_wait_cqes currently waits for all wait_nr IO completions. It would be nice to have a slightly different API that doesn't wait for all wait_nr but returns as soon as any are available, and returns all IO completions that are currently available. I have tried to produce similar behavior using io_uring_peek_batch_cqe (see below), but in the "wait" case it only ever returns one completion instead of all that are available after the wait.

uint32_t eventq_dequeue(struct io_uring* queue, struct io_uring_cqe** events, uint32_t count, uint32_t wait_time) {
    int result = io_uring_peek_batch_cqe(queue, events, count);
    if (result > 0 || wait_time == 0) return result;
    if (wait_time != UINT32_MAX) {
        struct __kernel_timespec timeout;
        timeout.tv_sec = (wait_time / 1000);
        timeout.tv_nsec = ((wait_time % 1000) * 1000000);
        result = io_uring_wait_cqe_timeout(queue, events, &timeout);
    } else {
        result = io_uring_wait_cqe(queue, events);
    }
    return result == 0 ? 1 : 0;
}

I'd really love to have a single function that has (essentially) the same signature as eventq_dequeue above.
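
As a side note on the timeout handling in eventq_dequeue above, the millisecond conversion can be isolated in a small helper. The sketch below is illustrative only: model_timespec is a hypothetical stand-in that mirrors the tv_sec/tv_nsec fields of struct __kernel_timespec so the snippet compiles without kernel headers, and wait_ms_to_timespec is an invented name, not a liburing function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the two fields of struct __kernel_timespec. */
struct model_timespec {
    long long tv_sec;
    long long tv_nsec;
};

/*
 * Convert a wait time in milliseconds into seconds plus nanoseconds.
 * Returns false when wait_ms is UINT32_MAX, which the code above treats
 * as "wait forever"; in that case *ts is left untouched and the caller
 * should use the untimed wait instead.
 */
static bool wait_ms_to_timespec(uint32_t wait_ms, struct model_timespec *ts)
{
    if (wait_ms == UINT32_MAX)
        return false;
    ts->tv_sec = wait_ms / 1000;                   /* whole seconds */
    ts->tv_nsec = (wait_ms % 1000) * 1000000LL;    /* remainder in ns */
    return true;
}
```

With this split, the caller only has to branch once on the return value to choose between the timed and the untimed wait.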

axboe commented 2 years ago

I think you're confusing two things. The wait api does not return events to you, those you can just find and reap from user space. Those are two different operations. Sounds like what you want is just using 1 for the wait count, and then you just iterate completions when that returns.

nibanks commented 2 years ago

I am coalescing the (possible) wait and the return of completions, but I'm not sure if it's very efficient to do the following with the existing APIs (I'd be happy if I'm wrong!):

  1. Check if there are any completions.
  2. If so, return them.
  3. Else if wait == 0, return empty.
  4. Wait the specified time (possibly infinity).
  5. On wake, return all available completions.

axboe commented 2 years ago

The only expensive part in that list is the waiting on the events. Checking for events is just a memory read. So yes, that is the expected use case.
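
To illustrate why the check is cheap, here is a simplified model of a completion queue, assuming a ring where the producer advances a tail index and the consumer advances a head index. The names model_cq and model_cq_ready are hypothetical, and this is not liburing's actual code; the real library reads its indices from memory shared with the kernel.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the completion ring indices. */
struct model_cq {
    _Atomic uint32_t tail;  /* advanced by the producer (the kernel, in io_uring) */
    uint32_t head;          /* advanced by the consumer (the application) */
};

/*
 * Number of completions ready to reap. This is one acquire load and a
 * subtraction, with no system call involved. Unsigned arithmetic keeps
 * the result correct when the indices wrap around.
 */
static inline uint32_t model_cq_ready(struct model_cq *cq)
{
    uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_acquire);
    return tail - cq->head;
}
```

Only when this count is zero does the caller need to pay for a blocking wait.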

nibanks commented 2 years ago

So, should I update my function above to return io_uring_peek_batch_cqe(queue, events, count); instead of return result == 0 ? 1 : 0; Is that really optimal? Would/could it be more efficient to put all this into a single io_uring_* function?

axboe commented 2 years ago

I'm OOO today so only on the phone, hence haven't looked at your code at all. I'll check later.

CarterLi commented 2 years ago

I suggest

io_uring_submit_and_wait(&ring, 1);
struct io_uring_cqe *cqe;
unsigned head;
int cqe_count = 0;
io_uring_for_each_cqe(&ring, head, cqe) {
    ++cqe_count;
    /* use cqe here */
}
io_uring_cq_advance(&ring, cqe_count);

nibanks commented 2 years ago

Thanks for the suggestion @CarterLi but I am trying to implement an abstraction layer that works with multiple different IO models, on different platforms. That's what the eventq_dequeue function above is for.

axboe commented 2 years ago

Then just have the caller iterate and do the advance of the cq ring. Either that, or you'd need to copy the event which isn't ideal.
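
The copy option can be sketched with a small self-contained model. Everything here is hypothetical (copy_cq, copy_cqe, copy_cq_dequeue); the entry layout only mirrors the user_data/res/flags fields of struct io_uring_cqe, and the ring is an ordinary array rather than memory shared with the kernel.

```c
#include <stdatomic.h>
#include <stdint.h>

#define COPY_CQ_ENTRIES 8u  /* power of two, so masking maps an index to a slot */

/* Hypothetical entry with the same fields as struct io_uring_cqe. */
struct copy_cqe {
    uint64_t user_data;
    int32_t res;
    uint32_t flags;
};

struct copy_cq {
    _Atomic uint32_t tail;  /* producer index */
    uint32_t head;          /* consumer index */
    struct copy_cqe entries[COPY_CQ_ENTRIES];
};

/*
 * Copy up to `count` ready entries into the caller's array, then advance
 * the head so the slots can be reused. Returns the number copied.
 */
static uint32_t copy_cq_dequeue(struct copy_cq *cq, struct copy_cqe *out,
                                uint32_t count)
{
    uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_acquire);
    uint32_t ready = tail - cq->head;
    uint32_t n = ready < count ? ready : count;

    for (uint32_t i = 0; i < n; i++)
        out[i] = cq->entries[(cq->head + i) & (COPY_CQ_ENTRIES - 1u)];
    cq->head += n;  /* after this, the copied slots may be overwritten */
    return n;
}
```

Copying lets the abstraction release ring slots immediately and hides the advance step from the caller, at the cost of one small struct copy per event. Handing out pointers avoids the copy but requires the caller to signal when it is done with them.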

nibanks commented 2 years ago

What's the difference between io_uring_wait_cqe and io_uring_submit_and_wait? io_uring_wait_cqe also returns you the IO completion, while io_uring_submit_and_wait just waits? What about the "submit" part? What exactly does that mean?

nibanks commented 2 years ago

And what about an io_uring_wait_cqe_timeout equivalent? I found io_uring_submit_and_wait_timeout but it takes the cqe_ptr and a sigmask too, so I'm not sure if that's what I should use.

CarterLi commented 2 years ago

What's the difference between io_uring_wait_cqe and io_uring_submit_and_wait? io_uring_wait_cqe also returns you the IO completion, while io_uring_submit_and_wait just waits? What about the "submit" part? What exactly does that mean?

io_uring_submit_and_wait = io_uring_submit + io_uring_wait ( without returning cqe ) in one syscall
io_uring_wait_cqe = io_uring_wait + for_each_cqe(cqe) { return cqe }

Submit and wait both require syscalls, which are expensive, while returning the cqe ( the IO completion ) is only a cheap memory read.

io_uring_peek_batch_cqe copies pointers to the CQ entries into another array, which, IMO, is unnecessary. Just use io_uring_for_each_cqe.

nibanks commented 2 years ago

Ok, so I don't need the submit, because it's assumed that was already done, possibly on a different thread. So I'm back to a peek, wait (possibly with timeout), peek model. Though I didn't know about io_uring_cq_advance so that's better than returning 1 at a time.

uint32_t eventq_dequeue(eventq* queue, eventq_cqe* events, uint32_t count, uint32_t wait_time) {
    int result = io_uring_peek_batch_cqe(queue, events, count);
    if (result > 0 || wait_time == 0) return result;
    if (wait_time != UINT32_MAX) {
        struct __kernel_timespec timeout;
        timeout.tv_sec = (wait_time / 1000);
        timeout.tv_nsec = ((wait_time % 1000) * 1000000);
        (void)io_uring_wait_cqe_timeout(queue, events, &timeout);
    } else {
        (void)io_uring_wait_cqe(queue, events);
    }
    return io_uring_peek_batch_cqe(queue, events, count);
}
void eventq_return(eventq* queue, uint32_t count) {
    io_uring_cq_advance(queue, count);
}

My proposed changes: https://github.com/nibanks/eventq/pull/7

axboe commented 3 weeks ago

This is available with the min-timeout interface.