Closed nibanks closed 3 weeks ago
I think you're confusing two things. The wait API does not return events to you; you find and reap those from user space. Those are two different operations. It sounds like what you want is to use 1 for the wait count and then iterate the completions when the wait returns.
I am coalescing the (possible) wait and the return of completions, but I'm not sure if it's very efficient to do the following with the existing APIs (I'd be happy if I'm wrong!):
The only expensive part in that list is the waiting on the events. Checking for events is just a memory read. So yes, that is the expected use case.
So, should I update my function above to `return io_uring_peek_batch_cqe(queue, events, count);` instead of `return result == 0 ? 1 : 0;`?
Is that really optimal? Would/could it be more efficient to put all this into a single `io_uring_*` function?
I'm OOO today so only on the phone, hence haven't looked at your code at all. I'll check later.
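I suggest:

    /* Submit pending SQEs and wait for at least one completion. */
    io_uring_submit_and_wait(&ring, 1);

    struct io_uring_cqe *cqe;
    unsigned head;
    unsigned cqe_count = 0;

    /* Walk every completion currently in the CQ; no syscall involved. */
    io_uring_for_each_cqe(&ring, head, cqe) {
        ++cqe_count;
        /* use cqe here */
    }
    /* Mark all of them as consumed in one step. */
    io_uring_cq_advance(&ring, cqe_count);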
Thanks for the suggestion @CarterLi, but I am trying to implement an abstraction layer that works with multiple different IO models on different platforms. That's what the `eventq_dequeue` function above is for.
Then just have the caller iterate and do the advance of the cq ring. Either that, or you'd need to copy the event which isn't ideal.
What's the difference between `io_uring_wait_cqe` and `io_uring_submit_and_wait`? `io_uring_wait_cqe` also returns you the IO completion, while `io_uring_submit_and_wait` just waits? What about the "submit" part? What exactly does that mean?
And what about an `io_uring_wait_cqe_timeout` equivalent? I found `io_uring_submit_and_wait_timeout`, but it takes the `cqe_ptr` and a `sigmask` too, so I'm not sure if that's what I should use.
> What's the difference between `io_uring_wait_cqe` and `io_uring_submit_and_wait`? `io_uring_wait_cqe` also returns you the IO completion, while `io_uring_submit_and_wait` just waits? What about the "submit" part? What exactly does that mean?
`io_uring_submit_and_wait` = `io_uring_submit` + `io_uring_wait` (without returning a cqe), in one syscall.
`io_uring_wait_cqe` = `io_uring_wait` + `for_each_cqe(cqe) { return cqe }`.
Submit and wait both require syscalls, which are expensive, while returning the cqe (the IO completion) is only cheap memory reads.
`io_uring_peek_batch_cqe` copies entries in the CQ to another buffer, which, IMO, is unnecessary and useless. Just use `for_each_cqe`.
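To make the "cheap memory reads" point concrete, here is a minimal toy model of a completion ring. Everything in it (`toy_cq`, `toy_cqe`, and the `toy_cq_*` functions) is hypothetical and only loosely mirrors the shape of the real thing; it is a single-threaded sketch that leaves out the memory barriers a ring shared with the kernel needs. It assumes a batch peek that hands out pointers into the ring rather than copying entries.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CQ_ENTRIES 8u /* power of two, so masking acts as modulo */

/* Hypothetical stand-in for a completion entry; not liburing's struct. */
struct toy_cqe {
    uint64_t user_data;
    int32_t res;
};

/* Hypothetical stand-in for a completion ring. */
struct toy_cq {
    struct toy_cqe entries[TOY_CQ_ENTRIES];
    unsigned head; /* next entry the consumer will read */
    unsigned tail; /* next slot the producer will fill */
};

/* Producer side; in this model it plays the role of the kernel. */
static int toy_cq_post(struct toy_cq *cq, uint64_t user_data, int32_t res)
{
    if (cq->tail - cq->head == TOY_CQ_ENTRIES)
        return -1; /* ring is full */
    struct toy_cqe *slot = &cq->entries[cq->tail & (TOY_CQ_ENTRIES - 1)];
    slot->user_data = user_data;
    slot->res = res;
    cq->tail++;
    return 0;
}

/* Checking for completions: two loads and a subtraction. */
static unsigned toy_cq_ready(const struct toy_cq *cq)
{
    return cq->tail - cq->head;
}

/* Hand out up to `count` pointers into the ring; nothing is consumed yet. */
static unsigned toy_cq_peek_batch(struct toy_cq *cq, struct toy_cqe **out,
                                  unsigned count)
{
    unsigned ready = toy_cq_ready(cq);
    unsigned n = ready < count ? ready : count;
    for (unsigned i = 0; i < n; i++)
        out[i] = &cq->entries[(cq->head + i) & (TOY_CQ_ENTRIES - 1)];
    return n;
}

/* Release `nr` entries back to the producer in one step. */
static void toy_cq_advance(struct toy_cq *cq, unsigned nr)
{
    assert(nr <= toy_cq_ready(cq));
    cq->head += nr;
}
```

In this model the only step that could ever block is waiting for the producer; peeking and advancing touch nothing but the `head` and `tail` counters. The pointers returned by the peek stay valid only until the matching advance, which is why the caller has to finish with them first.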
OK, so I don't need the submit, because it's assumed that was already done, possibly on a different thread. So I'm back to a peek, wait (possibly with timeout), peek model. I didn't know about `io_uring_cq_advance`, though, so that's better than returning one at a time.
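    /* Returns the number of completions stored in events (0 if none arrived in time). */
    uint32_t eventq_dequeue(eventq* queue, eventq_cqe* events, uint32_t count, uint32_t wait_time) {
        /* Fast path: take whatever is already in the CQ, without a syscall. */
        uint32_t result = io_uring_peek_batch_cqe(queue, events, count);
        if (result > 0 || wait_time == 0) return result;
        if (wait_time != UINT32_MAX) {
            /* wait_time is in milliseconds; UINT32_MAX means wait indefinitely. */
            struct __kernel_timespec timeout;
            timeout.tv_sec = (wait_time / 1000);
            timeout.tv_nsec = ((wait_time % 1000) * 1000000);
            (void)io_uring_wait_cqe_timeout(queue, events, &timeout);
        } else {
            (void)io_uring_wait_cqe(queue, events);
        }
        /* Peek again to pick up everything that is available after the wait. */
        return io_uring_peek_batch_cqe(queue, events, count);
    }

    /* Hands count completions back to the ring once the caller is done with them. */
    void eventq_return(eventq* queue, uint32_t count) {
        io_uring_cq_advance(queue, count);
    }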
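The millisecond-to-timespec arithmetic in `eventq_dequeue` can be checked on its own. The sketch below uses a hypothetical `ms_timespec` struct that only mirrors the `tv_sec` and `tv_nsec` fields, so it builds without kernel headers; `ms_to_timespec` is likewise an illustrative name, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in with the same two fields as the kernel timespec. */
struct ms_timespec {
    long long tv_sec;
    long long tv_nsec;
};

/* Split a millisecond count into whole seconds plus leftover nanoseconds. */
static struct ms_timespec ms_to_timespec(uint32_t wait_ms)
{
    struct ms_timespec ts;
    ts.tv_sec = wait_ms / 1000u;
    ts.tv_nsec = (long long)(wait_ms % 1000u) * 1000000LL;
    assert(ts.tv_nsec < 1000000000LL); /* always less than one second */
    return ts;
}
```

For example, a `wait_time` of 1500 splits into 1 second and 500000000 nanoseconds. The remainder is at most 999, so the nanosecond field stays below one second.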
My proposed changes: https://github.com/nibanks/eventq/pull/7
This is available with the min-timeout interface.
`io_uring_wait_cqes` currently waits for all `wait_nr` IO completions. It would be nice to have a slightly different API that doesn't wait for all `wait_nr` but returns when any are available, and returns all IO completions that are currently available. I have tried to produce a similar behavior by using `io_uring_peek_batch_cqe` (see below), but in the "wait" case, it only ever returns one, instead of all available after the wait.
I'd really love to have a single function that has (essentially) the same signature as `eventq_dequeue` above.