axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

Question about IORING_OP_NOP #1154

Closed Mulling closed 1 month ago

Mulling commented 1 month ago

Sorry if this is the wrong place to ask.

I'm trying to use io_uring for shared memory IPC, and I use IORING_OP_NOP to send the entry of a shared memory pool to another process, ~which has the completion queue mapped by means of IORING_SETUP_ATTACH_WQ~.

Submission code:

#define fence() __asm__ __volatile__("" ::: "memory")

register __u32 tail = 0;
register __u32 index = 0;

tail = *h->sr.tail;

fence();

index = tail & *h->sr.ring_mask;

h->sr.sqes[index].fd = -1;
h->sr.sqes[index].flags = 0;
h->sr.sqes[index].opcode = IORING_OP_NOP;
h->sr.sqes[index].addr = 0;
h->sr.sqes[index].len = 0;
h->sr.sqes[index].off = 0;
h->sr.sqes[index].user_data = e; // Will be 0

h->sr.array[index] = index;

tail++;

if (*h->sr.tail != tail) {
    *h->sr.tail = tail;
    fence();
}

__u64 qed =
    (__s64)tail - (__s64)__atomic_load_n(h->sr.head, __ATOMIC_ACQUIRE);

if (qed < 32) return 0;

if (io_uring_enter(h->fd, qed, 0, IORING_ENTER_GETEVENTS) == -1)
    die("io_uring_enter");

return 0;

Completion part:

register __u32 head = *h->cr.head;

do {
    fence();

    if (head == *h->cr.tail) break;

    struct io_uring_cqe* cqe = &h->cr.cqes[head & *h->cr.ring_mask];

    cb(h, cqe->user_data); // Just sends the user_data to a callback

    head++;

} while (true);

*h->cr.head = head;

fence();

The callback (cb) is called when a completion event comes in. If I don't do anything in it (i.e. don't print the value), it works fine. But if I try to print the entry in the shared memory pool (not a problem with the shared memory itself), the ring loses events after a while, probably because the callback takes longer to complete; I'm not sure. ~Somehow, I think, the kernel is losing the entries when they are not dequeued fast enough, or it could be a problem with my batching. I've asserted that the submission ring never gets full and that things are not overwritten~.

I'm also not sure whether there is an ordering problem where the kernel or user-space is not seeing something it was supposed to see.

Thanks!
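
On the ordering question: the `fence()` macro in the post is a compiler barrier only, so it stops the compiler from reordering accesses but emits no CPU ordering instruction. liburing publishes ring indices with release stores and reads the other side's index with acquire loads. Below is a minimal sketch of that pairing on a stand-in ring, not on a real io_uring mapping. The `demo_ring` type, `demo_push`, `demo_pop`, and `RING_ENTRIES` are hypothetical names made up for illustration. The thread later traces the actual problem to something else, so this only illustrates the ordering concern.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_ENTRIES 8u /* hypothetical size; must be a power of two */
static_assert((RING_ENTRIES & (RING_ENTRIES - 1)) == 0,
              "RING_ENTRIES must be a power of two");

/* Stand-in for a shared ring; a real io_uring ring is mmap'ed from the kernel. */
struct demo_ring {
    _Atomic uint32_t head;        /* advanced by the consumer */
    _Atomic uint32_t tail;        /* advanced by the producer */
    uint64_t slots[RING_ENTRIES]; /* payload, e.g. a user_data value */
};

/* Producer: write the slot first, then publish it with a release store. */
static int demo_push(struct demo_ring *r, uint64_t user_data)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == RING_ENTRIES)
        return -1; /* ring is full */

    r->slots[tail & (RING_ENTRIES - 1)] = user_data;
    /* Release: the slot write above is visible before the new tail is. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 0;
}

/* Consumer: acquire-load the tail, read the slot, then release the slot. */
static int demo_pop(struct demo_ring *r, uint64_t *out)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* Acquire: pairs with the producer's release store of tail. */
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return -1; /* ring is empty */

    *out = r->slots[head & (RING_ENTRIES - 1)];
    /* Release: the slot read is done before the producer may reuse it. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}
```

The difference `tail - head` is computed in unsigned 32-bit arithmetic, so it stays correct when the indices wrap around, which a cast to a signed 64-bit type would not guarantee.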

Mulling commented 1 month ago

The Kernel does set the overflow flag, but io_cqring_offsets.overflow is always 0.

axboe commented 1 month ago

> The Kernel does set the overflow flag, but io_cqring_offsets.overflow is always 0.

The overflow count indicates events that the kernel dropped, and hence will never get posted to the CQ ring. This is an artifact of old kernels; recent kernels will always post them eventually unless things are really messed up (like the kernel being totally out of memory). This is why you don't see the overflow count increase.

That said, you should avoid getting into overflow situations, whether or not events get dropped. It's generally either a sign of the CQ ring being too small, or the app not processing completions properly.
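
A minimal sketch of how a hand-rolled submitter might react to the overflow flag mentioned above, assuming the flag in question is `IORING_SQ_CQ_OVERFLOW` in the SQ ring flags word (located via `io_sqring_offsets.flags`). The helper names `demo_cq_overflow_pending` and `demo_should_enter` and the `batch` parameter are hypothetical. The constant is redefined locally only so the snippet compiles without `<linux/io_uring.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef IORING_SQ_CQ_OVERFLOW
#define IORING_SQ_CQ_OVERFLOW (1U << 1) /* same value as in <linux/io_uring.h> */
#endif
static_assert(IORING_SQ_CQ_OVERFLOW == (1U << 1),
              "unexpected IORING_SQ_CQ_OVERFLOW value");

/* Hypothetical helper: true if the kernel is holding CQEs that did not
 * fit into the CQ ring. sq_flags is the value read from the SQ ring
 * flags word. */
static bool demo_cq_overflow_pending(uint32_t sq_flags)
{
    return (sq_flags & IORING_SQ_CQ_OVERFLOW) != 0;
}

/* Hypothetical policy: enter the kernel once a full batch is queued, or
 * as soon as overflowed completions are waiting to be flushed. */
static bool demo_should_enter(uint32_t sq_flags, uint32_t queued, uint32_t batch)
{
    return queued >= batch || demo_cq_overflow_pending(sq_flags);
}
```

When `demo_should_enter` returns true, the caller would issue `io_uring_enter` with `IORING_ENTER_GETEVENTS` on the ring's own fd, which gives the kernel a chance to move held completions into the CQ ring once there is room. If overflow happens regularly, a larger CQ ring (for example via `IORING_SETUP_CQSIZE`) is probably the better fix.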

Mulling commented 1 month ago

I'm running 6.9.0. I could not find anything weird in the kernel. I've tested the same setup with liburing and was unable to reproduce it, so there is something wrong with my implementation.

Sorry to bother, thanks for the help!

Mulling commented 1 month ago

Just thought I'd update this if anyone runs into a similar situation.

Because I'm running two different programs (each ring is in a different program), the completion side cannot drive the runtime forward ~(not sure if it should, but, IORING_SETUP_ATTACH_WQ gives the idea that it should)~ so we get in a situation where the kernel "keeps" all the events that have overflown, and if the submission side blocks, it gets "stuck". That means that the submission side needs to enter the kernel from time to time, so progress can be made.

isilence commented 1 month ago

> completion queue mapped by means of IORING_SETUP_ATTACH_WQ

That sentence doesn't make much sense. IORING_SETUP_ATTACH_WQ has nothing to do with the completion queue, mappings, or completions in general. It works as a performance hint without much exposure to userspace.

Mulling commented 1 month ago

Hmm... makes sense.

Mulling commented 1 month ago

> Because I'm running two different programs (each ring is in a different program), the completion side cannot drive the runtime forward (not sure if it should, but, IORING_SETUP_ATTACH_WQ gives the idea that it should) so we get in a situation where the kernel "keeps" all the events that have overflown, and if the submission side blocks, it gets "stuck". That means that the submission side needs to enter the kernel from time to time, so progress can be made.

This also makes no sense, because I was trying to enter the kernel with the wrong fd.

So basically this all boils down to a wrong understanding of how IORING_SETUP_ATTACH_WQ works. Thanks for clarifying!