axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

Non consistent behavior for multishot vs regular recv operations when bufring is empty #1275

Closed romange closed 3 days ago

romange commented 3 days ago

Suppose my io_uring_buf_ring has been depleted - all its buffers have been consumed by the application and not yet returned.

If I issue a regular io_uring_prep_recv operation with IOSQE_BUFFER_SELECT (i.e. try to use my empty bufring), it waits until the socket receives data and only then posts a completion with ENOBUFS. That's good, because I can fall back to either calling recv directly or using io_uring recv without polling, knowing the socket is not empty.
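A minimal sketch of the direct-recv fallback described above, assuming the -ENOBUFS completion means data is already queued on the socket. The helper name `fallback_recv` and the caller-owned buffer are illustrative assumptions, not part of liburing.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: called after a buffer-select recv completed with
 * -ENOBUFS. Reads straight into a caller-owned buffer with recv(2).
 * MSG_DONTWAIT keeps the call from blocking if the data turns out to be
 * gone (for example, another reader drained the socket first).
 *
 * Returns bytes read, 0 on orderly shutdown, or -errno on failure.
 */
static ssize_t fallback_recv(int sockfd, void *buf, size_t len)
{
    ssize_t n;

    /* Retry only when interrupted by a signal. */
    do {
        n = recv(sockfd, buf, len, MSG_DONTWAIT);
    } while (n < 0 && errno == EINTR);

    return n < 0 ? -errno : n;
}
```

If this returns -EAGAIN, the assumption did not hold and the caller has to go back to polling the socket.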

With multishot, however, the behavior is different. Once I submit an io_uring_prep_recv_multishot request on the socket, it immediately posts an ENOBUFS completion even though the socket does not have any data.

  1. It's suboptimal because, by the time the socket has data, the bufring may have spare buffers again.
  2. I cannot follow up with a straight recv and must poll on the socket again.

romange commented 3 days ago

Tested on kernels 6.2 and 6.8.

axboe commented 3 days ago

Huh, they should behave the same - both pick a buffer upfront, and then just recycle it (or don't commit, if using provided buffer rings) if no data is available. So that's a bit surprising, it's literally the same code at that part of the issue chain, only after doing a recv does it become different. Both should return -ENOBUFS immediately.

You can probably work around this by setting IORING_RECVSEND_POLL_FIRST if you know it's empty, but like mentioned above, it doesn't make a lot of sense to me. I'll poke a bit and see what I get here.

axboe commented 3 days ago

Test app:

#include <fcntl.h>
#include <stdint.h>
#include <liburing.h>

#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    struct io_uring ring;
    char buffer[64];
    int sockfd, ret;

    io_uring_queue_init(8, &ring, 0);

    sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    /* Single-shot recv with buffer select; group 5 was never registered. */
    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_recv(sqe, sockfd, buffer, sizeof(buffer), 0);
    sqe->flags |= IOSQE_BUFFER_SELECT;
    sqe->buf_group = 5;
    io_uring_submit(&ring);

    ret = io_uring_wait_cqe(&ring, &cqe);
    if (ret)
        return ret;

    printf("recv res=%d\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);

    /* Same request as multishot, against the same unregistered group. */
    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_recv_multishot(sqe, sockfd, NULL, 0, 0);
    sqe->flags |= IOSQE_BUFFER_SELECT;
    sqe->buf_group = 5;
    io_uring_submit(&ring);

    ret = io_uring_wait_cqe(&ring, &cqe);
    if (ret)
        return ret;

    printf("multishot res=%d\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);
    return 0;
}

Here no buffer group is set up, so buffer selection fails in both cases. The output:

axboe@m2max-kvm ~> ./test-recv
recv res=-105
multishot res=-105

So it seems to follow what I outlined, but I'm curious if you'd see the same on your kernel running that. Because you really should.
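For reading the output above: a negative cqe->res is a negated errno value, and 105 is ENOBUFS in the generic Linux errno numbering (some architectures number it differently). A small sketch for logging such results follows; `describe_cqe_res` is a hypothetical name, not a liburing function.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a CQE result for logging. Negative values
 * are negated errno codes; non-negative values are byte counts.
 */
static void describe_cqe_res(int res, char *out, size_t outlen)
{
    if (res < 0)
        snprintf(out, outlen, "error %d (%s)", -res, strerror(-res));
    else
        snprintf(out, outlen, "%d bytes", res);
}
```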

romange commented 3 days ago

Oh, maybe IORING_RECVSEND_POLL_FIRST is the reason. For the bufselect recv I set it up with IORING_RECVSEND_POLL_FIRST by default. With multishot I did not bother. Could it be the reason?

romange commented 3 days ago

Yeah, now that you've explained it, it's absolutely clear. The kernel must have a buffer before calling recv, so it must borrow one from the ring. But with IORING_RECVSEND_POLL_FIRST set, it won't call recv before data has arrived. Thanks!

axboe commented 3 days ago

Glad it got ironed out :-)