axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

UDP GSO Sendmsg_zc returns EFAULT(bad address) sporadically #1067

Closed pyhd closed 3 months ago

pyhd commented 5 months ago
posix_memalign(&send_ptr, 4096, NR * 4096);

io_uring_prep_recv_multishot();
io_uring_submit();

while (1) {
    io_uring_wait_cqe();
    while (io_uring_peek_cqe() == 0) {
        reap & copy to send_ptr;
        io_uring_cqe_seen();
    }
    run_send();
}

void run_send() {
    for (i = 0; i < NR; i++) {
        msg[i].msg_iov = iov[i];
        msg[i].msg_iovlen = 1;
        iov[i][0].iov_base = send_ptr + 4096 * i;
        iov[i][0].iov_len = 1472;

        msg[i].msg_control = control[i];
        msg[i].msg_controllen = sizeof(control[i]);

        cm = CMSG_FIRSTHDR(&msg[i]);
        cm->cmsg_level = SOL_UDP;
        cm->cmsg_type = UDP_SEGMENT;
        cm->cmsg_len = CMSG_LEN(sizeof(uint16_t));
        *((uint16_t *) CMSG_DATA(cm)) = 1472;

        io_uring_prep_sendmsg_zc();
    }

    io_uring_submit();
    cqes = NR;
    for (i = 0; i < cqes; i++) {
        io_uring_wait_cqe();
        if (cqe->res < 0)
            err();
        if (!(cqe->flags & IORING_CQE_F_NOTIF)) {
            if (cqe->flags & IORING_CQE_F_MORE)
                cqes++;
        }
        io_uring_cqe_seen();
    }
}

On kernel 6.7.2, the UDP GSO cmsg with plain sendmsg works fine, and sendmsg_zc without the cmsg works fine. But the UDP GSO cmsg combined with sendmsg_zc returns EFAULT from time to time, not on every packet.
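
For anyone trying to reproduce this, the control-message part of the snippet above can be isolated and checked on its own. The following is a minimal sketch, not the reporter's actual code: `union gso_ctrl`, `attach_gso_cmsg` and `read_gso_cmsg` are hypothetical names, and the `SOL_UDP` / `UDP_SEGMENT` fallbacks are only there in case the libc headers predate UDP GSO. It uses the union-aligned, `CMSG_SPACE`-sized buffer pattern from cmsg(3) so that `msg_controllen` covers exactly one header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>

#ifndef SOL_UDP
#define SOL_UDP 17        /* fallback: same value as IPPROTO_UDP */
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103   /* fallback for older libc headers */
#endif

/* Control buffer sized and aligned for exactly one uint16_t cmsg. */
union gso_ctrl {
    char buf[CMSG_SPACE(sizeof(uint16_t))];
    struct cmsghdr align;
};

/* Attach a UDP_SEGMENT cmsg carrying seg_size to msg.
 * Returns 0 on success, -1 if the control buffer is too small. */
static int attach_gso_cmsg(struct msghdr *msg, union gso_ctrl *ctrl,
                           uint16_t seg_size)
{
    struct cmsghdr *cm;

    memset(ctrl, 0, sizeof(*ctrl));
    msg->msg_control = ctrl->buf;
    msg->msg_controllen = sizeof(ctrl->buf);

    cm = CMSG_FIRSTHDR(msg);
    if (cm == NULL)
        return -1;

    cm->cmsg_level = SOL_UDP;
    cm->cmsg_type = UDP_SEGMENT;
    cm->cmsg_len = CMSG_LEN(sizeof(seg_size));
    /* memcpy avoids an unaligned store through a casted pointer */
    memcpy(CMSG_DATA(cm), &seg_size, sizeof(seg_size));
    return 0;
}

/* Read the segment size back from the first cmsg, for sanity checks.
 * Returns 0 on success, -1 if no UDP_SEGMENT cmsg is present. */
static int read_gso_cmsg(struct msghdr *msg, uint16_t *seg_size)
{
    struct cmsghdr *cm = CMSG_FIRSTHDR(msg);

    if (cm == NULL || cm->cmsg_level != SOL_UDP ||
        cm->cmsg_type != UDP_SEGMENT)
        return -1;
    memcpy(seg_size, CMSG_DATA(cm), sizeof(*seg_size));
    return 0;
}
```

In the loop above this would be called once per `msg[i]` with a per-request `union gso_ctrl`, before `io_uring_prep_sendmsg_zc()`. Reading the cmsg back just before submission is a cheap way to confirm the control data was not overwritten in user space.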

axboe commented 5 months ago

@isilence Can you take a look?

pyhd commented 5 months ago

Here is a GSO_ZC_EFAULT.log.

isilence commented 5 months ago

Do you have a full reproducer? Chances are that either one of the pointers doesn't live long enough, or there is some kind of problem with retries and how the msghdr / cmsg gets copied.
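
To help rule out the lifetime hypothesis on the application side, each in-flight send can be tracked until its notification CQE arrives. Below is a minimal sketch under stated assumptions: `struct send_slot`, `slot_submit` and `slot_on_cqe` are hypothetical helpers, and the two flag fallbacks mirror the io_uring UAPI values as I understand them (they are only used when `<liburing.h>` is not included). The idea is that a slot's data buffer, and conservatively its msghdr, iovec and control buffer too, is left untouched until the CQE with `IORING_CQE_F_NOTIF` has been seen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fallbacks for when <liburing.h> is not included (assumed UAPI values). */
#ifndef IORING_CQE_F_MORE
#define IORING_CQE_F_MORE  (1U << 1)
#endif
#ifndef IORING_CQE_F_NOTIF
#define IORING_CQE_F_NOTIF (1U << 3)
#endif

enum slot_state {
    SLOT_FREE,        /* buffers may be written by the application */
    SLOT_SENDING,     /* request submitted, no CQE seen yet */
    SLOT_WAIT_NOTIF   /* result CQE seen, notification still pending */
};

struct send_slot {
    enum slot_state state;
    int32_t res;      /* result of the send itself */
};

/* Mark a slot as in flight; call right after preparing its SQE. */
static void slot_submit(struct send_slot *s)
{
    assert(s->state == SLOT_FREE);
    s->state = SLOT_SENDING;
    s->res = 0;
}

/* Feed one CQE belonging to this slot.
 * Returns true once the slot's buffers may be reused. */
static bool slot_on_cqe(struct send_slot *s, int32_t res, uint32_t flags)
{
    if (flags & IORING_CQE_F_NOTIF) {
        /* Notification CQE: the kernel no longer references the data. */
        s->state = SLOT_FREE;
        return true;
    }

    s->res = res;
    if (flags & IORING_CQE_F_MORE) {
        /* A notification CQE will follow; keep the buffers alive. */
        s->state = SLOT_WAIT_NOTIF;
        return false;
    }

    /* No notification will follow for this request. */
    s->state = SLOT_FREE;
    return true;
}
```

One way to wire this up is to store the slot index in the SQE with `io_uring_sqe_set_data64()` and look it up from `cqe->user_data` when reaping. If the receive path copies into `send_ptr + 4096 * i` while slot `i` is not `SLOT_FREE`, that is a user-space lifetime bug; if every slot is free at that point and EFAULT still shows up, the retry / copy path in the kernel becomes the more likely suspect.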

isilence commented 4 months ago

@pyhd, do you have a reproducer for the issue?

pyhd commented 4 months ago

> @pyhd, do you have a reproducer for the issue?

Sorry, I am afraid not.

isilence commented 3 months ago

@pyhd, [1] should fix it; it'll get backported once it hits upstream. I'm closing this issue. Please test the fix if you can, and reopen if the problem is still there.

[1] https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-6.9&id=4fe82aedeb8a8cb09bfa60f55ab57b5c10a74ac4