axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License
2.72k stars 393 forks source link

Modify polling via IORING_OP_POLL_REMOVE -- unable to add/remove IORING_POLL_ADD_MULTI. #1099

Closed vsolontsov-ll closed 4 months ago

vsolontsov-ll commented 4 months ago

Hello,

First of all please pardon me if the question must be addressed to the Kernel rather than to the lib.

From the docs, sources and my tests it's clear that IORING_POLL_ADD_MULTI level-trigger logic is not implemented yet. So in my app I'm simulating it: once user code doesn't specify EPOLLET, the layer responsible for polling and dispatching requests issues IORING_OP_POLL_ADD without IORING_POLL_ADD_MULTI. At receiving such an event, it re-arms the polling. So far so good.

The issue appears when I try to simulate epoll_ctl(EPOLL_CTL_MOD) for changing EPOLLET polling to non-ET. But seems to be impossible to add EPOLLONESHOT bit into the subscription.

According to the kernel sources, that's how it behaves... When there's no IORING_POLL_ADD_MULTI, it adds EPOLLONESHOT into the events:


    if (!(flags & IORING_POLL_ADD_MULTI))
        events |= EPOLLONESHOT;

but then, within the io_pull_remove() it explicitly ignores higher bits including EPOLLONESHOT = 1u << 30:

poll->events &= ~0xffff;
poll->events |= poll_update->events & 0xffff;
poll->events |= IO_POLL_UNMASK;

It seems like EPOLLONESHOT, EPOLLET, EPOLLEXCLUSIVE, etc.. can only be set at initial request.

I'm not sure how to attribute this query. For me it's a sad surprise for which I need to either explicitly disable switching ET <-> non-ET or simulate the MOD via DEL+ADD. Would be great to have it fixed or documented though.

axboe commented 4 months ago

We can certainly fix that. Do you have a simple test case that we can use?

axboe commented 4 months ago

I think this should do it:

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 5f779139cae1..721f42a14c3e 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -1030,8 +1030,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
        if (poll_update->update_events) {
            struct io_poll *poll = io_kiocb_to_cmd(preq, struct io_poll);

-           poll->events &= ~0xffff;
-           poll->events |= poll_update->events & 0xffff;
+           poll->events = poll_update->events;
            poll->events |= IO_POLL_UNMASK;
        }
        if (poll_update->update_user_data)

as there should be no need to unmask and mask in the new events, we should just overwrite them. I'll get this tested. Do you have an identity + email that I can use for a Reported-by: tag in the fix?

axboe commented 4 months ago

Actually that isn't safe, as we have state in there. So I'm afraid you'll have to remove and rearm if you want to change exclusive etc settings. I'll update the man page.

vsolontsov-ll commented 4 months ago

Thank you so much. I assume the test case is not needed, pls let me know if I'm wrong -- I'll be happy to prepare one.

axboe commented 4 months ago

No test case needed, as it cannot work like that, unfortunately. I've pushed a documentation fix, closing.