Funky. That last link isn't valid to set (there's nothing after it, so it can't link to anything), but it should just be ignored. Not sure why it isn't; I'll take a look.
Ohh yes, I'm using IOSQE_IO_LINK the wrong way around.
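IOSQE_IO_LINK on an SQE links it to the *next* SQE in the submission. For the send, send, close sequence in this report, the corrected chain therefore sets it on both sends but not on the close. Below is a minimal sketch of that rule as a pure helper. chain_sqe_flags() is a hypothetical name. The flag values are the ones from <linux/io_uring.h>, and they are redefined here only when that header hasn't been included.

```c
#include <stddef.h>

/* Values as defined in <linux/io_uring.h>; only defined here if missing. */
#ifndef IOSQE_IO_LINK
#define IOSQE_IO_LINK (1U << 2)
#endif
#ifndef IOSQE_CQE_SKIP_SUCCESS
#define IOSQE_CQE_SKIP_SUCCESS (1U << 6)
#endif

/*
 * Hypothetical helper: flags for SQE number `index` in a chain of `len`
 * SQEs. IOSQE_IO_LINK links an SQE to the one after it, so every SQE
 * except the last gets it; the last SQE has nothing to link to.
 * If `skip_success` is non-zero, every SQE also gets
 * IOSQE_CQE_SKIP_SUCCESS (as all SQEs in the report did).
 */
static unsigned chain_sqe_flags(size_t index, size_t len, int skip_success)
{
    unsigned flags = 0;

    if (len > 0 && index + 1 < len)
        flags |= IOSQE_IO_LINK;
    if (skip_success)
        flags |= IOSQE_CQE_SKIP_SUCCESS;
    return flags;
}
```

With liburing, each SQE would get its flags through io_uring_sqe_set_flags(sqe, chain_sqe_flags(i, 3, 1)) after io_uring_prep_send() or io_uring_prep_close(). That wiring is left out here so the sketch stays self-contained.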
I'm mostly poking around the API, trying to understand more about task_work and IORING_SETUP_DEFER_TASKRUN / IORING_SETUP_COOP_TASKRUN. Are there any other resources on this besides the man pages or the kernel source?
Check the doc in the wiki; it has more details on DEFER_TASKRUN:
https://github.com/axboe/liburing/wiki/io_uring-and-networking-in-2023
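Both task-running modes discussed here are chosen through setup flags when the ring is created. With liburing, those flags go in the flags field of struct io_uring_params, which is passed to io_uring_queue_init_params(). The sketch below shows how the flags combine. It is a hypothetical pre-flight check that mirrors two setup-time rules I understand the kernel to enforce: DEFER_TASKRUN requires SINGLE_ISSUER, and the task-running flags are rejected together with SQPOLL. It is not an exhaustive list of the kernel's checks. check_taskrun_flags() is an illustrative name, and the flag values are taken from <linux/io_uring.h>.

```c
#include <errno.h>

/* Setup flag values as defined in <linux/io_uring.h>; only defined if missing. */
#ifndef IORING_SETUP_SQPOLL
#define IORING_SETUP_SQPOLL (1U << 1)
#endif
#ifndef IORING_SETUP_COOP_TASKRUN
#define IORING_SETUP_COOP_TASKRUN (1U << 8)
#endif
#ifndef IORING_SETUP_TASKRUN_FLAG
#define IORING_SETUP_TASKRUN_FLAG (1U << 9)
#endif
#ifndef IORING_SETUP_SINGLE_ISSUER
#define IORING_SETUP_SINGLE_ISSUER (1U << 12)
#endif
#ifndef IORING_SETUP_DEFER_TASKRUN
#define IORING_SETUP_DEFER_TASKRUN (1U << 13)
#endif

/*
 * Hypothetical check covering two of the kernel's setup-time rules
 * (not all of them). Returns 0 if `flags` passes these two checks,
 * -EINVAL otherwise.
 */
static int check_taskrun_flags(unsigned flags)
{
    const unsigned taskrun_flags = IORING_SETUP_COOP_TASKRUN |
                                   IORING_SETUP_TASKRUN_FLAG |
                                   IORING_SETUP_DEFER_TASKRUN;

    /* With SQPOLL the kernel thread runs the work, so these don't apply. */
    if ((flags & IORING_SETUP_SQPOLL) && (flags & taskrun_flags))
        return -EINVAL;

    /* Deferred work runs in the submitting task, so it must be the only one. */
    if ((flags & IORING_SETUP_DEFER_TASKRUN) &&
        !(flags & IORING_SETUP_SINGLE_ISSUER))
        return -EINVAL;

    return 0;
}
```

So a DEFER_TASKRUN ring like the one in this report also needs IORING_SETUP_SINGLE_ISSUER. The workaround mentioned in the report, IORING_SETUP_COOP_TASKRUN, has no such requirement.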
I suspect this should fix it:
```diff
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index f3570e81ecb4..75f0087183e5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2579,9 +2579,9 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 		 * If we got woken because of task_work being processed, run it
 		 * now rather than let the caller do another wait loop.
 		 */
-		io_run_task_work();
 		if (!llist_empty(&ctx->work_llist))
 			io_run_local_work(ctx, nr_wait);
+		io_run_task_work();
 
 		/*
 		 * Non-local task_work will be run on exit to userspace, but
```

The reason is that we want to run task_work after having run our local task_work, in case the local task_work generates system-wide task_work. The final put of a file is one example.
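A toy model may help with the ordering argument. Suppose running local task_work can queue ordinary task_work. If the ordinary task_work runs first, the newly queued item is still pending at the end of that pass. With the patched order, both are drained. This is only an illustration under simplified assumptions: counters stand in for the work lists, and each local item queues exactly one generic item. It does not reflect the kernel's actual data structures. struct work_state and one_pass() are hypothetical names.

```c
/*
 * Toy model, not kernel code: one pass of the task_work section of the
 * wait loop. `local` counts items on the ring-local list (DEFER_TASKRUN
 * work); `generic` counts ordinary task_work items.
 */
struct work_state {
    int local;
    int generic;
};

/* Runs every generic item queued so far. */
static void run_generic(struct work_state *s)
{
    s->generic = 0;
}

/* Runs every local item; optionally each one queues a generic item
 * (modelling e.g. the final put of a file). */
static void run_local(struct work_state *s, int local_spawns_generic)
{
    if (local_spawns_generic)
        s->generic += s->local;
    s->local = 0;
}

/* Returns how many generic items are still pending after one pass. */
static int one_pass(struct work_state s, int local_first,
                    int local_spawns_generic)
{
    if (local_first) {            /* order after the patch */
        run_local(&s, local_spawns_generic);
        run_generic(&s);
    } else {                      /* order before the patch */
        run_generic(&s);
        run_local(&s, local_spawns_generic);
    }
    return s.generic;
}
```

With one local item that spawns generic work, the old order leaves one item pending and the new order leaves none. When local work spawns nothing, the two orders behave the same.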
It does indeed fix it.
Excellent, just committed it and sent it to the list. If you want a Reported-by: tag added to the commit, please reply here with your name and email so I can add it. I already added the link to this bug report.
I also wrote a standalone C reproducer to test it. It's quite the odd construct to get there; I'm impressed ;-)
Nice, really fast turnaround!
For the tag: Jan Hendrik Farr <kernel@jfarr.cc>
Tag added, and the test case has been pushed out as well.
We take bug reports seriously :-)
I'll close this one up. The fix will go into the 6.12-rc1 release, and it's also marked for stable backport, so it should find its way into the 6.6, 6.10 and 6.11 stable kernels as well. Other kernels are either not affected or already end-of-life. Thanks for the report with a reproducer!
If I do two send operations on a socket followed by a close of that socket's file descriptor, all linked with IOSQE_IO_LINK, the close operation never happens. All SQEs have IOSQE_CQE_SKIP_SUCCESS set. The ring is created with IORING_SETUP_DEFER_TASKRUN.

I already did a git bisect on the kernel, and the problem was introduced in commit 846072f16eed3b3fb4e59b677f3ed8afb8509b89: https://github.com/torvalds/linux/commit/846072f16eed3b3fb4e59b677f3ed8afb8509b89

Is it better to report this on the kernel mailing list, since it's not really an issue with liburing? I'm fine with using either the mailing list or GitHub.
Interesting observations:
- Dropping either IOSQE_IO_LINK or IOSQE_CQE_SKIP_SUCCESS from the close makes it work.
- Using IORING_SETUP_COOP_TASKRUN instead of IORING_SETUP_DEFER_TASKRUN makes it work.

Reproducer written in Zig (let me know if you want a reproducer in C):
It can be built using Zig 0.13:
To test, simply connect using telnet:
Expected behavior: receive two messages from the server, then get disconnected. Observed behavior: the two messages arrive, but the connection is never closed.
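The check above is done by hand with telnet. Below is a sketch of an automated version, assuming the client end of the connection is available as a connected socket fd. The client should read both messages and then see EOF. With the bug, the connection is never closed, so a bounded wait times out instead. read_until_eof() and its timeout parameter are hypothetical and come from neither the report nor liburing.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>  /* socketpair(), handy for exercising the helper */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical test helper: read from `fd` until the peer closes the
 * connection. Returns the number of bytes read if EOF was seen. Returns -1
 * if nothing (neither data nor EOF) arrives within `timeout_ms`, which is
 * the symptom in this report. Also returns -1 on error or if `buf` fills
 * up before EOF.
 */
static ssize_t read_until_eof(int fd, char *buf, size_t cap, int timeout_ms)
{
    size_t total = 0;

    if (cap == 0)
        return -1;

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        if (poll(&pfd, 1, timeout_ms) <= 0)
            return -1;                 /* timeout or poll error: no EOF */

        ssize_t n = read(fd, buf + total, cap - total);
        if (n == 0)
            return (ssize_t)total;     /* peer closed the connection */
        if (n < 0)
            return -1;

        total += (size_t)n;
        if (total == cap)
            return -1;                 /* buffer full before EOF */
    }
}
```

Against the reproducer, a fixed kernel should make this return the combined length of the two messages, while an affected kernel should make it return -1 after the timeout.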