Closed pajlada closed 3 months ago
Funky! Thanks for the report, I'll take a look.
Was able to reproduce the stall. Wow, libuv does some funky stuff.
Just to be clear, this is obviously an io_uring bug, regardless of what libuv does!
Awesome! Glad to hear you were able to reproduce it. Thank you!
Unsure if you're able to test kernel patches, but I believe the below should do it. Looks like we get into an inversion between the events workqueue being flushed for console output, and io_uring ring exits for some weird cases. If not, then I'll get it into 6.9-rc3 end of this week and it can bubble back to stable from there.
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5d4b448fdc50..f6277e029d5f 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -147,6 +147,7 @@ static bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 static void io_queue_sqe(struct io_kiocb *req);
 
 struct kmem_cache *req_cachep;
+static struct workqueue_struct *iou_wq __ro_after_init;
 
 static int __read_mostly sysctl_io_uring_disabled;
 static int __read_mostly sysctl_io_uring_group = -1;
@@ -3161,7 +3162,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 	 * noise and overhead, there's no discernable change in runtime
 	 * over using system_wq.
 	 */
-	queue_work(system_unbound_wq, &ctx->exit_work);
+	queue_work(iou_wq, &ctx->exit_work);
 }
 
 static int io_uring_release(struct inode *inode, struct file *file)
@@ -4185,6 +4186,8 @@ static int __init io_uring_init(void)
 	io_buf_cachep = KMEM_CACHE(io_buffer,
 					  SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
 
+	iou_wq = alloc_workqueue("iou_exit", WQ_UNBOUND, 64);
+
 #ifdef CONFIG_SYSCTL
 	register_sysctl_init("kernel", kernel_io_uring_disabled_table);
 #endif
Oh, and if you want a Reported-by: tag in the commit, please do let me know and I'll update it with that. Just need an identity + email for that.
Queued up:
https://git.kernel.dk/cgit/linux/commit/?h=io_uring-6.9&id=d1a9cef84784f873b457f2622ca2415c4b4db748
A Reported-by tag would be appreciated, yeah! My identity + email below:
Rasmus Karlsson <rasmus.karlsson@pajlada.com>
I'll see if I can test the patch; if not, I'll test the rc3 kernel when that's released.
Perfect, commit updated:
https://git.kernel.dk/cgit/linux/commit/?h=io_uring-6.9&id=e5444baa42e545bb929ba56c497e7f3c73634099
Just applied the patch to my 6.8.2 kernel on a system that previously experienced the issue and I can confirm that it fixes it. Thanks for the quick turnaround!
out of curiosity, why did all shells freeze, but ssh didn't?
I tested this on top of Arch Linux kernel 6.8.2-arch2-2 and it seems to work. You have my permission to add Tested-by: Iskren Chernev <me@iskren.info>
> out of curiosity, why did all shells freeze, but ssh didn't?
For me ssh was freezing too. But I also failed to kill the nvim after it froze... maybe there are a few variations of this.
Thanks everyone, patch will go upstream later this week, and it'll bubble back to -stable post that. Marking this one as closed as fix exists.
Hi! When running Arch Linux or Fedora Rawhide and suspending two instances of Neovim, which uses libuv, which uses io_uring, I experience a system freeze. It stops me from typing anything in any shell or spawning any new shell, but I'm able to run some simple commands over ssh (e.g. `ssh myserver ls -la`). dmesg doesn't report anything interesting as far as I could tell, other than some of the apps that were running not being responsive.
The freeze doesn't occur after disabling io_uring in libuv using `UV_USE_IO_URING=0` or in the kernel with `sysctl kernel.io_uring_disabled=1`.
`uname -a` from the tested systems:
`Linux billy 6.6.23-1-lts #1 SMP PREEMPT_DYNAMIC Wed, 27 Mar 2024 07:47:20 +0000 x86_64 GNU/Linux` running Arch Linux
`Linux yolen 6.8.2-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 28 Mar 2024 17:06:35 +0000 x86_64 GNU/Linux` running Arch Linux
`Linux localhost 6.9.0-0.rc1.20240329git317c7bc0ef03.20.fc41.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Mar 29 14:04:53 UTC 2024 x86_64 GNU/Linux` running Fedora Rawhide
Reproduction steps
The suspension can be done in separate shells, or as different users with the same results.
Video showing off the freeze
https://github.com/axboe/liburing/assets/962989/f405bd56-28b8-4a2f-a4b5-3de7d8023010
I'm still able to run certain apps on the system, but not open a shell
If the io-uring@vger.kernel.org email is a better place for this report let me know and I'll report it there instead. Originally reported in https://github.com/libuv/libuv/issues/4377