axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License
2.85k stars 402 forks source link

kernel warning in io_ring_exit_work #799

Closed bsbernd closed 1 year ago

bsbernd commented 1 year ago

I'm not sure if this is the right place to ask or if I should write to a kernel mailing list. I'm working on fuse patches (kernel and userspace) to support uring. I'm testing these patches with xfstests and these tests frequently trigger the trace below. So it waits for a completion event in that code - my initial assumption was that there are some conditions in which my code forgets to send io_uring_cmd_done() on unmount/daemon kill. Refactored the code, added sanity checks and internal fuse.ko WARN_ON() - still happens. I certainly don't exclude a bug in my code, but I'm not sure anymore. Also updated from 6.0 to 6.2-rc8 today - does not make a difference. Also updated to yesterdays liburing git - no difference either.

Any hint or idea?

Thanks a lot in advance, Bernd

[  808.801250] ------------[ cut here ]------------
[  808.802080] WARNING: CPU: 0 PID: 116 at io_uring/io_uring.c:2884 io_ring_exit_work+0x1c0/0x6ba
[  808.803222] Modules linked in: loop fuse(O) auth_rpcgss oid_registry nfsv4 nfs lockd grace edac_core virtio_balloon qemu_fw_cfg configfs sunrpc virtio_rng rng_core autofs4 ext4 crc16 mbcache jbd2 virtio_net net_failover ahci virtio_blk failover libahci libata scsi_mod bsg scsi_common virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev [last unloaded: fuse]
[  808.807062] CPU: 0 PID: 116 Comm: kworker/u4:5 Tainted: G        W  O       6.2.0-rc8 #6
[  808.808116] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[  808.809300] Workqueue: events_unbound io_ring_exit_work
[  808.810072] RIP: 0010:io_ring_exit_work+0x1c0/0x6ba
[  808.810742] Code: ff 4c 89 ef e8 d9 d3 85 ff 48 89 ef e8 3a f1 ff ff 48 c7 c7 c0 fa 60 82 e8 a5 0d 64 ff 48 8b 05 9e 15 8e 00 48 39 04 24 79 08 <0f> 0b 41 bf 98 3a 00 00 48 8d 7b 60 4c 89 fe e8 64 82 06 00 48 85
[  808.812936] RSP: 0018:ffffc900013afc68 EFLAGS: 00010287
[  808.813659] RAX: 000000010001ec00 RBX: ffff888150a0c858 RCX: ffffffff81d2e51b
[  808.814553] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffffff8260fac0
[  808.815445] RBP: ffff888150a0c000 R08: 0000000000000001 R09: ffffc900013afbd7
[  808.816336] R10: fffff52000275f7a R11: 0000000000000001 R12: ffff888150a0c040
[  808.817254] R13: 0000000000000000 R14: ffff888150a0c3d0 R15: 0000000000003a98
[  808.818201] FS:  0000000000000000(0000) GS:ffff88881b000000(0000) knlGS:0000000000000000
[  808.819230] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  808.819974] CR2: 00007f7b74970570 CR3: 000000013bea6005 CR4: 00000000001706f0
[  808.820869] Call Trace:
[  808.821287]  <TASK>
[  808.821661]  ? lock_downgrade+0x340/0x340
[  808.825699]  ? io_uring_try_cancel_requests+0x54a/0x54a
[  808.827060]  ? lock_acquire+0xc8/0x3a0
[  808.828078]  ? lock_release+0x690/0x690
[  808.829119]  ? lock_downgrade+0x340/0x340
[  808.830027]  ? trace_hardirqs_on+0x2d/0x110
[  808.830636]  ? read_word_at_a_time+0xe/0x20
[  808.831219]  ? lockdep_hardirqs_on_prepare+0xe/0x200
[  808.831879]  process_one_work+0x57e/0xa20
bsbernd commented 1 year ago

I also see

115 ?        D      0:00 [kworker/u4:4+events_unbound]
116 ?        D      0:00 [kworker/u4:5+events_unbound]

with

 squeeze1:~# cat /proc/115/stack 
[<0>] io_ring_exit_work+0x1d4/0x6ba
[<0>] process_one_work+0x57e/0xa20
[<0>] worker_thread+0x2a9/0x6f0
[<0>] kthread+0x150/0x180
[<0>] ret_from_fork+0x1f/0x30

So the warning is absolutely right, I just don't know what I need to do to get out of it.

bsbernd commented 1 year ago

I removed io_uring_cmd_done() on stop - existing IORING_OP_URING_CMD SQE is not getting completed - results in D-state of events_unbound kworkers, with the above stack. So guess it my own code.