loop device detach (losetup -d) hung

Hi, I've encountered an issue: detaching a loop device with losetup -d hangs. Here are the steps to reproduce:

Create a loop device:

dd if=/dev/zero of=./x.img count=400 bs=1M
LOOP_DEVICE=$(losetup --find --show --partscan ./x.img) && echo $LOOP_DEVICE
mkfs.ext4 -F $LOOP_DEVICE
mkdir -p /mnt/tests/ && mount $LOOP_DEVICE /mnt/tests/

Set up a snapshot: dbdctl setup-snapshot $LOOP_DEVICE /mnt/tests/.cow 0
Destroy the snapshot: dbdctl destroy 0
Unmount the device: umount /mnt/tests
Detach the loop device (hangs here): losetup -d $LOOP_DEVICE
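For convenience, the steps above can be collected into one script. This is only a sketch: the `run` helper, the `reproduce` function, the `DRY_RUN`, `IMG` and `MNT` variables, and the `/dev/loopN` placeholder are names introduced here for illustration and are not part of dattobd. With `DRY_RUN=1` (the default in this sketch) it only prints the commands; with `DRY_RUN=0` it must run as root on a machine where the dattobd module is loaded.

```shell
#!/bin/sh
# Sketch of the reproduction sequence from this report.
# DRY_RUN=1 (default) only prints each command; DRY_RUN=0 executes it (needs root).
DRY_RUN="${DRY_RUN:-1}"
IMG="${IMG:-./x.img}"
MNT="${MNT:-/mnt/tests}"

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

reproduce() {
    run dd if=/dev/zero of="$IMG" count=400 bs=1M || return 1
    if [ "$DRY_RUN" = "0" ]; then
        LOOP_DEVICE=$(losetup --find --show --partscan "$IMG") || return 1
    else
        LOOP_DEVICE=/dev/loopN   # placeholder name, dry run only
    fi
    echo "loop device: $LOOP_DEVICE"
    run mkfs.ext4 -F "$LOOP_DEVICE" &&
    run mkdir -p "$MNT" &&
    run mount "$LOOP_DEVICE" "$MNT" &&
    run dbdctl setup-snapshot "$LOOP_DEVICE" "$MNT/.cow" 0 &&
    run dbdctl destroy 0 &&
    run umount "$MNT" &&
    run losetup -d "$LOOP_DEVICE"   # reported to hang at this step
}
```

Calling `reproduce` after sourcing the file prints the sequence; setting `DRY_RUN=0` first performs it for real, in which case the final step is expected to block on an affected kernel.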

I used gdb to debug the kernel and traced the hang to the detach path. When no one else is using the loop device, loop_clr_fd (in loop.c) calls __loop_clr_fd, which in turn calls blk_mq_freeze_queue, and that is where the hang occurs.
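For anyone who wants to check where a hung losetup process is blocked without attaching gdb, a rough sketch using procfs follows. The `show_blocked` function name is made up for this example. Reading `/proc/PID/stack` normally requires root and a kernel built with stack trace support, so the sketch falls back to a message when it is not readable. If the analysis above holds, one would expect the stack to point into the queue freeze path, but that is an expectation, not something this snippet verifies.

```shell
# show_blocked PID: print the state, wait channel and (if readable) kernel stack
# of a process. A state of "D" means uninterruptible sleep.
show_blocked() {
    pid="$1"
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid"
        return 1
    fi
    # "State:" line from /proc/PID/status
    grep '^State:' "/proc/$pid/status"
    # wchan: kernel symbol the task is sleeping in (may be 0 or empty)
    if [ -r "/proc/$pid/wchan" ]; then
        echo "wchan: $(cat "/proc/$pid/wchan" 2>/dev/null)"
    fi
    # Kernel stack of the task; usually root only
    cat "/proc/$pid/stack" 2>/dev/null || echo "stack: not readable (root needed?)"
}
```

Typical use would be `show_blocked "$(pgrep -n losetup)"` from a second terminal while the detach is stuck.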

The hang is caused by an abnormal change in the reference count of the loop device's request queue. Here is the image:

In the second red box, it can be seen that the value of lo->lo_queue->q_usage_counter->data inexplicably increased from 1 to 22. This is very strange. I repeated the experiment a few times and found that it sometimes increases to over 100. Since blk_mq_freeze_queue waits for this usage counter to drop to zero, lo->lo_queue can never be frozen.

I suspect this issue is related to changes in the kernel loop driver. Two commits seem particularly relevant (Commit 1, Commit 2), but I am not sure whether they are the root cause.

Additionally, the hang only occurs when we run setup -> destroy -> umount before detaching. If we follow the sequence setup -> destroy -> detach -> umount, or setup -> umount -> detach -> destroy, the losetup -d command does not hang. This is because our module is still using the loop device at that point, so loop_clr_fd does not call __loop_clr_fd.
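To make the three orderings easier to compare, here is a small sketch that prints the command sequence for each one. The `print_order` function and the labels `hang`, `ok-detach-first` and `ok-destroy-last` are hypothetical names chosen for this example; `/dev/loopN` is a placeholder used when `LOOP_DEVICE` is not set. The function only prints commands and does not run anything.

```shell
# print_order NAME: print the command sequence for one of the orderings
# described in this report. Nothing is executed.
print_order() {
    dev="${LOOP_DEVICE:-/dev/loopN}"   # placeholder if LOOP_DEVICE is unset
    mnt="${MNT:-/mnt/tests}"
    setup="dbdctl setup-snapshot $dev $mnt/.cow 0"
    destroy="dbdctl destroy 0"
    unmount="umount $mnt"
    detach="losetup -d $dev"
    case "$1" in
        # setup -> destroy -> umount -> detach: reported to hang
        hang)            printf '%s\n' "$setup" "$destroy" "$unmount" "$detach" ;;
        # setup -> destroy -> detach -> umount: reported not to hang
        ok-detach-first) printf '%s\n' "$setup" "$destroy" "$detach" "$unmount" ;;
        # setup -> umount -> detach -> destroy: reported not to hang
        ok-destroy-last) printf '%s\n' "$setup" "$unmount" "$detach" "$destroy" ;;
        *) echo "usage: print_order hang|ok-detach-first|ok-destroy-last" >&2
           return 2 ;;
    esac
}
```

For example, `print_order hang` lists the four commands of the failing sequence in order, which can then be reviewed or run by hand as root.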

The issue appears to affect kernel versions 5.16 and above; it was confirmed on Fedora 34 (5.16.19 / 5.17.12) and Fedora 38 (6.2).
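A quick way to tell whether a machine falls into the suspected range is to compare the running kernel release against 5.16. The sketch below assumes the release string starts with `MAJOR.MINOR`, as `uname -r` output normally does; the `kernel_at_least` function name is invented for this example.

```shell
# kernel_at_least MAJOR MINOR [RELEASE]
# Succeeds if RELEASE (default: output of uname -r) is at least MAJOR.MINOR.
kernel_at_least() {
    want_major="$1"
    want_minor="$2"
    release="${3:-$(uname -r)}"
    major="${release%%.*}"        # text before the first dot
    rest="${release#*.}"          # text after the first dot
    minor="${rest%%[!0-9]*}"      # leading digits of the remainder
    [ "$major" -gt "$want_major" ] && return 0
    [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}
```

For instance, `kernel_at_least 5 16 && echo "kernel is in the suspected range"` checks the running kernel, and a release string can be passed as a third argument to check another machine's version.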

However, this problem does not seem to affect physical disks, although I am not sure whether the reference count of a physical disk's request queue is affected in the same way.

datto / dattobd

loop device detach (losetup -d) hung #356