Open NickiXsight opened 2 months ago
Since you already did the full analysis, care to send a fix for this?
Items 1 and 2 are really simple; I can create a PR tomorrow. Items 3 and 4 need more attention: just fixing the lock/unlock scheme is not enough. I have to write a good test that proves an overlap conflict can indeed happen during the requeue. And if I move check_overlap into the requeue loop, what is the performance impact? I'll open a PR soon and we can discuss it there.
Description of the bug: When running a large fio workload with multiple jobs, verify enabled, and serialize_overlap=1, fio aborts at some point with the error in the title.
After looking into the code I see 4 problems:
Environment: debian x86_64
fio version: fio-3.37-86-g7bc1
Reproduction steps:
[write-and-verify]
rw=randwrite
bs=4k
direct=1
ioengine=libaio
iodepth=128
verify=crc32c
verify_backlog=100000
verify_dump=1
verify_fatal=1
verify_async=4
serialize_overlap=1
io_submit_mode=offload
blocksize_range=4k-8k
runtime=6000
size=512m
numjobs=10
filename=/dev/nvme0n8:/dev/nvme0n7:/dev/nvme0n6:/dev/nvme0n5:/dev/nvme0n4:/dev/nvme0n3:/dev/nvme0n2:/dev/nvme0n1
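For readers less familiar with what serialize_overlap=1 guards against: with rw=randwrite and blocksize_range=4k-8k, two in-flight writes can cover intersecting byte ranges, for example an 8k write at offset 0 and a 4k write at offset 4096. The sketch below illustrates that kind of range test in isolation. It is not fio's code; `struct io_range`, `ranges_overlap` and `overlaps_in_flight` are hypothetical names used only for illustration, and the real check in fio works on its own I/O structures and queues.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one I/O request: a byte range on the target. */
struct io_range {
    uint64_t offset; /* start of the I/O in bytes */
    uint64_t len;    /* length of the I/O in bytes */
};

/* Two half-open ranges [offset, offset + len) intersect when each one
 * starts before the other one ends. */
static bool ranges_overlap(const struct io_range *a, const struct io_range *b)
{
    return a->offset < b->offset + b->len &&
           b->offset < a->offset + a->len;
}

/* Return true if the candidate intersects any I/O already in flight.
 * A submitter that serializes overlaps would hold the candidate back
 * (or retry later) while this returns true. */
static bool overlaps_in_flight(const struct io_range *in_flight, size_t n,
                               const struct io_range *candidate)
{
    for (size_t i = 0; i < n; i++) {
        if (ranges_overlap(&in_flight[i], candidate))
            return true;
    }
    return false;
}
```

The scan is linear in the number of in-flight I/Os, so running such a check on every requeue attempt adds cost that grows with iodepth. That is one reason the performance question raised in the comments is worth measuring.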