axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

When running mixed read/write with 1M block size on SATA drives, io_uring reports -11 in cqe->res. #1269

Closed (suxinggm closed 3 weeks ago)

suxinggm commented 1 month ago

When I run a 50/50 read/write I/O test on SATA drives, after about 60 hours all 8 drives in this server report cqe->res = -11.

I then ran fio with "ioengine=io_uring bs=1M rw=read", and the drive still reported "Resource temporarily unavailable" within a few seconds.

After that, I ran fio with "ioengine=io_uring bs=512K rw=read", and the drive went back to normal.

I found another way to recover while the engine is in this abnormal state: running fio with "ioengine=libaio bs=1M rw=read" also brings the drive back to normal.

Linux kernel version: 5.19.17
liburing version: 2.6

I think this may be related to io_uring and the block layer, but I don't know how to debug it and find the root cause.

Could anyone give some suggestions? Thanks a lot.

axboe commented 1 month ago

Your kernel is likely too old, please try with a 6.x based kernel.

Edit: should expand... For older kernels, if you get -EAGAIN for such a read/write, then the application needs to reissue the IO. For newer kernels, the kernel will deal with it and you should not see -EAGAIN bubbling back up to the application.
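
For illustration only, the reissue step described above for older kernels might look roughly like the following. This is a minimal sketch and not code from this thread: `issue_io_fn`, `issue_with_retry`, `struct fake_io`, and `fake_issue` are hypothetical names. In a real application the callback would presumably wrap the usual liburing sequence (`io_uring_get_sqe`, `io_uring_prep_read` or `io_uring_prep_write`, `io_uring_submit`, `io_uring_wait_cqe`, `io_uring_cqe_seen`) and return `cqe->res`. Here a stand-in callback simulates the -EAGAIN completions so the retry logic can be read on its own.

```c
#include <errno.h>

/* Hypothetical callback type: issues one I/O and returns its completion
 * result, i.e. the value an application would read from cqe->res. */
typedef int (*issue_io_fn)(void *ctx);

/* Reissue the I/O while it completes with -EAGAIN, for at most
 * max_attempts tries. Returns the first result that is not -EAGAIN,
 * or -EAGAIN if every attempt ended that way. */
int issue_with_retry(issue_io_fn issue, void *ctx, int max_attempts)
{
    int res = -EAGAIN;

    for (int attempt = 0; attempt < max_attempts && res == -EAGAIN; attempt++)
        res = issue(ctx);

    return res;
}

/* Stand-in for a real submission (an assumption for this sketch):
 * completes with -EAGAIN a fixed number of times, then reports
 * 4096 bytes transferred. */
struct fake_io {
    int eagain_left; /* how many more -EAGAIN completions to simulate */
    int calls;       /* how many times the I/O was issued */
};

int fake_issue(void *ctx)
{
    struct fake_io *io = ctx;

    io->calls++;
    if (io->eagain_left > 0) {
        io->eagain_left--;
        return -EAGAIN;
    }
    return 4096;
}
```

Bounding the number of attempts is a design choice of this sketch; it keeps an application from spinning forever if the device never recovers. A caller would invoke `issue_with_retry(fake_issue, &io, 5)` with a `struct fake_io io` and treat a final -EAGAIN as a hard failure.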

suxinggm commented 1 month ago

Much appreciated, Axboe. I will try again with a newer kernel.