You're not telling fio to use polled IO with that job. You'll want to add `hipri=1` to the `write-4KiB-Q1-J1` job section to do that. Without it, you're doing regular IRQ-driven IO with that job.
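As a concrete sketch, the job section might look something like this (only `hipri=1` is the point; the other options are assumptions about your job, and the device path is a placeholder):

```
[write-4KiB-Q1-J1]
ioengine=io_uring
; polled completions only work with O_DIRECT
direct=1
filename=/dev/nvme0n1
rw=write
bs=4k
iodepth=1
numjobs=1
; poll for completions instead of waiting for an interrupt
hipri=1
```

Polled IO also requires `direct=1` and a device with poll queues configured (`nvme.poll_queues` > 0), since completions are only polled on queues that don't have an interrupt.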
Once you get polled IO working, registered buffers will further reduce latencies for O_DIRECT IO. You can use those in the fio job by setting `fixedbufs=1` in the job section. Fixed files will help a bit too, particularly in the real world where applications are threaded. For default fio, you probably won't notice.
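Extending the sketch above (`fixedbufs=1` and `registerfiles=1` are the fio io_uring engine options for registered buffers and fixed files, respectively):

```
[write-4KiB-Q1-J1]
ioengine=io_uring
direct=1
hipri=1
; pre-register the IO buffers with the kernel
fixedbufs=1
; pre-register the file set (fixed files)
registerfiles=1
```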
> You're not telling fio to use polled IO with that job. You'll want to add `hipri=1` to the `write-4KiB-Q1-J1` job section to do that. Without it, you're doing regular IRQ-driven IO with that job.
I just wanted to extend a big thanks to you! I can see that CPU utilization on the bound core has finally hit 100%, so it's clear the change has truly taken effect this time. The reduction in latency at QD1 matches exactly what I was expecting, which is fantastic.
Thank you so much again—you’re a real lifesaver!
> Once you get polled IO working, registered buffers will further reduce latencies for O_DIRECT IO. You can use those in the fio job by setting `fixedbufs=1` in the job section. Fixed files will help a bit too, particularly in the real world where applications are threaded. For default fio, you probably won't notice.
Your conclusion that "for default fio, you probably won't notice" is correct. After testing, I found that setting this parameter in fio indeed does not yield any noticeable benefit. However, I'll keep your conclusion in mind. Thanks again for your time!
Hello, I've been looking into QD1 test methodology recently and have some questions for the developers. I found that NVMe command processing relies on an MSI-X interrupt to notify the host when the CQ has been written, so I expected polling to be able to replace interrupts and reduce QD1 latency, which led me to io_uring.

For the test, I set `nvme.poll_queues=64` (previously 0), bound the CPU to a specific core, and used the io_uring engine in fio on Fedora 40 (kernel 6.10). However, modifying this parameter does not reduce the clat. I want to know what other configuration items might reduce QD1 latency with io_uring, or even libaio. Thanks.
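For reference, this is roughly how I set and verified the parameter (the device name is a placeholder):

```
# set the parameter at boot, e.g. on the kernel command line:
#   nvme.poll_queues=64

# after reboot, the driver logs how the queues were split:
dmesg | grep -i 'nvme.*poll'

# and the block layer reports whether polling is enabled:
cat /sys/block/nvme0n1/queue/io_poll
```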
Below is the gist of the test script I wrote; it binds the CPU to nvme0q32, which shows no interrupt count in /proc/interrupts.
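(The core number and device path in this sketch are placeholders, not the exact values from my setup.)

```
#!/bin/sh
# Pin fio to the CPU that services poll queue nvme0q32 (core 31 here is a
# placeholder), then run the polled QD1 write job against a placeholder device.
taskset -c 31 fio --name=write-4KiB-Q1-J1 \
    --ioengine=io_uring --direct=1 --hipri=1 \
    --filename=/dev/nvme0n1 --rw=write --bs=4k \
    --iodepth=1 --numjobs=1 --runtime=30 --time_based
```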
I'm able to read and modify kernel source such as `blk_poll` or `nvme_setup_irqs`, etc. If you have any ideas for reducing QD1 latency, even at the expense of other metrics, please let me know. Thank you.