axboe / fio

Flexible I/O Tester
GNU General Public License v2.0
5.26k stars 1.26k forks source link

Request to add option "pi_verify" for io_uring_cmd engine #1661

Open Xiaomingsj opened 1 year ago

Xiaomingsj commented 1 year ago

Description of the new feature

Hi, When using fio with io_uring_cmd engine to test IOPS on NVMe data with PI, found that it would be better to add an option "pi_verify" to get the driver's real performance numbers. For example, when I tested a LBA format 4K + 8-B metadata (16-bit CRC), in DIF mode, PRACT=0, if PRCHK=0, the IOPS would be 2.6M, if PRCHK=6, the IOPS would be dropped to 845k, since when PRCHK has PI LBG and LBAT check enabled, the fio would check every block's guard crc and AppTag, so the IOPS was extreme low. After adding a "pi_verify=0" to bypass PI data verifying on fio side, but still keeping PI data checked on device side and transferring on the bus, the IOPS would be 2.4M. Following are the changes I made for your information: in io_uring.c: { .name = "pi_verify", .lname = "Verify pi Data", .type = FIO_OPT_INT, .off1 = offsetof(struct ioring_options, pi_verify), .def = "1", .help = "fio verifies PI data (LBG, LBAT, LBRT) from read data (Default: 1)", .category = FIO_OPT_C_ENGINE, .group = FIO_OPT_G_IOURING, }, ... if (o->cmd_type == FIO_URING_CMD_NVME) { data = FILE_ENG_DATA(io_u->file); if (data->pi_type && (io_u->ddir == DDIR_READ) && !o->pi_act) { if (o->pi_verify) { ret = fio_nvme_pi_verify(data, io_u); if (ret) io_u->error = ret; } } }

The fio command line I used for testing are like below: ... --pi_chk=GUARD,APPTAG --apptag=0x55aa --apptag_mask=0xffff --pi_verify=0 --pi_act=0 ....

Thanks

Xiaomingsj commented 1 year ago

The structure ioring_options also changed as: struct ioring_options { struct thread_data td; unsigned int hipri; struct cmdprio_options cmdprio_options; unsigned int fixedbufs; unsigned int registerfiles; unsigned int sqpoll_thread; unsigned int sqpoll_set; unsigned int sqpoll_cpu; unsigned int nonvectored; unsigned int uncached; unsigned int nowait; unsigned int force_async; unsigned int md_per_io_size; unsigned int pi_act; unsigned int apptag; unsigned int apptag_mask; unsigned int pi_verify; //New added unsigned int prchk; char pi_chk; enum uring_cmd_type cmd_type; };

axboe commented 1 year ago

Can you turn that into a patch / PR? You already did all the work, and that would make our life easier.

ankit-sam commented 1 year ago

Hi @Xiaomingsj I tested and you are right the existing CRC generation in fio is quite slow. Even for write requests when we have --pi_chk=GUARD it is slow.

I have a PR which enables fio to use intel ISA-L, which does fast CRC. https://github.com/axboe/fio/pull/1662 Can you please try it out, during build fio will check if isa-l is present, else fio will fallback to existing slower CRC generator.

And as Jens suggested please send a patch or create PR for pi_verify option. You will have to update the docs (HOWTO.rst and fio.1) for this new option.

Xiaomingsj commented 1 year ago

Hi @ankit-sam, I have tested fio IOPS with your PR of ISA-L, the results are very promising. In 4K random read tests either in DIF or DIX mode with --pi_act=0, --pi_chk=GUARD,APPTAG, the IOPS are only ~2K or 3K less than the IOPS using --pi_verify=0 option. For Random write, the IOPS are also good,

Since --pi_verify=0 only works for reads, do you think it is still necessary to add it in ?

Xiaomingsj commented 1 year ago

Hi @ankit-sam , BTW, ISA-L crc used in your patch only supports 16-b crc ?

ankit-sam commented 1 year ago

Since --pi_verify=0 only works for reads, do you think it is still necessary to add it in ?

It may still be helpful for someone who is not using isa-l.

ankit-sam commented 1 year ago

Hi @ankit-sam , BTW, ISA-L crc used in your patch only supports 16-b crc ?

Yes as of now it only supports 16-b crc. Although README for isa-l says that it has support for rocksoft64 polynomial.

Xiaomingsj commented 1 year ago

Hi @ankit-sam , BTW, ISA-L crc used in your patch only supports 16-b crc ?

Yes as of now it only supports 16-b crc. Although README for isa-l says that it has support for rocksoft64 polynomial.

Just verified that 64-b crc bytes generated by crc64_rocksoft_refl function in isa-l match with test vectors in NVMe specification. I read the fio-blog "Fio end to end protection", it says crc32 and storage tag are not supported in fio io_uring_cmd engine due to kernel, may I ask why these features depend on kernel? as they are transferred withing 16-byte or more metadata.

vincentkfu commented 1 year ago

For the initial set of patches we focused on the simpler cases of 16-bit CRC values and 64-bit CRC values without storage tags. 32-bit CRC values require storage tag support. Supporting storage tags requires variable sized storage and reference tag fields which are more difficult to implement. Eventually these may be added but the initial implementation is sufficient for our current needs.

None of this depends on the kernel.