axboe / fio

Flexible I/O Tester
GNU General Public License v2.0

CRC verify failed in multi job situations #1770

Closed xvanQ closed 3 weeks ago

xvanQ commented 1 month ago


Description of the bug:

fio fio_jobs_8GB_8parts_write_128KB_bklg1.fio
job0: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=dev-dax, iodepth=1
job1: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=dev-dax, iodepth=1
fio-3.37-45-g6e039
Starting 2 threads
verify: bad header offset 0, wanted 10737418240 at file /dev/dax0.0 offset 10737418240, length 131072 (requested block: offset=10737418240, length=131072)
   hdr_fail data dumped as dax0.0.10737418240.hdr_fail
fio: pid=40695, err=84/file:io_u.c:2251, func=io_u_sync_complete, error=Invalid or incomplete multibyte or wide character
crc32c: verify failed at file /dev/dax0.0 offset 131072, length 131072 (requested block: offset=131072, length=131072, flags=88)
       Expected CRC: af654b83
       Received CRC: 4b31d646
   received data dumped as dax0.0.131072.received
   expected data dumped as dax0.0.131072.expected
fio: pid=40694, err=84/file:io_u.c:2251, func=io_u_sync_complete, error=Invalid or incomplete multibyte or wide character

job0: (groupid=0, jobs=2): err=84 (file:io_u.c:2251, func=io_u_sync_complete, error=Invalid or incomplete multibyte or wide character): pid=40694: Thu Jun 6 16:39:47 2024

Environment:

fio version: <3.37>

Reproduction steps

Here is my config:

[global]
ioengine=dev-dax
iodepth=1
direct=0
bs=128KB
verify_backlog=8
filename=/dev/dax0.0
thread=1
verify=crc32c
do_verify=1
verify_dump=1
verify_fatal=1
group_reporting

[job0]
offset=0M
rw=write
size=1024M

[job1]
offset=10240M
rw=write
size=1024M
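(Editorial note, not from the thread: the "wanted 10737418240" header offset in the log is exactly job1's starting offset of 10240M converted to bytes, i.e. the verify failure lands on job1's first block. A quick sketch of that arithmetic:)

```python
# Offsets taken from the job file and log above, converted to bytes.
MIB = 1024 * 1024

job1_offset = 10240 * MIB      # offset=10240M in [job1]
failing_offset = 10737418240   # "wanted 10737418240 at file /dev/dax0.0" in the log

# The bad verify header sits exactly at job1's first 128 KiB block,
# consistent with the two jobs interfering on the same device.
print(job1_offset == failing_offset)  # True
```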

vincentkfu commented 1 month ago

I don't have a DAX device but a job similar to yours using psync runs without issue.

[global]
#ioengine=dev-dax
iodepth=1
direct=0
#bs=128KB
verify_backlog=8
#filename=/dev/dax0.0
filename=test
thread=1
verify=crc32c
do_verify=1
verify_dump=1
verify_fatal=1
group_reporting
filesize=16M
number_ios=16

[job0]
offset=0M
rw=write
size=1024K

[job1]
#offset=10240M
offset=4M
rw=write
size=1024M

Study the output from running your job with --debug=io,verify (add something like number_ios=16 to limit the output).

xvanQ commented 3 weeks ago

> I don't have a DAX device but a job similar to yours using psync runs without issue. [...]
>
> Study the output from running your job with --debug=io,verify (add something like number_ios=16 to limit the output).

Thank you for your answer. I have resolved this issue. The problem is that running multi-threaded write verification against the same dax file causes the threads to conflict with each other. The solution is to create multiple namespaces as needed and point each job at a different device, such as /dev/dax0.0, /dev/dax0.1, /dev/dax0.2, etc.
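A minimal sketch of that layout, assuming two namespaces have already been created (the device names /dev/dax0.0 and /dev/dax0.1 are illustrative and depend on how the namespaces were set up):

```ini
[global]
ioengine=dev-dax
iodepth=1
bs=128KB
thread=1
verify=crc32c
verify_backlog=8
verify_fatal=1

; One namespace per job, so the verify threads never share a device.
[job0]
filename=/dev/dax0.0
rw=write
size=1024M

[job1]
filename=/dev/dax0.1
rw=write
size=1024M
```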