Closed donny372 closed 4 years ago
Please include the job file you are running, and any necessary details on how to reproduce.
I've tried both header and data failures with 3.16 and current -git, and I'm getting verify failures logged just fine. So you'll need to provide more information here. Also see:
https://git.kernel.dk/cgit/fio/tree/REPORTING-BUGS
for how to write a good bug report.
@donny372 is it possible for you to follow up on @axboe's request? Just out of interest, you weren't running a rw=rw/randrw job with verification, were you?
My steps to run the test for corrupt HEADER:
We should see some error message like:
verify: bad magic header 0, wanted acca at file /dev/hdm offset 2097152, length 4096
Corrupt DATA is similar to corrupt header, just change step 2: use some tool to writesector 0x1010 /dev/hdm
And the error message should be:
md5: verify failed at file /dev/hdm offset 2097152, length 4096 (requested block: offset=2097152, length=4096)
Expected CRC: e2c1cad45e5f670ee41147bb177ee79f
Received CRC: dd3183fbaff8b9f71864e2bf678d6afa
@donny372
$ ./fio --minimal --name=write --rw=write --filename=fio.tmp --bs=4k --size=8k --offset=0x1000 --verify=md5 --verify_pattern=0xff
3;fio-3.16-65-gfd1ed-dirty;write;0;0;8;4000;1000;2;0;0;0.000000;0.000000;14;803;409.079000;557.546626;1.000000%=14;5.000000%=14;10.000000%=14;20.000000%=14;30.000000%=14;40.000000%=14;50.000000%=14;60.000000%=806;70.000000%=806;80.000000%=806;90.000000%=806;95.000000%=806;99.000000%=806;99.500000%=806;99.900000%=806;99.950000%=806;99.990000%=806;0%=0;0%=0;0%=0;16;804;410.043000;557.248227;0;0;0.000000%;0.000000;0.000000;8;8000;2000;1;0;0;0.000000;0.000000;10;65;37.746500;38.980676;1.000000%=10;5.000000%=10;10.000000%=10;20.000000%=10;30.000000%=10;40.000000%=10;50.000000%=10;60.000000%=65;70.000000%=65;80.000000%=65;90.000000%=65;95.000000%=65;99.000000%=65;99.500000%=65;99.900000%=65;99.950000%=65;99.990000%=65;0%=0;0%=0;0%=0;59;132;96.022500;51.343730;0;0;0.000000%;0.000000;0.000000;0.000000%;0.000000%;1;0;46;100.0%;0.0%;0.0%;0.0%;0.0%;0.0%;0.0%;0.00%;0.00%;0.00%;50.00%;0.00%;25.00%;0.00%;0.00%;0.00%;25.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%
$ hexdump -C fio.tmp
00000000 c3 bf 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000 ca ac 02 00 00 10 00 00 00 00 00 00 00 00 00 00 |................|
00001010 00 10 00 00 00 00 00 00 43 eb 22 00 43 26 a0 24 |........C.".C&.$|
00001020 01 00 00 00 e0 55 39 d1 e2 c1 ca d4 5e 5f 67 0e |.....U9.....^_g.|
00001030 e4 11 47 bb 17 7e e7 9f ff ff ff ff ff ff ff ff |..G..~..........|
00001040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00002000 ca ac 02 00 00 10 00 00 00 00 00 00 00 00 00 00 |................|
00002010 00 20 00 00 00 00 00 00 43 eb 22 00 30 85 a2 24 |. ......C.".0..$|
00002020 01 00 01 00 ff e4 dd f3 e2 c1 ca d4 5e 5f 67 0e |............^_g.|
00002030 e4 11 47 bb 17 7e e7 9f ff ff ff ff ff ff ff ff |..G..~..........|
00002040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00003000
So byte 0x1010 (4112) is all zeros already and "changing" it to zero is just writing the data that's already there (which by definition is not corruption!). Let's write all ones instead:
$ LC_ALL=C tr '\0' '\377' < /dev/zero | dd of=fio.tmp bs=1 seek=4112 conv=notrunc count=1
1+0 records in
1+0 records out
1 bytes transferred in 0.002092 secs (478 bytes/sec)
$ hexdump -C fio.tmp
00000000 c3 bf 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000 ca ac 02 00 00 10 00 00 00 00 00 00 00 00 00 00 |................|
00001010 ff 10 00 00 00 00 00 00 43 eb 22 00 43 26 a0 24 |........C.".C&.$|
00001020 01 00 00 00 e0 55 39 d1 e2 c1 ca d4 5e 5f 67 0e |.....U9.....^_g.|
00001030 e4 11 47 bb 17 7e e7 9f ff ff ff ff ff ff ff ff |..G..~..........|
00001040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00002000 ca ac 02 00 00 10 00 00 00 00 00 00 00 00 00 00 |................|
00002010 00 20 00 00 00 00 00 00 43 eb 22 00 30 85 a2 24 |. ......C.".0..$|
00002020 01 00 01 00 ff e4 dd f3 e2 c1 ca d4 5e 5f 67 0e |............^_g.|
00002030 e4 11 47 bb 17 7e e7 9f ff ff ff ff ff ff ff ff |..G..~..........|
00002040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00003000
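As a rough sketch of a more general way to guarantee that the byte really changes, one could read the current value and write back its bitwise complement instead of a fixed value. The helper name `flip_byte` is hypothetical and not part of fio; it only relies on standard `od`, `printf` and `dd`.

```shell
# flip_byte FILE OFFSET
# Hypothetical helper: inverts the single byte at OFFSET (decimal or 0x-prefixed hex),
# so the new value always differs from the old one.
flip_byte() {
    file=$1
    offset=$(( $2 ))   # normalise hex such as 0x1010 to decimal for od/dd
    # Read the current byte as an unsigned decimal number.
    old=$(od -An -tu1 -j "$offset" -N 1 "$file" | tr -d '[:space:]')
    new=$(( old ^ 0xff ))
    # Emit the new byte via an octal escape and write it in place without truncating.
    printf "\\$(printf '%03o' "$new")" |
        dd of="$file" bs=1 seek="$offset" conv=notrunc count=1 2>/dev/null
}
```

For the file used above this would be called as `flip_byte fio.tmp 0x1010`, which turns the 00 at that offset into ff just like the `tr`/`dd` pipeline does.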
OK let's verify:
$ ./fio --minimal --name=read --rw=read --filename=fio.tmp --bs=4k --size=8k --offset=0x1000 --verify=md5 --verify_pattern=0xff
verify: bad header offset 4351, wanted 4096 at file fio.tmp offset 4096, length 4096 (requested block: offset=4096, length=4096)
fio: pid=16526, err=92/file:io_u.c:2031, func=io_u_sync_complete, error=Illegal byte sequence
3;fio-3.16-65-gfd1ed-dirty;read;0;92;4;4000;1000;1;0;0;0.000000;0.000000;51;51;51.870000;0.000000;1.000000%=51;5.000000%=51;10.000000%=51;20.000000%=51;30.000000%=51;40.000000%=51;50.000000%=51;60.000000%=51;70.000000%=51;80.000000%=51;90.000000%=51;95.000000%=51;99.000000%=51;99.500000%=51;99.900000%=51;99.950000%=51;99.990000%=51;0%=0;0%=0;0%=0;58;58;58.375000;0.000000;0;0;0.000000%;0.000000;0.000000;0;0;0;0;0;0;0.000000;0.000000;0;0;0.000000;0.000000;1.000000%=0;5.000000%=0;10.000000%=0;20.000000%=0;30.000000%=0;40.000000%=0;50.000000%=0;60.000000%=0;70.000000%=0;80.000000%=0;90.000000%=0;95.000000%=0;99.000000%=0;99.500000%=0;99.900000%=0;99.950000%=0;99.990000%=0;0%=0;0%=0;0%=0;0;0;0.000000;0.000000;0;0;0.000000%;0.000000;0.000000;0.000000%;0.000000%;0;0;47;100.0%;0.0%;0.0%;0.0%;0.0%;0.0%;0.0%;0.00%;0.00%;0.00%;0.00%;0.00%;100.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%
So I get the bad header message as expected and can't reproduce the error you described. Can you offer any more details?
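The "bad header offset 4351, wanted 4096" message can be checked by hand. Judging from the hexdumps above, the eight bytes at 0x1010 hold the block's offset in little-endian order (00 10 00 ... is 0x1000, and the next block has 00 20 00 ... for 0x2000), so overwriting the lowest byte with ff yields 0x10ff. A small sketch of that arithmetic, where `le_bytes_to_int` is a hypothetical helper name:

```shell
# le_bytes_to_int B0 B1 ...
# Combines hex bytes given in little-endian order (lowest byte first) into one integer.
le_bytes_to_int() {
    value=0
    shift_bits=0
    for b in "$@"; do
        value=$(( value | (0x$b << shift_bits) ))
        shift_bits=$(( shift_bits + 8 ))
    done
    echo "$value"
}

le_bytes_to_int 00 10 00 00 00 00 00 00   # field as written by fio: 4096
le_bytes_to_int ff 10 00 00 00 00 00 00   # field after the overwrite: 4351
```

The second value matches the offset fio complains about, which suggests the verify pass read the corrupted header field as expected.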
OK. I verified again, fio-3.16 doesn't work for me. I think the difference here is I'm writing to an unmounted drive and you are writing to a file.
Really shouldn't make any difference at all to fio, they are both files as far as fio is concerned.
@donny372 I agree with Jens - I'm not going to get different results just because it's a file in a filesystem rather than a file representing a device. Just to reiterate: we still need some extra information from you:
/dev/hdm
is an unusual device node these days...
Not sure why, but I got a different result doing it my way with fio-3.15 (and earlier versions) vs. fio-3.16. I also followed your steps and got the error message with fio-3.16.
I found something interesting: the tool I'm using to corrupt data doesn't visibly change the content before the read. fio-3.15 can still catch the error, and only after the read does the content show up as changed.
/home/fio-3.15/fio_testdev --minimal --name=write --rw=write --filename=/dev/hdm --bs=4k --size=2m --offset=0x0000 --verify=md5
3;fio-3.15;write;0;0;2048;40156;10039;51;0;0;0.000000;0.000000;0;25526;57.821004;1128.933822;1.000000%=0;5.000000%=0;10.000000%=0;20.000000%=0;30.000000%=0;40.000000%=0;50.000000%=0;60.000000%=0;70.000000%=0;80.000000%=0;90.000000%=1;95.000000%=1;99.000000%=317;99.500000%=391;99.900000%=25559;99.950000%=25559;99.990000%=25559;0%=0;0%=0;0%=0;0;25526;57.846238;1128.933642;0;0;0.000000%;0.000000;0.000000;2048;227555;56888;9;0;0;0.000000;0.000000;1;130;4.265854;13.981036;1.000000%=1;5.000000%=1;10.000000%=1;20.000000%=2;30.000000%=2;40.000000%=2;50.000000%=2;60.000000%=2;70.000000%=2;80.000000%=2;90.000000%=3;95.000000%=4;99.000000%=93;99.500000%=123;99.900000%=130;99.950000%=130;99.990000%=130;0%=0;0%=0;0%=0;13;142;16.666918;14.122332;0;0;0.000000%;0.000000;0.000000;25.423729%;1.694915%;18;0;52;100.0%;0.0%;0.0%;0.0%;0.0%;0.0%;0.0%;56.45%;39.65%;1.46%;0.20%;0.20%;0.39%;0.78%;0.68%;0.10%;0.00%;0.00%;0.00%;0.00%;0.00%;0.10%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;sdg;0;0;0;0;0;0;0;0.00%
hexdump -C /dev/hdm -n 20
00000000 ca ac 02 00 00 10 00 00 3c 54 09 c5 01 cc a4 00 |ʬ......<T.�.̤.|
00000010 00 00 00 00 |....|
00000014
./tool-danger datacmd writesector 0x0000 /dev/hdm
hexdump -C /dev/hdm -n 20
The data doesn't change before read
00000000 ca ac 02 00 00 10 00 00 3c 54 09 c5 01 cc a4 00 |ʬ......<T.�.̤.|
00000010 00 00 00 00 |....|
00000014
/home/fio-3.15/fio_testdev --minimal --name=read --rw=read --filename=/dev/hdm --bs=4k --size=2m --offset=0x0000 --verify=md5 --verify_pattern=0xff
verify: bad magic header 0, wanted acca at file /dev/hdm offset 0, length 4096 (requested block: offset=0, length=4096)
fio: pid=586091, err=84/file:third_party/fio/io_u.c:2029, func=io_u_sync_complete, error=Invalid or incomplete multibyte or wide character
3;fio-3.15;read;0;84;4;137;34;29;0;0;0.000000;0.000000;27834;27834;27834.800000;0.000000;1.000000%=27918;5.000000%=27918;10.000000%=27918;20.000000%=27918;30.000000%=27918;40.000000%=27918;50.000000%=27918;60.000000%=27918;70.000000%=27918;80.000000%=27918;90.000000%=27918;95.000000%=27918;99.000000%=27918;99.500000%=27918;99.900000%=27918;99.950000%=27918;99.990000%=27918;0%=0;0%=0;0%=0;27837;27837;27837.187000;0.000000;0;0;0.000000%;0.000000;0.000000;0;0;0;0;0;0;0.000000;0.000000;0;0;0.000000;0.000000;1.000000%=0;5.000000%=0;10.000000%=0;20.000000%=0;30.000000%=0;40.000000%=0;50.000000%=0;60.000000%=0;70.000000%=0;80.000000%=0;90.000000%=0;95.000000%=0;99.000000%=0;99.500000%=0;99.900000%=0;99.950000%=0;99.990000%=0;0%=0;0%=0;0%=0;0;0;0.000000;0.000000;0;0;0.000000%;0.000000;0.000000;0.000000%;0.000000%;1;0;85;100.0%;0.0%;0.0%;0.0%;0.0%;0.0%;0.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;100.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;sdg;0;0;0;0;0;0;0;0.00%
hexdump -C /dev/hdm -n 20
The data changed after read
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010
Thanks @sitsofe and @axboe! I think something is wrong with the tool, so I'm closing this bug.
@donny372 something to note is that you're doing buffered I/O. If you are corrupting an LBA on the disk behind the kernel's back and the kernel has cached that LBA, it won't know it needs to reread it and will serve you data out of the cache. You may want to use direct=1 with fio to force a bypass of the cache (the need to go direct also applies when checking the region with a tool like dd).
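A minimal sketch of what going direct could look like for both the verify pass and the manual inspection. The function names are hypothetical wrappers; the fio options mirror the read job used earlier in this thread with `--direct=1` added, and `iflag=direct` assumes GNU dd on Linux.

```shell
# Hypothetical wrappers; the device or file path is passed in by the caller.

# Re-read and verify with O_DIRECT so fio does not get served from the page cache.
# Assumes the region was previously written with --verify=md5.
verify_direct() {
    fio --minimal --name=read --rw=read --filename="$1" \
        --bs=4k --size=2m --offset=0 --verify=md5 --direct=1
}

# Dump the first 4 KiB while bypassing the page cache.
# With O_DIRECT the block size has to be compatible with the device's sector size.
dump_direct() {
    dd if="$1" bs=4096 count=1 iflag=direct 2>/dev/null | hexdump -C
}
```

For example, `dump_direct /dev/hdm` before and after running the corruption tool should show whether the on-disk content actually changed, independent of whatever the kernel has cached.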
Tested with 3.15:
md5: verify failed at file /dev/hdi offset 2097152, length 4096 (requested block: offset=2097152, length=4096)
Expected CRC: e2c1cad45e5f670ee41147bb177ee79f
Received CRC: dd3183fbaff8b9f71864e2bf678d6afa
3;fio-3.15;read;0;84;...
Tested with 3.16:
3;fio-3.16;read;0;0;...