phyorat opened this issue 1 year ago
Hi @phyorat,
thank you for posting the issue. Do you happen to still have the vdbench config you used for your test?
I came up with an fio config to mimic vdbench's behaviour:
[dc_repro]
filename=/dev/cas1-1
ioengine=libaio
iodepth=1
direct=1
numjobs=1
# Generate new offset for every second write
rw=randwrite:2
rw_sequencer=identical
bssplit=64k/50:256k/50
# This ensures that every 64K write will be followed by 256K write
number_ios=2
loops=10000
verify=md5
# Verify after every write
verify_backlog=1
# Stop FIO if DC (data corruption) is detected
verify_fatal=1
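If it helps with reproduction: saved as a job file, the config above can be run directly with fio. A minimal usage sketch, assuming the job is stored as dc_repro.fio and that /dev/cas1-1 is a scratch Open CAS exported device whose contents may be overwritten:

# Run the reproduction job; verify_fatal=1 makes fio exit on the first observed verify failure.
fio dc_repro.fio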
Description
An IO class can be used to pass big IO (e.g. larger than 128KB) through to the HDD directly, skipping the cache; this gives higher performance and better cache efficiency. IO class configuration file example:
IO class id,IO class name,Eviction priority,Allocation
0,unclassified,22,0
1,request_size:le:131072,1,1
After loading this IO class, 128K IO data is written to the HDD directly; likewise, if the data is not present in the cache, nothing is read from the cache and the data is read from the HDD instead.
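For reference, rules like the above would typically be saved to a CSV file and loaded with casadm. A minimal sketch, assuming cache id 1 and a file named ioclass.csv (please check the syntax against your Open CAS version):

# Load the IO class rules into cache 1.
casadm --io-class --load-config --cache-id 1 --file ioclass.csv
# List the currently active IO classes to confirm the rules took effect.
casadm --io-class --list --cache-id 1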
But on the other hand, if part of the requested data is cached, for example the first 64K of a 128K request, then the read should take 64K from the cache and the remaining 64K from the HDD:

64K cached <> 64K from HDD
|-------------------------------------|--------------------------------------|

This is the expected data "splicing". The actual IO pattern is:
1. A 64K write IO goes into the cache, the first time with key = 1.
2. Read 64K, verified OK.
3. A 64K write IO (with key = 2) is merged with another into one 128KB write, written to the HDD directly.
4. Read 64K, but verification fails (got key = 1, which indicates old data). (We guess the old 64K was wrongly read from the cache.)

Steps 3-4 may also be:
3. A 64K write IO (with key = 2), plus another write IO.
4. Two 64K read IOs are merged into one, reading 128K from the HDD directly (because of IO-class rule [1]).
5. Verification fails (got key = 1, which indicates old data). (We guess the old 128K was wrongly read directly from the HDD.)
We ran data validation on this scenario with vdbench and found that a data validation error occurred.
21:05:24.364 hd2-0: dvpost: /dev/vdb sd4 sd4 0x00000000 0x234520000 131072 0x0 0x5ecf4d1ed319c 0x11 0x2 0x70 0x0 0 36028797018963971
21:05:24.364 hd2-0:
21:05:24.364 hd2-0: Data Validation error for sd=sd4,lun=/dev/vdb
21:05:24.364 hd2-0: Block lba: 0x234520000; sector lba: 0x234520000; xfersize: 131072; relative sector in block: 0x00 ( 0)
21:05:24.364 hd2-0: ===> Data Validation Key miscompare.
21:05:24.364 hd2-0: ===> Data miscompare.
21:05:24.364 hd2-0: The sector below was written Tuesday, November 8, 2022 20:38:41.711 CST
21:05:24.364 hd2-0: 0x000   00000002 34520000 ........ ........   00000002 34520000 0005ecf4 d1ed319c
21:05:24.364 hd2-0: 0x010   02..0000 73643420 20202020 00000000   01700000 20346473 20202020 00000000
21:05:24.364 hd2-0: Key miscompare always implies Data miscompare. Remainder of data suppressed.
This error shows that the tool wrote data with key "02xxxx", but the data read back from the core was "01xxxx". The key point is that right after the error occurred, we read the data from the core device directly and it was correct: "02xxxx". So there appears to be a data alignment/validation issue between the cache and the HDD, within a very small time window (several milliseconds)?
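To illustrate that check, here is a rough sketch of how the same LBA can be compared on the exported device and on the core HDD right after a miscompare. The core device name (/dev/sdb) and the use of the failing offset 0x234520000 are assumptions for illustration; reads through /dev/cas1-1 go through the cache, while reads from the core device bypass it:

# Byte offset of the failing block reported by vdbench (128K-aligned).
LBA=$((0x234520000))
# Read 128K at that offset through the CAS exported device (goes through the cache).
dd if=/dev/cas1-1 of=/tmp/cas.bin bs=128K count=1 iflag=direct,skip_bytes skip=$LBA
# Read the same 128K directly from the core HDD (assumed /dev/sdb here), bypassing the cache.
dd if=/dev/sdb of=/tmp/core.bin bs=128K count=1 iflag=direct,skip_bytes skip=$LBA
# A mismatch here would point to stale data being served on the cached path.
cmp /tmp/cas.bin /tmp/core.bin && echo "cache and core agree" || echo "miscompare"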
After removing this IO class and testing again, no data validation error occurred any more.
In addition, the configuration <Sequential cutoff policy: always; --threshold 128KB> can also trigger the data validation error.
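For completeness, that sequential cutoff setting can be applied per core roughly as follows. This is a sketch based on the casadm --set-param syntax in the Open CAS documentation (threshold in KiB); please verify the exact option names against your casadm version:

# Set the sequential cutoff policy to 'always' with a 128 KiB threshold on cache 1 / core 1.
casadm --set-param --name seq-cutoff --cache-id 1 --core-id 1 --policy always --threshold 128
# Show the currently applied sequential cutoff parameters.
casadm --get-param --name seq-cutoff --cache-id 1 --core-id 1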
Expected Behavior
No data alignment/validation issue between the cache and the HDD when an IO class is set so that big IO skips the cache.
Actual Behavior
There is a data alignment/validation issue between the cache and the HDD when the data of a read IO is partially cached.
Steps to Reproduce
Context
Base block storage for a distributed block system; we need to guarantee that data validation is OK.
Possible Fix
Maybe metadata is not strictly synchronized, or has expired, between different IO stages (within milliseconds).
Logs
No direct evidence so far, but reverse verification (removing that IO class) can be a clue.
Your Environment