apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Injected disk error in AsyncFileChaos is not tracked and causes StorageServerDurabilityError #9747

Open jzhou77 opened 1 year ago

jzhou77 commented 1 year ago

The fault injection is here: https://github.com/apple/foundationdb/blob/27e1b76b68990422418ab7195f35494286117e71/fdbrpc/include/fdbrpc/AsyncFileChaos.h#L99-L105

The injected error can later cause the storage server to raise a SevError StorageServerDurabilityError here: https://github.com/apple/foundationdb/blob/27e1b76b68990422418ab7195f35494286117e71/fdbserver/storageserver.actor.cpp#L10408

Reproduced on the 7.2 cherry-picks branch (https://github.com/apple/foundationdb/pull/9732), commit 79d0687e5, with arguments -f ./tests/slow/DiskFailureCycle.toml -s 282036857 -b off. I found that the corruption happened in one of the disk queue files, but the page was copied to another file and was discarded after the reboot, which makes the dirty page hard to track.

120.205235 CorruptedByteInjection ID=0000000000000000 Filename=/root/simfdb/fa11e779fcbb57e0bfa2a1dcb38615c4/storage-fb3b1d127a41c6afa254b6d5dc693cfb-1.fdq Position=293707 BK,CC,CP,SS

126.428277 DQRecInvalidPage ID=fb3b1d127a41c6af NextReadLocation=1093668 HashCheck=0 Seq=1093632 Expect=1093632 File0Name=/root/simfdb/fa11e779fcbb57e0bfa2a1dcb38615c4/storage-fb3b1d127a41c6afa254b6d5dc693cfb-0.fdq SS
126.428277 DQTruncateFile ID=fb3b1d127a41c6af File=1 Pos=290816 File0Name=/root/simfdb/fa11e779fcbb57e0bfa2a1dcb38615c4/storage-fb3b1d127a41c6afa254b6d5dc693cfb-0.fdq SS
126.428477 FindPhysicalLocation ID=fb3b1d127a41c6af Page0Valid=1 Page0Seq=425984 Page1Valid=1 Page1Seq=802816 Location=946986 Context=lastPoppedSeq File0Name=/root/simfdb/fa11e779fcbb57e0bfa2a1dcb38615c4/storage-fb3b1d127a41c6afa254b6d5dc693cfb-0.fdq SS
126.428477 FoundPhysicalLocation ID=fb3b1d127a41c6af PageIndex=1 PageLocation=35 SizeofPage=4096 PageSequence=802816 Location=946986 Context=lastPoppedSeq File0Name=/root/simfdb/fa11e779fcbb57e0bfa2a1dcb38615c4/storage-fb3b1d127a41c6afa254b6d5dc693cfb-0.fdq SS
126.428477 DQTruncateFile ID=fb3b1d127a41c6af File=0 Pos=0 File0Name=/root/simfdb/fa11e779fcbb57e0bfa2a1dcb38615c4/storage-fb3b1d127a41c6afa254b6d5dc693cfb-0.fdq SS
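
As a sanity check on the events above, here is a minimal sketch that maps the injected byte position to a disk queue page. It assumes pages are aligned at multiples of SizeofPage=4096 and that File=1 in DQTruncateFile refers to the -1.fdq file; neither is confirmed in this thread. `page_of` is a hypothetical helper, not a FoundationDB function.

```python
from typing import Tuple

PAGE_SIZE = 4096  # SizeofPage reported by the FoundPhysicalLocation event


def page_of(position: int, page_size: int = PAGE_SIZE) -> Tuple[int, int, int]:
    """Return (page index, page start offset, offset within page).

    Hypothetical helper; assumes pages start at multiples of page_size.
    """
    index, offset = divmod(position, page_size)
    return index, index * page_size, offset


corrupted_position = 293707  # Position from CorruptedByteInjection (-1.fdq)
truncate_position = 290816   # Pos from DQTruncateFile File=1

index, page_start, offset = page_of(corrupted_position)
print(index, page_start, offset)        # 71 290816 2891
print(page_start == truncate_position)  # True
```

Under those assumptions, the truncation of file 1 begins exactly at the start of the page that contains the corrupted byte, which is consistent with the dirty page being discarded after the reboot.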

126.428877 StorageServerDurabilityError ID=fb3b1d127a41c6af RestoredVersion=302130779 Checking=min MinVersion=303134785 MaxVersion=303722114 Backtrace=addr2line -e fdbserver.debug -p -C -f -i 0x47a9867 0x47a9b15 0x47a44a4 0x46c42f5 0x46c455a 0x2c9876c 0x2c97e19 0x2c97750 0x2c16c2e 0x2cc4fa6 0x2cc3cd7 0x22727c8 0x21872f8 0x21871a6 0x2187079 0x2187aae 0x2176c38 0x46b0888 0x46b0594 0x288ce38 0x17278a8 0x4696e03 0x469691a 0x2b7e6bc 0x7f0b6fe49555 SS
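
To put the numbers in the StorageServerDurabilityError event in perspective, the sketch below computes how far the restored version falls short of the expected minimum. It assumes Checking=min means RestoredVersion was compared against MinVersion, and it uses a nominal rate of about one million versions per second to convert the gap to time; both are assumptions, and `versions_behind` is a hypothetical helper.

```python
VERSIONS_PER_SECOND = 1_000_000  # assumed nominal version rate, not taken from this trace

restored_version = 302130779  # RestoredVersion
min_version = 303134785       # MinVersion
max_version = 303722114       # MaxVersion


def versions_behind(restored: int, minimum: int) -> int:
    """Number of versions by which restored falls short of minimum (0 if it does not)."""
    return max(0, minimum - restored)


gap = versions_behind(restored_version, min_version)
print(gap)                        # 1004006
print(gap / VERSIONS_PER_SECOND)  # about 1.0 second of versions at the assumed rate
```

If the assumptions hold, the storage server came back roughly one second of versions behind what was expected to be durable.
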

muzala commented 1 year ago

This was good; I see the way it works.