Closed hx235 closed 1 week ago
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Existing CI failures
Existing CI failures
Should be fixed - try rebasing (can be done by clicking "Update branch")
@hx235 has updated the pull request. You must reimport the pull request before landing.
@hx235 has updated the pull request. You must reimport the pull request before landing.
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@hx235 merged this pull request in facebook/rocksdb@0d93c8a6cae309b5708641a8694ee0963014d633.
Context/Summary:
After injecting write error to WAL, we started to see crash recovery verification failure in prefix recovery. That's because the current tracing implementation traces every write before it writes to WAL even when the WAL write can fail with write error injection. One consequence of that is the traced writes in trace files does not corresponding to write sequence sequence anymore e.g, it has more traced writes that the actual assigned sequence number to successful writes. Therefore https://github.com/facebook/rocksdb/blob/b4a84efb4e842b782e976de5b22a4554c2f76edd/db_stress_tool/expected_state.cc#L674 won't restore the ExpectedState to the correct sequence number we want.
Ideally, we should have a prepare-commit mechanism for tracing just like our ExpectedState so we can ignore the traced write if the write fails later. But for now, to simplify, we simply don't inject WAL error (and metadata write error cuz it could fail write when sync WAL dir fails)
To do so, we need to be able to exclude WAL from write injection but still allow sync fault injection in it to maintain its original sync fault testing coverage. This prompts us to decouple sync fault and write injection in FaultInjectionTestFS. And this is what this PR mainly about.
So now
FaultInjectionTestFS
works as the following:FaultInjectionTestFS
is bypassed for writable fileBonus: better naming of relevant variables
Test: