facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
27.83k stars 6.2k forks source link

Decouple sync fault and write injection in FaultInjectionTestFS & fix tracing issue under WAL write error injection #12797

Closed hx235 closed 1 week ago

hx235 commented 1 week ago

Context/Summary:

After injecting write error to WAL, we started to see crash recovery verification failure in prefix recovery. That's because the current tracing implementation traces every write before it writes to WAL even when the WAL write can fail with write error injection. One consequence of that is the traced writes in trace files does not corresponding to write sequence sequence anymore e.g, it has more traced writes that the actual assigned sequence number to successful writes. Therefore https://github.com/facebook/rocksdb/blob/b4a84efb4e842b782e976de5b22a4554c2f76edd/db_stress_tool/expected_state.cc#L674 won't restore the ExpectedState to the correct sequence number we want.

Ideally, we should have a prepare-commit mechanism for tracing just like our ExpectedState so we can ignore the traced write if the write fails later. But for now, to simplify, we simply don't inject WAL error (and metadata write error cuz it could fail write when sync WAL dir fails)

To do so, we need to be able to exclude WAL from write injection but still allow sync fault injection in it to maintain its original sync fault testing coverage. This prompts us to decouple sync fault and write injection in FaultInjectionTestFS. And this is what this PR mainly about.

So now FaultInjectionTestFS works as the following:

Bonus: better naming of relevant variables

Test:


- CI
facebook-github-bot commented 1 week ago

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

hx235 commented 1 week ago

Existing CI failures

ajkr commented 1 week ago

Existing CI failures

Should be fixed - try rebasing (can be done by clicking "Update branch")

facebook-github-bot commented 1 week ago

@hx235 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 1 week ago

@hx235 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 1 week ago

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 1 week ago

@hx235 merged this pull request in facebook/rocksdb@0d93c8a6cae309b5708641a8694ee0963014d633.