Closed pdillinger closed 4 months ago
Grr, unresolved issues showing up in crash test.
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Passed 3 hours of blackbox_crash_test with amplified checkpoint and backup, so should be good for review
Why don't we sync these WAL files and then hard link them?
Hmm, you're probably right that I could have done SyncClosedWals before the final GetSortedWalFiles, but I believe the approach in this PR is a step closer to the goal of not relying on filesystem queries for WAL info in GetLiveFilesStorageInfo.
Also, GetLiveFilesStorageInfo() can be used aside from Checkpoint and Backup, e.g. for statistical purposes. Under that broader set of purposes, it should be minimally blocking when asked not to flush memtable.
@pdillinger merged this pull request in facebook/rocksdb@98393f0139fc9529d5c56a4b43bc7a245b22f734.
Summary: Background: there is one active WAL file but there can be several more WAL files in various states. Those other WALs are always in a "flushed" state but could be on the
logs_
list not yet fully synced. We currently allow any WAL that is not the active WAL to be hard-linked when creating a Checkpoint, as although it might still be open for write, we are not appending any more data to it.The problem is that a created Checkpoint is supposed to be fully synced on return of that function, and a hard-linked WAL in the state described above might not be fully synced. (Through some prudence in #10083, it would synced if using track_and_verify_wals_in_manifest=true.)
The fix is a step toward a long term goal of removing the need to query the filesystem to determine WAL files and their state. (I consider it dubious any time we independently read from or query metadata from a file we have open for writing, as this makes us more susceptible to FileSystem deficiencies or races.) More specifically:
trim_to_size=true
fromGetLiveFilesStorageInfo()
. And while we're at it, use our known flushed sizes for those WALs.Stacked on #12729
Test Plan: Expanded an existing test, which now fails before fix. Also long runs of blackbox_crash_test with amplified checkpoint frequency.