facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
28.09k stars 6.25k forks source link

Revamp stress test for LockWAL(), tweak contract #12666

Closed pdillinger closed 2 months ago

pdillinger commented 3 months ago

Summary: Upon discovering that LockWAL() does not block file ingestion, I looked into whether its behavior should be expanded to block that or whether it should block as little as reasonably possible to freeze the WAL state.

Although the idea of freezing more things that resemble writes was broadly attractive, I found in digging through the code that that would likely expose us to more risks of freezing too much, leading to deadlock. Stopped writes is a good, mature mechanism to reasonably over-approximate what is needed to freeze the WAL state.

LockWAL might not even be a useful feature. The MySQL team suspects that FlushWAL would be sufficient for their needs, because writes should already be blocked on their side, but we aren't yet sure we can deprecate the LockWAL feature.

So the best course of action now, in this commit and IMHO, is to

This does not change any LockWAL functionality (and I think we should avoid changing the implementation in the same minor release if we're changing the contract in a minor release).

Follow-up to https://github.com/facebook/rocksdb/pull/12642

Test Plan: The modifications to how we validate LockWAL() in the stress test mean we can re-enable file ingestion in combination with LockWAL(); HOWEVER, the new validation is bypassed when FaultInjectionTestFS is used because of at least two issues, which are detailed in a FIXME comment in db_stress_test_base.cc where the validation code lives.

I've run blackbox_crash_test for quite a while with certain relevant amplifications to ensure it remains stable.

facebook-github-bot commented 3 months ago

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

pdillinger commented 2 months ago

going with https://github.com/facebook/rocksdb/pull/12652 instead