facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
28.38k stars 6.29k forks source link

Fix a failure to propagate ReadOptions #12757

Closed pdillinger closed 3 months ago

pdillinger commented 3 months ago

Summary: The crash test revealed a case in which the uncache functionality in ~BlockBasedTableReader could initiate an block read (IO), despite setting ReadOptions::read_tier = kBlockCacheTier.

The root cause is a place in the code where many people have over time decided to opt-in propagating ReadOptions and no one took the initiative to propagate ReadOptions by default (opt out / override only as needed). The fix is in partitioned_index_reader.cc. Here, ReadOptions::readahead_size is opted-out to avoid churn in prefetch_test that is not clearly an improvement or regression. It's hard to tell given the poor state of relevant documentation #12756. The affected unit test was added in #10602.

Test Plan: (Now postponed to a follow-up diff) I have added some new infrastructure to DEBUG builds to catch this specific kind of violation in unit tests and in the stress/crash test. EnforceReadOpts establishes a thread-local context under which we assert no IOs are performed if ReadOptions said it should be forbidden. With this new checking, the Uncache unit test would catch the critical step toward a violation (inner ReadOptions allowing IO, even if no IO is actually performed), which is fixed with the production code change.

facebook-github-bot commented 3 months ago

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 3 months ago

@pdillinger has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 3 months ago

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

hx235 commented 3 months ago

I'm good with the partitioned_index_reader fix but want to discuss more on the EnforceReadOpts and thread-local context. So feel free to separate this into two PRs to unblock stress test.

For EnforceReadOpts::UsedIO(), my understanding is we call this to verify we are allowed to do IO before we do IO. Can we move this check to lower-level like file reader or file system level?

For EnforceReadOpts enforce(ropts), do we envision adding more of this in BlockBasedTableReader layer?

facebook-github-bot commented 3 months ago

@pdillinger has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 3 months ago

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

pdillinger commented 3 months ago

OK, PR is split (next one with EnforceReadOpts to be created)

facebook-github-bot commented 3 months ago

@pdillinger merged this pull request in facebook/rocksdb@d64eac28d32a025770cba641ea04e697f475cdd6.