facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
27.85k stars 6.2k forks source link

Disallow manual prefix iteration in MultiCfIterators #12770

Closed jaykorean closed 3 weeks ago

jaykorean commented 3 weeks ago

Summary

As described in https://github.com/facebook/rocksdb/wiki/Prefix-Seek#manual-prefix-iterating, a DBIter can land on an unexpected key if it iterates outside of the iterator range. MultiCfIterator uses DBIters as child iterators, and the implementation assumes that these child iterators are at valid keys at all times.

After a recent change in #12706 , which added the MultiCfIterator to the stress test , we encountered a failure where the CoalescingIterator failed to switch direction when a child iterator was at a key that had previously been deleted. This issue was due to the undefined result from the manual prefix iteration of the child iterator.

This PR adds an optional param bool disallow_manual_prefix_iteration = false to the DB::NewIterators() API so that Status::InvalidArgument() can be returned when MultiCfIterator creation tries to get child iterators when total order seek is not guaranteed.

In case anyone is wondering why this has not been a problem in the regular DBIter stress test, VerifyIterator() skips checking when the iterator prefix seek is out of the range

https://github.com/facebook/rocksdb/blob/13c758f9869c2b086e9c1f7457c15db93349f258/db_stress_tool/db_stress_test_base.cc#L1700-L1702

Test Plan

Unit Test added

./multi_cf_iterator_test

MultiCfIterator added back to the stress test

python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 --use_multi_cf_iterator=1 --verify_iterator_with_expected_state_one_in=2
facebook-github-bot commented 3 weeks ago

@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

cbi42 commented 3 weeks ago

What if we stop TestIterate() once a key is outside prefix range? I.e. we don't operate on iter once it's result is undefined. It feels we can still support and test MultiCfIterator with manual prefix iteration.

facebook-github-bot commented 3 weeks ago

@jaykorean has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 3 weeks ago

@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

jaykorean commented 3 weeks ago

@cbi42 For now we will just not support the feature when the user attempts to create the MultiCfIterator with manual prefix iteration. DBImpl::NewMultiCfIterator() will return an ErrorIterator withStatus::InvalidArgument()

(Discussed offline, leaving comment for the future reference)

facebook-github-bot commented 3 weeks ago

@jaykorean has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 3 weeks ago

@jaykorean has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 3 weeks ago

@jaykorean has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 3 weeks ago

@jaykorean has updated the pull request. You must reimport the pull request before landing.

jaykorean commented 3 weeks ago

After offline discussion with @cbi42 , we decided to allow MultiCfIter to go on (possibly giving the same "wrong/undefined" result as the child iters) if any of the child iter is iterating outside of range due to manual prefix iteration

I will put up a brand new PR for it.