Closed dongdongwcpp closed 2 years ago
@dongdongwcpp Thanks for the repro code!!! Seems like a flush is also needed to trigger this assertion - either by putting s = db->Put(wop, "key11" + std::to_string(i), value);
in a big loop on i
or doing something like db->Flush(FlushOptions())
after any number of Put
(even Put
once can trigger the assertion).
I also verified removing any of the partition_filters + index_type
, cache_index_and_filter_blocks
, cache_index_and_filter_blocks
won't trigger the assertion. So I suspect either one of them might not be compatible with others.
===== Below are some debugging/impl details for my teammates - feel free to ignore it @dongdongwcpp :) =====
@akankshamahajan15 Is the feature "block cache pre-population" now supporting partition filter? Adding table_options.partition_filters = true; table_options.index_type = rocksdb::BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch;
to TEST_F(DBBlockCacheTest, WarmCacheWithBlocksDuringFlush)
on 6.24.fb branch exposes the same error and a slightly different error for the latest upstream/main. I believe both errors are related to rocksdb::PartitionedFilterBlockReader::CacheDependencies(rocksdb::ReadOptions const&, bool)
.
I did some initial debugging on 6.24.fb branch:
The assertion was triggered by "the offset of the current entry in the partitioned filter block" (i.e, current_
) being equal to "the offset of the related restart array" (i.e, restarts_
), which violated the assertion (current_ < restarts_
) mentioned in the error.
This equality was a result of a previous invalidation of the IndexBlockIterator
of the partitioned filter block during the iterator's construction. The invalidation happened due to the partitioned filter block had size_ = 0
(i.e, the size of the block content data) by the time of the iterator's construction. And the invalidation set current_ = restarts_
as part of the invalidation procedure.
The size_ = 0
in the partitioned filter block was caused by a previous instruction before constructing the index block iterator above. The instruction is GetOrReadFilterBlock()
in PartitionedFilterBlockReader::CacheDependencies()
. It didn't make the filter block's size_ > 0
while returning Status::OK()
by falling into this branch.
Status::OK()
from GetOrReadFilterBlock()
solely depending on the information from the CacheEntry
wrapper of the filter block (i.e, filter_block_.IsEmpty()
) without checking the block's size_
variable. The ok status sort of indicated "everything is fine to proceed", while later on, this variable size_
caused the trouble above for us. Not sure if this is the correct behavior we want ...Another thing I haven't figured out is how or whether all these might relate to the incompatibility of the three options mentioned above as apparently turning off either one of them works fine. As my debugger shows, removing either of them leads to above size_
in the partitioned filter block being greater than 0, thus not triggering the assertion. Cc our filter expert @pdillinger here just to see if there is any luck.
(cc @ltamasi FYI)
Thanks for the deep dive @hx235 ! Based on your analysis above, the offending block is definitely the top-level index of the partitioned filters, for two reasons: 1) that's the only "index-like" block when it comes to partitioned filters; you can't iterate over a filter partition and 2) that's the block that is handled by GetOrReadFilterBlock
when partitioned filters are in use.
BTW, this branch in GetOrReadFilterBlock
you mentioned kicks in when the partitioned filter reader has direct access to the top-level index, i.e. when a) cache_index_and_filter_blocks
is false
, or b) cache_index_and_filter_blocks
is true
and the the top-level index is pinned in the cache. If filter_block_
here refers to a zero-sized block, that shouldn't happen and probably points to an issue with the preloading code.
Based on my cursory reading of the cache pre-populating logic, the issue seems to be in BlockBasedTableBuilder::InsertBlockInCacheHelper
: namely, I think we're missing a special case for the top-level index of partitioned filters (which is conceptually a quote-unquote filter block but format-wise is a regular key-value Block
).
Thanks @hx235 and @ltamasi for the detailed explanation. I will look into it.
This surely would have been caught if prepopulate_block_cache was added to db_stress test. (Reviewers should have noticed that.)
This surely would have been caught if prepopulate_block_cache was added to db_stress test. (Reviewers should have noticed that.)
Yes. After fixing it, I will add it to the db_stress as well and will keep this in mind for the future implementations as well.
Expected behavior
rocksdb write ok
Actual behavior
in rocksdb 6.24 buildType: release: IO error: While pread offset 18446744073709547520 len 8192: /tmp/rocksdb_simple_example/000227.sst: Invalid argument in rocksdb 6.24 buildType:Debug: rocksdb/table/block_based/block.h:629: virtual rocksdb::IndexValue rocksdb::IndexBlockIter::value() const: Assertion `Valid()' failed.
Steps to reproduce the behavior
init rocksdb like this,the five table_options 's combination will reproduce the failure.