Open alim-akbashev opened 1 month ago
Hello @alim-akbashev and thanks for reaching our community,
I double check code and I didn't find any misbehavior. I also wrote small test to verify your concerns.
The roptions.read_tier = kBlockCacheTier;
is not problem. As you can see, on line above we do a copy of ReadOptions
. Even the read options passed to the function are marked as cons (const ReadOptions& read_options
), so it's impossible to change them.(Unless you do some dark C++ magic).
keyMayExist - it's littble bit confusing, let explain in table : |
Return value | Key in database |
---|---|---|
true |
Yes or No | |
false |
Not |
Originally this function used BloomFilter, but today it may be replaced with RibbonFilter or for small databases it may use cache.
Radek
hi @rhubner , thanks for your reply
Yea, seems my first assumption on the reason of the behaviour was wrong.
But i still have concerns regarding the current behaviour.
keyMayExist
- it's littble bit confusing, let explain in table :Return value Key in database
true
Yes or Nofalse
Not
If it would be so, that would not be an issue for me. But unfortunately it does not behave so.
I double check code and I didn't find any misbehavior. I also wrote small test to verify your concerns.
I believe, the test will fail if we add re-opening of a db between put and keyExists/keyMayExist (to clear all kind of possible caches). At least exactly such behaviour i do see using a java code - keyExists/keyMayExist returns false
when the key does exist in database. But if I call get(key) first, keyExists/keyMayExist starts returning a correct result true
for this key - my guess is that happens because the data by key (or just an index block?) is in a block cache at that moment.
Hi @alim-akbashev ,
believe, the test will fail if we add re-opening of a db between put and keyExists/keyMayExist
Feel free to update the test. Maybe I missed something. I also try to compact DB or flush WAL, but it didn't have any effect on the test.
Radek
yea, i'm so sorry, @rhubner , my bad i didn't double-check the steps-to-reproduce.
it turned out, that to reproduce 1. it's required to fill a db with much more data 2. it's required to restart the app, just closing and opening the db again does not shows the issue. i'm still not sure why it happens and why these conditions are required, this time i won't make any assumptions :)
anyways, i've created a demo projects which 100% reproduces the issue (at least in my environment) https://github.com/alim-akbashev/keyExistsBugDemo
it consist of two mini-apps:
hope it helps in investigation.
Hello @alim-akbashev ,
thanks for the example. I can replicate your behavior on my machine.
I think it's the key_maye_exist what return wrong result. But this happen only for TtlDB. For normal RocksDB I'm getting correct results. Maybe different implementations? Let me investigate more.
Radek
Hello @alim-akbashev ,
I did some debbuging over weeked and today and move little bit further. I will be talking primary about DBWithTTLImpl::KeyMayExist
The first call db_->KeyMayExist
return true
for key which is in DB, but then the second if statement fail. See bellow the output of two calls :
ret : 1
value_found : 1
value.size : 0
SanityCheckTimestamp : Corruption: Error: value's length less than timestamp's
StripTS(value) : Corruption: Bad timestamp in key-value
What's strange, that value_found
is 1(true
) but there are no data in value
.
I still don't know why this is happening, but I will continue investigating.
Radek
hello @rhubner , thanks for your efforts.
here's a quick update from my side - I've updated the demo app, now it contains only settings which really affect keyExists/keyMayExist behaviour + now it is a single app: https://github.com/alim-akbashev/keyExistsBugDemo/blob/main/src/main/java/akbashev/rocksdb/MinimalStepsToReproduceApp.java :
.setCacheIndexAndFilterBlocks(true)
So now I more or less certain that the behaviour depends on the index and if a db is configured to place indices in a block cache instead of heap (am I right that by default a bloom filter and an index are at heap and they are preloaded during opening a db?), keyMayExist does not trigger loading the corresponding index to a cache as get()
does. Probably it's understandable since keyMayExists intended to be a lightweight alternative for get() with zero IO. If so, maybe it's rather an undocumented specific limitation than a bug...
Thanks, Alim
Hello @alim-akbashev ,
I think I found it. In DBImpl::KeyMayExist we always set *value_found = true;
.
Then at the end, you can see that the method return true
, even when it's status is IsIncomplete
.
I think the method should have something like this at the end :
if(value_found != nullptr && s.IsIncomplete()) {
*value_found = false;
}
or line *value_found = true;
should be removed.
@pdillinger What do you think about it ? Unfortunately I'm not very familiar with this part of RocksDB codebase. After the proposed change, ttl_test
and bloom_test
can run without problem.
Radek
cc: @adamretter
The method keyExists() returns correct result only when data exist in a block cache. It seems that
key_exists_helper
atrocksjni.cc
has a bug due to a side-effect of c++ impl ofKeyMayExist
which changes the state ofread_options
at https://github.com/facebook/rocksdb/blob/b26b395e0a15255d322be08110db551976188745/db/db_impl/db_impl.cc#L3765So consequent call of db->Get(read_opts, ...) looks up only through a block cache tier too:
UPDATE: probably my assumption on the root cause is wrong. It seems that
keyMayExist()
just does not work as expected (and advertised in docs):Actually it returns false negative all the time :( it only returns positive result if the data by key is in a data block cache.
Expected behavior
keyExists()
should not produce false negative casesActual behavior
keyExists()
returns false-negative result if data by a requested key is not in a block cache.Steps to reproduce the behavior