facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
28.45k stars 6.3k forks source link

delete_range and memtable prefix bloom filter bug #2743

Closed zhangjinpeng87 closed 7 years ago

zhangjinpeng87 commented 7 years ago

When enable memtable prefix bloom filter, at the same time use delete_range may see keys have deleted by delete_range. For example, when I put keys (a111, b111, c111, let we use FixedLength(1) as prefix extractor) and then flush them into sst file. After that, I delete_range(a111, c111), when I db.Get(b111), it will return the value in sst file. Because here did't consider delete_range when prefix_bloom is not null. https://github.com/facebook/rocksdb/blob/master/db/memtable.cc#L653

zhangjinpeng87 commented 7 years ago

@ajkr

siddontang commented 7 years ago

Another serious problem when enable memtable prefix bloom filter.

After we ingest a SST, may k1 -> v1, k2 -> v2, ... kn-> vn, and delete k2. Sadly, seeking k2 can get k2.

zhangjinpeng87 commented 7 years ago

''' Another serious problem when enable memtable prefix bloom filter.

After we ingest a SST, may k1 -> v1, k2 -> v2, ... kn-> vn, and delete k2. Sadly, seeking k2 can get k2. ''' @ajkr this pr don't fix the problem mentioned above.

zhangjinpeng87 commented 7 years ago

test_prefix_bloom_delete_after_ingest in this file reproduce the exact situation described above.

siddontang commented 7 years ago

The second problem is our fault, we don't use total seek mode :sob:

Please ignore it. But it is still strange that we can seek the deleted key without total order seek is true.

ajkr commented 7 years ago

Agreed. I updated the wiki (https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes) slightly:

Before: "... we might return Valid()=false, or any key in the DB that is larger than the previous key"

After: "... we might return Valid()=false, or any key that is larger than the previous key"

In optimized prefix seek mode (when total_order_seek=false and memtable prefix bloom enabled), we use a special memtable iterator that only sees keys in the seeked key's prefix. So the deletion outside that prefix isn't found.