Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License
11.54k stars 579 forks

[BUG] scan doesn't work when using only flash storage #648

Open fvigotti opened 1 year ago

fvigotti commented 1 year ago

Describe the bug

SCAN doesn't work when using only flash storage.

To reproduce (single instance, no cluster, docker.io/eqalpha/keydb:latest as of today)

Configuration:

save ""
appendonly no
storage-provider flash /opt/dbstorage/flash
storage-provider-options "use_direct_reads=true;allow_mmap_reads=false;use_direct_writes=true;allow_mmap_writes=false"

(storage-provider-options is not documented; I found these values on the web.)

Enter some keys (10000+) into the database, then either restart the db or enter more keys than fit in maxmemory.

Expected behavior

keydb-cli scan, once iterated, should return the cursor and allow iterating over all keys.

Actual behavior, with a ~3 GB db (10k keys):

KEYS returns all keys, albeit very slowly (it reads from disk).
keydb-cli scan, iterated, returns only 114 keys. I've tried everything from count 10000 down to count 10, following the cursors manually.
Even after running KEYS, scan still doesn't work (it always returns the same number of keys).
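A minimal sketch of how the mismatch described above could be checked from a client, assuming a redis-py style client object whose `scan(cursor=..., count=...)` returns a `(next_cursor, keys)` pair and whose `keys(pattern)` returns a list. The helper names `scan_all` and `missing_from_scan` are hypothetical and not part of KeyDB or redis-py.

```python
def scan_all(client, count=1000):
    """Follow the SCAN cursor until it returns to 0 and collect every key seen."""
    seen = set()
    cursor = 0
    while True:
        # Assumed redis-py style call: returns (next_cursor, list_of_keys).
        cursor, keys = client.scan(cursor=cursor, count=count)
        seen.update(keys)
        if int(cursor) == 0:  # a cursor of 0 marks the end of the iteration
            break
    return seen


def missing_from_scan(client, count=1000):
    """Keys that KEYS reports but a full SCAN iteration never returned."""
    return set(client.keys("*")) - scan_all(client, count)
```

With a client such as `redis.Redis(host="localhost", port=6379)` pointed at the flash-backed instance, a non-empty result from `missing_from_scan` would match the behavior reported here. Note that `KEYS` blocks the server while it runs, so this is only suitable for a test instance.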

msotheeswaran-sc commented 1 year ago

Currently KeyDB does not implement scan with FLASH, but we will add it to our list of features to add before official FLASH release.

keithchew commented 1 year ago

Hi

I can also confirm this affects modules that call RedisModule_Scan. Using flash with a dataset larger than memory, only the records within memory will be scanned.

MrBlaise commented 7 months ago

Hey!

I am facing this issue right now. I have a housekeeping service that relies on SCAN to regularly check all the keys in redis against a metadata server. Does anyone have a recommendation for how to accurately scan all the keys when using FLASH mode?

keithchew commented 7 months ago

hi @MrBlaise, you will need to use the KEYS command in the meantime...

keithchew commented 1 month ago

Hi

The dataset I am working on has grown quite a bit, and the KEYS workaround above is starting to block and slow things down. I had some time last weekend to review the code to see if we can add SCAN for flash. I cannot see an easy way to do this, as redis seems to calculate the cursor (based on the in-memory hashtable) using their custom algorithm, and rocksdb does not have a matching concept.
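To illustrate why the cursor is tied to the in-memory hashtable: as far as I understand it, the Redis dict scan advances the cursor by incrementing its bits in reverse order over the table's bucket index. The sketch below is a simplified Python model of that idea for a table of `2**bits` buckets, not the actual C code, and the function names are made up for illustration.

```python
def reverse_bits(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def next_cursor(cursor, bits):
    """Advance a bucket cursor by incrementing it in bit-reversed order.

    Simplified model: the cursor is a bucket index in a table of 2**bits
    buckets, and 0 means the iteration has wrapped around and is complete.
    """
    mask = (1 << bits) - 1
    reversed_cursor = reverse_bits(cursor & mask, bits)
    reversed_cursor = (reversed_cursor + 1) & mask
    return reverse_bits(reversed_cursor, bits)


def bucket_order(bits):
    """Return the order in which buckets are visited, starting at cursor 0."""
    order = [0]
    cursor = next_cursor(0, bits)
    while cursor != 0:
        order.append(cursor)
        cursor = next_cursor(cursor, bits)
    return order
```

For an 8-bucket table, `bucket_order(3)` visits buckets in the order 0, 4, 2, 6, 1, 5, 3, 7. The cursor is a position in a hash table, so it has no natural counterpart in a store that keeps keys in sorted order on disk.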

I ended up hacking together something, by modifying the KEYS command to accept a string based cursor, which is passed directly to rocksdb's Seek command. In addition, if the cursor is specified, the next cursor will also be returned as the last element of the KEYS reply.

This has allowed me to use a cursor with this KEYS command, and it works quite well, much like SCAN. If anyone else is thinking of implementing a solution for this and would like to share some thoughts, please let me know.
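The string-cursor idea described above can be sketched in Python, using a sorted list and `bisect` as a stand-in for an ordered store with a seek operation. This is only an illustration under stated assumptions: `keys_page` and `iterate_all` are hypothetical names, the end of the iteration is signalled with `None` (the actual patch returns the next cursor as the last element of the KEYS reply), and Python string ordering stands in for the store's key ordering.

```python
import bisect


def keys_page(sorted_keys, cursor="", count=100):
    """Return up to `count` keys starting at the first key >= cursor.

    Also returns the next cursor, which is the first key after the page,
    or None when there are no more keys.
    """
    # Stand-in for seeking an ordered iterator to the cursor key.
    start = bisect.bisect_left(sorted_keys, cursor)
    end = start + count
    page = sorted_keys[start:end]
    next_cursor = sorted_keys[end] if end < len(sorted_keys) else None
    return page, next_cursor


def iterate_all(sorted_keys, count=100):
    """Walk the whole key space page by page, following the string cursor."""
    collected = []
    cursor = ""  # the empty string sorts before every key
    while cursor is not None:
        page, cursor = keys_page(sorted_keys, cursor, count)
        collected.extend(page)
    return collected
```

Because the cursor is itself a key, each call does a bounded amount of work and does not depend on any in-memory table layout. Keys written between calls are picked up if they sort at or after the cursor and missed if they sort before it.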