facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0

delayed data fetching in RocksDB (Java) as queue #5972

Open Aravindhan1995 opened 4 years ago

Aravindhan1995 commented 4 years ago

I am using RocksDB as a queue for my application. The logic is based on incrementing numeric keys in a single column family. The insert rate is 5,000 inserts per minute. Data is fetched from RocksDB using a RocksIterator, and each entry is deleted after it is processed. The setup was put under a load test. Initially, the fetching speed was very fast (~1 ms). After 10 days the inserts were stopped, since a large chunk of data (12 GB) had accumulated in RocksDB; fetching stayed fast for another week, but after that the fetch time increased to ~700 ms. I know the increase is caused by the continuous deletes. Is there any way to optimize RocksDB for queuing, given that it involves continuous inserts, reads, and deletes?
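An aside on the incremental-number keys mentioned above: with the default bytewise comparator (`leveldb.BytewiseComparator` in the options below), iteration only follows numeric order if the keys are encoded fixed-width big-endian. The following self-contained sketch (not from the issue; the class name is illustrative and no RocksDB dependency is needed) shows the encoding and the pitfall with decimal-string keys:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class QueueKeys {
    // Encode a sequence number as a fixed-width big-endian key so that a
    // bytewise comparator orders keys numerically (for non-negative values).
    static byte[] encodeKey(long seq) {
        return ByteBuffer.allocate(Long.BYTES).putLong(seq).array();
    }

    static long decodeKey(byte[] key) {
        return ByteBuffer.wrap(key).getLong();
    }

    public static void main(String[] args) {
        byte[] k9 = encodeKey(9);
        byte[] k10 = encodeKey(10);
        // Fixed-width big-endian: 9 sorts before 10 under bytewise comparison.
        System.out.println(Arrays.compareUnsigned(k9, k10) < 0);  // true
        // Decimal strings do not: "10" sorts before "9" bytewise.
        System.out.println("10".compareTo("9") < 0);              // true, i.e. wrong order
        System.out.println(decodeKey(k10));                       // 10
    }
}
```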

RocksDB options file:

[Version]
rocksdb_version=6.0.1
options_file_version=1.1

[DBOptions]
preserve_deletes=false
allow_ingest_behind=false
dump_malloc_stats=false
info_log_level=INFO_LEVEL
write_thread_max_yield_usec=100
avoid_flush_during_shutdown=false
enable_write_thread_adaptive_yield=true
wal_recovery_mode=kPointInTimeRecovery
fail_if_options_file_error=false
stats_history_buffer_size=1048576
stats_persist_period_sec=600
delete_obsolete_files_period_micros=21600000000
bytes_per_sync=0
enable_pipelined_write=false
max_subcompactions=1
db_log_dir=
wal_dir=/home/test/Queue-Scalability/Jun-29/PersistentCache/RocksDB/data
max_log_file_size=0
manifest_preallocation_size=4194304
delayed_write_rate=16777216
log_file_time_to_roll=0
avoid_flush_during_recovery=false
write_thread_slow_yield_usec=3
keep_log_file_num=5
table_cache_numshardbits=6
max_file_opening_threads=16
max_background_flushes=-1
base_background_compactions=-1
max_background_compactions=-1
use_fsync=false
use_adaptive_mutex=false
wal_bytes_per_sync=0
random_access_max_buffer_size=1048576
atomic_flush=false
compaction_readahead_size=0
manual_wal_flush=false
new_table_reader_for_compaction_inputs=false
max_total_wal_size=0
skip_stats_update_on_db_open=false
skip_log_error_on_recovery=false
max_manifest_file_size=1073741824
paranoid_checks=true
stats_dump_period_sec=600
recycle_log_file_num=0
is_fd_close_on_exec=true
error_if_exists=false
enable_thread_tracking=false
create_missing_column_families=false
WAL_ttl_seconds=0
create_if_missing=true
access_hint_on_compaction_start=NORMAL
max_background_jobs=2
allow_2pc=false
use_direct_io_for_flush_and_compaction=false
db_write_buffer_size=104857600
two_write_queues=false
use_direct_reads=false
allow_concurrent_memtable_write=true
allow_mmap_writes=false
writable_file_max_buffer_size=1048576
WAL_size_limit_MB=0
allow_fallocate=true
max_open_files=100
allow_mmap_reads=false
advise_random_on_open=true

[CFOptions "M2MDataProcessorKeyVsDeviceIdentifier"]
compaction_pri=kMinOverlappingRatio
merge_operator=nullptr
compaction_filter_factory=nullptr
memtable_factory=SkipListFactory
memtable_insert_with_hint_prefix_extractor=nullptr
comparator=leveldb.BytewiseComparator
target_file_size_base=67108864
max_sequential_skip_in_iterations=8
compaction_style=kCompactionStyleLevel
max_bytes_for_level_base=268435456
bloom_locality=0
write_buffer_size=67108864
compression_per_level=
memtable_huge_page_size=0
max_successive_merges=0
arena_block_size=8388608
memtable_whole_key_filtering=false
target_file_size_multiplier=1
max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1
num_levels=7
min_write_buffer_number_to_merge=1
max_write_buffer_number_to_maintain=0
max_write_buffer_number=2
compression=kSnappyCompression
level0_stop_writes_trigger=36
level0_slowdown_writes_trigger=20
compaction_filter=nullptr
level0_file_num_compaction_trigger=1
max_compaction_bytes=1677721600
compaction_options_universal={allow_trivial_move=false;stop_style=kCompactionStopStyleTotalSize;compression_size_percent=-1;max_size_amplification_percent=200;max_merge_width=4294967295;min_merge_width=2;size_ratio=1;}
memtable_prefix_bloom_size_ratio=0.000000
hard_pending_compaction_bytes_limit=274877906944
ttl=0
table_factory=BlockBasedTable
soft_pending_compaction_bytes_limit=68719476736
prefix_extractor=nullptr
bottommost_compression=kDisableCompressionOption
force_consistency_checks=false
paranoid_file_checks=false
compaction_options_fifo={allow_compaction=false;max_table_files_size=1073741824;}
max_bytes_for_level_multiplier=10.000000
optimize_filters_for_hits=false
level_compaction_dynamic_level_bytes=false
inplace_update_num_locks=10000
inplace_update_support=false
disable_auto_compactions=false
report_bg_io_stats=false

Aravindhan1995 commented 4 years ago

@adamretter @koldat can you guys please help me with this?

adamretter commented 4 years ago

@Aravindhan1995 I don't think this is a RocksJava-specific question (which is my area of expertise), so you would need one of the other team members to comment. The mailing list is a better place for questions.

koldat commented 4 years ago

There is an article for this: https://github.com/facebook/rocksdb/wiki/Implement-Queue-Service-Using-RocksDB

Anyway, what I do is implement it without a single delete:

  1. Create a column family for your queue.
  2. Store a read index somewhere (the last key you have processed).
  3. Iterate like you do now (seek to the last processed key).
  4. Every 10,000 messages or so, simply call delete-files-in-range. It will drop the SST files that are fully covered by the range.

Deleting files is super fast, and seeks stay fast because there are no tombstones. The only inefficiency is the size of one SST file, so yes, you will have a couple of already-processed records left there. But who cares about 100 MB or so.
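The steps above can be sketched as follows. This is a self-contained simulation, using a `ConcurrentSkipListMap` as a stand-in for the column family so it runs without RocksDB; with real RocksDB you would seek with a `RocksIterator` and replace the `headMap(...).clear()` call with a range delete over the processed prefix (e.g. `db.deleteRange(...)`, or dropping covered SST files outright as described above). The class name and batch size are illustrative, not from the thread:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class QueueDrainSketch {
    static final int DELETE_BATCH = 10_000;  // illustrative; tune for your workload

    // Stand-in for the queue column family: sorted map of sequence number -> value.
    final ConcurrentSkipListMap<Long, byte[]> cf = new ConcurrentSkipListMap<>();
    long readIndex = 0;  // last key already processed (persist this somewhere durable)

    void enqueue(long seq, byte[] value) {
        cf.put(seq, value);
    }

    // Drain: seek past the read index, process entries, and issue one bulk
    // range delete every DELETE_BATCH messages instead of per-message deletes.
    long drain() {
        long processed = 0;
        for (Map.Entry<Long, byte[]> e : cf.tailMap(readIndex, false).entrySet()) {
            process(e.getValue());
            readIndex = e.getKey();
            processed++;
            if (processed % DELETE_BATCH == 0) {
                // Real RocksDB: a single range delete covering [0, readIndex].
                cf.headMap(readIndex, true).clear();
            }
        }
        return processed;
    }

    void process(byte[] value) { /* application logic */ }
}
```

The point of the design is that tombstone cost is amortized: readers never seek across per-message tombstones, and the leftover tail between batch deletes is bounded by `DELETE_BATCH` entries.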

There are other tricks, but with approaches like these I always design my apps so that they never do any single deletes.