facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0

in memory issue and does block cache data has lifetime ? #12185

Open Athi2019rit opened 10 months ago

Athi2019rit commented 10 months ago

Hi all, is there a TTL for the block cache in RocksDB, like the one for the DB (SST files)? When TTL-expired entries are erased from the DB, are they also erased from the block cache, or do they persist there until LRU evicts them as least recently used?

I also tried block cache sizes of 8 MB (the default) and 100 MB. There is no difference in memory usage: my program takes nearly 20 MB of memory in both cases. What, then, does changing this setting do, and why can't I see any effect?

this is my log file for block cache 100 mb:

2023/12/26-16:49:11.586989 d50 RocksDB version: 4.11.2
2023/12/26-16:49:11.587492 d50 Git sha rocksdb_build_git_sha:98084910a7355eb3447afb8f48549cd47acf55cf
2023/12/26-16:49:11.587492 d50 Compile date Dec 19 2023
2023/12/26-16:49:11.587492 d50 DB SUMMARY
2023/12/26-16:49:11.587492 d50 SST files in AgentDB dir, Total Num: 0, files:
2023/12/26-16:49:11.587492 d50 Write Ahead Log file in AgentDB:
2023/12/26-16:49:11.587492 d50 Options.error_if_exists: 0
2023/12/26-16:49:11.587492 d50 Options.create_if_missing: 1
2023/12/26-16:49:11.587492 d50 Options.paranoid_checks: 1
2023/12/26-16:49:11.587492 d50 Options.env: 0105D738
2023/12/26-16:49:11.601389 d50 Options.info_log: 180E8500
2023/12/26-16:49:11.601389 d50 Options.max_open_files: -1
2023/12/26-16:49:11.601389 d50 Options.max_file_opening_threads: 16
2023/12/26-16:49:11.601389 d50 Options.max_total_wal_size: 0
2023/12/26-16:49:11.601389 d50 Options.disableDataSync: 0
2023/12/26-16:49:11.601389 d50 Options.use_fsync: 0
2023/12/26-16:49:11.601389 d50 Options.max_log_file_size: 0
2023/12/26-16:49:11.601389 d50 Options.max_manifest_file_size: 18446744073709551615
2023/12/26-16:49:11.601389 d50 Options.log_file_time_to_roll: 0
2023/12/26-16:49:11.601389 d50 Options.keep_log_file_num: 5
2023/12/26-16:49:11.601389 d50 Options.recycle_log_file_num: 0
2023/12/26-16:49:11.601389 d50 Options.allow_os_buffer: 1
2023/12/26-16:49:11.601389 d50 Options.allow_mmap_reads: 0
2023/12/26-16:49:11.601389 d50 Options.allow_fallocate: 1
2023/12/26-16:49:11.601389 d50 Options.allow_mmap_writes: 0
2023/12/26-16:49:11.601389 d50 Options.create_missing_column_families: 0
2023/12/26-16:49:11.601389 d50 Options.db_log_dir:
2023/12/26-16:49:11.601389 d50 Options.wal_dir: C:/Program Files (x86)/Log360Cloud_Agent/bin/AgentDB
2023/12/26-16:49:11.601389 d50 Options.table_cache_numshardbits: 6
2023/12/26-16:49:11.601389 d50 Options.delete_obsolete_files_period_micros: 21600000000
2023/12/26-16:49:11.601389 d50 Options.base_background_compactions: 1
2023/12/26-16:49:11.601389 d50 Options.max_background_compactions: 1
2023/12/26-16:49:11.601389 d50 Options.max_subcompactions: 1
2023/12/26-16:49:11.601389 d50 Options.max_background_flushes: 1
2023/12/26-16:49:11.601389 d50 Options.WAL_ttl_seconds: 0
2023/12/26-16:49:11.601389 d50 Options.WAL_size_limit_MB: 0
2023/12/26-16:49:11.601869 d50 Options.manifest_preallocation_size: 4194304
2023/12/26-16:49:11.601869 d50 Options.allow_os_buffer: 1
2023/12/26-16:49:11.601869 d50 Options.allow_mmap_reads: 0
2023/12/26-16:49:11.601869 d50 Options.allow_mmap_writes: 0
2023/12/26-16:49:11.601869 d50 Options.is_fd_close_on_exec: 1
2023/12/26-16:49:11.601869 d50 Options.stats_dump_period_sec: 600
2023/12/26-16:49:11.601869 d50 Options.advise_random_on_open: 1
2023/12/26-16:49:11.601869 d50 Options.db_write_buffer_size: 0d
2023/12/26-16:49:11.601869 d50 Options.access_hint_on_compaction_start: NORMAL
2023/12/26-16:49:11.601869 d50 Options.new_table_reader_for_compaction_inputs: 0
2023/12/26-16:49:11.601869 d50 Options.compaction_readahead_size: 0d
2023/12/26-16:49:11.601869 d50 Options.random_access_max_buffer_size: 1048576d
2023/12/26-16:49:11.601869 d50 Options.writable_file_max_buffer_size: 1048576d
2023/12/26-16:49:11.601869 d50 Options.use_adaptive_mutex: 0
2023/12/26-16:49:11.601869 d50 Options.rate_limiter: 00000000
2023/12/26-16:49:11.601869 d50 Options.sst_file_manager.rate_bytes_per_sec: 0
2023/12/26-16:49:11.601869 d50 Options.bytes_per_sync: 0
2023/12/26-16:49:11.601869 d50 Options.wal_bytes_per_sync: 0
2023/12/26-16:49:11.601869 d50 Options.wal_recovery_mode: 2
2023/12/26-16:49:11.601869 d50 Options.enable_thread_tracking: 0
2023/12/26-16:49:11.601869 d50 Options.allow_concurrent_memtable_write: 0
2023/12/26-16:49:11.601869 d50 Options.enable_write_thread_adaptive_yield: 0
2023/12/26-16:49:11.601869 d50 Options.write_thread_max_yield_usec: 100
2023/12/26-16:49:11.601869 d50 Options.write_thread_slow_yield_usec: 3
2023/12/26-16:49:11.601869 d50 Options.row_cache: None
2023/12/26-16:49:11.601869 d50 Options.wal_filter: None
2023/12/26-16:49:11.601869 d50 Options.avoid_flush_during_recovery: 0
2023/12/26-16:49:11.601869 d50 Compression algorithms supported:
2023/12/26-16:49:11.601869 d50 Snappy supported: 0
2023/12/26-16:49:11.601869 d50 Zlib supported: 0
2023/12/26-16:49:11.601869 d50 Bzip supported: 0
2023/12/26-16:49:11.601869 d50 LZ4 supported: 0
2023/12/26-16:49:11.601869 d50 Fast CRC32 supported: 0
2023/12/26-16:49:11.602365 d50 Creating manifest 1
2023/12/26-16:49:11.666085 d50 Recovering from manifest file: MANIFEST-000001
2023/12/26-16:49:11.667073 d50 --------------- Options for column family [default]:
2023/12/26-16:49:11.667073 d50 Options.comparator: rocksdb.InternalKeyComparator:leveldb.BytewiseComparator
2023/12/26-16:49:11.667073 d50 Options.merge_operator: None
2023/12/26-16:49:11.667073 d50 Options.compaction_filter: None
2023/12/26-16:49:11.667073 d50 Options.compaction_filter_factory: TtlCompactionFilterFactory
2023/12/26-16:49:11.667073 d50 Options.memtable_factory: SkipListFactory
2023/12/26-16:49:11.667073 d50 Options.table_factory: BlockBasedTable
2023/12/26-16:49:11.667073 d50 table_factory options:
  flush_block_policy_factory: FlushBlockBySizePolicyFactory (195FDF88)
  cache_index_and_filter_blocks: 0
  pin_l0_filter_and_index_blocks_in_cache: 0
  index_type: 0
  hash_index_allow_collision: 1
  checksum: 1
  no_block_cache: 0
  block_cache: 19B84A10
  block_cache_size: 104857600
  block_cache_compressed: 00000000
  block_size: 4096
  block_size_deviation: 10
  block_restart_interval: 16
  index_block_restart_interval: 1
  filter_policy: nullptr
  whole_key_filtering: 1
  skip_table_builder_flush: 0
  format_version: 2
2023/12/26-16:49:11.667073 d50 Options.write_buffer_size: 67108864
2023/12/26-16:49:11.667073 d50 Options.max_write_buffer_number: 2
2023/12/26-16:49:11.667073 d50 Options.compression: NoCompression
2023/12/26-16:49:11.667073 d50 Options.bottommost_compression: Disabled
2023/12/26-16:49:11.667073 d50 Options.prefix_extractor: nullptr
2023/12/26-16:49:11.667073 d50 Options.num_levels: 7
2023/12/26-16:49:11.667073 d50 Options.min_write_buffer_number_to_merge: 1
2023/12/26-16:49:11.667073 d50 Options.max_write_buffer_number_to_maintain: 0
2023/12/26-16:49:11.667073 d50 Options.compression_opts.window_bits: -14
2023/12/26-16:49:11.667073 d50 Options.compression_opts.level: -1
2023/12/26-16:49:11.667073 d50 Options.compression_opts.strategy: 0
2023/12/26-16:49:11.667073 d50 Options.compression_opts.max_dict_bytes: 0
2023/12/26-16:49:11.667073 d50 Options.level0_file_num_compaction_trigger: 4
2023/12/26-16:49:11.667073 d50 Options.level0_slowdown_writes_trigger: 20
2023/12/26-16:49:11.667073 d50 Options.level0_stop_writes_trigger: 24
2023/12/26-16:49:11.667073 d50 Options.target_file_size_base: 67108864
2023/12/26-16:49:11.667073 d50 Options.target_file_size_multiplier: 1
2023/12/26-16:49:11.667073 d50 Options.max_bytes_for_level_base: 268435456
2023/12/26-16:49:11.667073 d50 Options.level_compaction_dynamic_level_bytes: 0
2023/12/26-16:49:11.667073 d50 Options.max_bytes_for_level_multiplier: 10
2023/12/26-16:49:11.667073 d50 Options.max_bytes_for_level_multiplier_addtl[0]: 1
2023/12/26-16:49:11.667073 d50 Options.max_bytes_for_level_multiplier_addtl[1]: 1
2023/12/26-16:49:11.667073 d50 Options.max_bytes_for_level_multiplier_addtl[2]: 1
2023/12/26-16:49:11.667073 d50 Options.max_bytes_for_level_multiplier_addtl[3]: 1
2023/12/26-16:49:11.667073 d50 Options.max_bytes_for_level_multiplier_addtl[4]: 1
2023/12/26-16:49:11.667073 d50 Options.max_bytes_for_level_multiplier_addtl[5]: 1
2023/12/26-16:49:11.667073 d50 Options.max_bytes_for_level_multiplier_addtl[6]: 1
2023/12/26-16:49:11.667073 d50 Options.max_sequential_skip_in_iterations: 8
2023/12/26-16:49:11.667073 d50 Options.expanded_compaction_factor: 25
2023/12/26-16:49:11.667073 d50 Options.source_compaction_factor: 1
2023/12/26-16:49:11.667073 d50 Options.max_grandparent_overlap_factor: 10
2023/12/26-16:49:11.667073 d50 Options.arena_block_size: 8388608
2023/12/26-16:49:11.667073 d50 Options.soft_pending_compaction_bytes_limit: 68719476736
2023/12/26-16:49:11.667073 d50 Options.hard_pending_compaction_bytes_limit: 274877906944
2023/12/26-16:49:11.667569 d50 Options.rate_limit_delay_max_milliseconds: 1000
2023/12/26-16:49:11.667569 d50 Options.disable_auto_compactions: 0
2023/12/26-16:49:11.667569 d50 Options.verify_checksums_in_compaction: 1
2023/12/26-16:49:11.667569 d50 Options.compaction_style: 0
2023/12/26-16:49:11.667569 d50 Options.compaction_pri: 0
2023/12/26-16:49:11.667569 d50 Options.compaction_options_universal.size_ratio: 1
2023/12/26-16:49:11.667569 d50 Options.compaction_options_universal.min_merge_width: 2
2023/12/26-16:49:11.667569 d50 Options.compaction_options_universal.max_merge_width: 4294967295
2023/12/26-16:49:11.667569 d50 Options.compaction_options_universal.max_size_amplification_percent: 200
2023/12/26-16:49:11.667569 d50 Options.compaction_options_universal.compression_size_percent: -1
2023/12/26-16:49:11.667569 d50 Options.compaction_options_fifo.max_table_files_size: 1073741824
2023/12/26-16:49:11.667569 d50 Options.table_properties_collectors:
2023/12/26-16:49:11.667569 d50 Options.inplace_update_support: 0
2023/12/26-16:49:11.667569 d50 Options.inplace_update_num_locks: 10000
2023/12/26-16:49:11.667569 d50 Options.min_partial_merge_operands: 2
2023/12/26-16:49:11.667569 d50 Options.memtable_prefix_bloom_size_ratio: 0.000000
2023/12/26-16:49:11.667569 d50 Options.memtable_huge_page_size: 0
2023/12/26-16:49:11.667569 d50 Options.bloom_locality: 0
2023/12/26-16:49:11.667569 d50 Options.max_successive_merges: 0
2023/12/26-16:49:11.667569 d50 Options.optimize_filters_for_hits: 0
2023/12/26-16:49:11.667569 d50 Options.paranoid_file_checks: 0
2023/12/26-16:49:11.667569 d50 Options.report_bg_io_stats: 0
2023/12/26-16:49:11.671537 d50 Recovered from manifest file:C:/Program Files (x86)/Log360Cloud_Agent/bin/AgentDB/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 0, log_number is 0,prev_log_number is 0,max_column_family is 0
2023/12/26-16:49:11.671537 d50 Column family [default] (ID 0), log number is 0
2023/12/26-16:49:11.693361 d50 DB pointer 16F4DC20
2023/12/26-16:49:16.002214 3680 [default] Manual compaction starting
2023/12/26-16:51:09.394442 2bcc [default] New memtable created with log file: #6. Immutable memtables: 0.
2023/12/26-16:51:09.394903 33dc [JOB 2] Syncing log #3
2023/12/26-16:51:09.415330 33dc (Original Log Time 2023/12/26-16:51:09.394903) Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots allowed 1, compaction slots scheduled 1
2023/12/26-16:51:09.415330 33dc [default] [JOB 2] Flushing memtable with next log file: 6
2023/12/26-16:51:09.415330 33dc EVENT_LOG_v1 {"time_micros": 1703638269415330, "job": 2, "event": "flush_started", "num_memtables": 1, "num_entries": 9589, "num_deletes": 0, "memory_usage": 2548916}
2023/12/26-16:51:09.415759 33dc [default] [JOB 2] Level-0 flush table #7: started
2023/12/26-16:51:09.480204 33dc EVENT_LOG_v1 {"time_micros": 1703638269480204, "cf_name": "default", "job": 2, "event": "table_file_creation", "file_number": 7, "file_size": 2302622, "table_properties": {"data_size": 2275036, "index_size": 26745, "filter_size": 0, "raw_key_size": 901728, "raw_average_key_size": 94, "raw_value_size": 1561798, "raw_average_value_size": 162, "num_data_blocks": 576, "num_entries": 9589, "filter_policy_name": "", "kDeletedKeys": "0", "kMergeOperands": "0"}}
2023/12/26-16:51:09.480204 33dc [default] [JOB 2] Level-0 flush table #7: 2302622 bytes OK
2023/12/26-16:51:09.480204 33dc Creating manifest 8
2023/12/26-16:51:09.513933 33dc (Original Log Time 2023/12/26-16:51:09.480204) [default] Level-0 commit table #7 started
2023/12/26-16:51:09.513933 33dc (Original Log Time 2023/12/26-16:51:09.513933) [default] Level-0 commit table #7: memtable #1 done
2023/12/26-16:51:09.513933 33dc (Original Log Time 2023/12/26-16:51:09.513933) EVENT_LOG_v1 {"time_micros": 1703638269513933, "job": 2, "event": "flush_finished", "lsm_state": [1, 0, 0, 0, 0, 0, 0], "immutable_memtables": 0}
2023/12/26-16:51:09.513933 33dc (Original Log Time 2023/12/26-16:51:09.513933) [default] Level summary: base level 1 max bytes base 268435456 files[1 0 0 0 0 0 0] max score 0.25
2023/12/26-16:51:09.513933 33dc [JOB 2] Try to delete WAL files size 2535935, prev total WAL file size 2535935, number of live WAL files 2.
2023/12/26-16:51:09.514429 33dc [DEBUG] [JOB 2] Delete C:/Program Files (x86)/Log360Cloud_Agent/bin/AgentDB/000003.log type=0 #3 -- OK
2023/12/26-16:51:09.514429 33dc [DEBUG] [JOB 2] Delete C:/Program Files (x86)/Log360Cloud_Agent/bin/AgentDB//MANIFEST-000001 type=3 #1 -- OK

Please suggest options or configuration that can improve read performance. I can't find an answer to this. Kindly help me with it.

ajkr commented 10 months ago

A block stays in block cache until evicted by LRU.
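To illustrate the point, here is a toy LRU cache in C++. `ToyLruCache` is a hypothetical class written for this thread, not RocksDB's actual `LRUCache`, and it bounds the cache by entry count rather than bytes. It has no notion of time or TTL: an entry leaves only when capacity is exceeded and it is the least recently used one.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Toy LRU cache bounded by entry count (hypothetical, for illustration only).
class ToyLruCache {
 public:
  explicit ToyLruCache(std::size_t capacity) : capacity_(capacity) {}

  // Inserts or overwrites an entry, then evicts from the cold end if needed.
  void Insert(const std::string& key, const std::string& value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      order_.erase(it->second);
      index_.erase(it);
    }
    order_.emplace_front(key, value);
    index_[key] = order_.begin();
    while (order_.size() > capacity_) {
      index_.erase(order_.back().first);  // least recently used entry
      order_.pop_back();
    }
  }

  // Returns true and fills *value on a hit; a hit marks the entry as hot.
  bool Lookup(const std::string& key, std::string* value) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    order_.splice(order_.begin(), order_, it->second);
    *value = it->second->second;
    return true;
  }

 private:
  using Entry = std::pair<std::string, std::string>;
  std::size_t capacity_;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

With a capacity of 2, inserting `block-a` and `block-b`, reading `block-a`, and then inserting `block-c` evicts `block-b`. Nothing is removed merely because time has passed or because the underlying data was deleted elsewhere.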

From the LOG:

"raw_key_size": 901728 ... "raw_value_size": 1561798

So the uncompressed data size is only 1561798 + 901728 ≈ 2.5 MB. The block cache should hold all the blocks whether its capacity is 8 MB or 100 MB.
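The arithmetic can be checked with a few constants copied from the LOG above. The constant and function names are made up for this sketch, and the sum is an estimate of the raw data volume, not a measurement of actual cache usage.

```cpp
#include <cstdint>

// Values copied from the table_file_creation event and table options in the LOG.
constexpr std::uint64_t kRawKeyBytes = 901728;
constexpr std::uint64_t kRawValueBytes = 1561798;
constexpr std::uint64_t kDefaultCacheBytes = 8ull * 1024 * 1024;  // 8 MB default
constexpr std::uint64_t kConfiguredCacheBytes = 104857600;        // block_cache_size

// Total raw key + value bytes written to the single SST file.
constexpr std::uint64_t RawDataBytes() { return kRawKeyBytes + kRawValueBytes; }

// True when all raw data would fit in a cache of the given capacity.
constexpr bool FitsIn(std::uint64_t capacity_bytes) {
  return RawDataBytes() <= capacity_bytes;
}

static_assert(RawDataBytes() == 2463526, "about 2.5 MB");
static_assert(FitsIn(kDefaultCacheBytes), "fits in the 8 MB cache");
static_assert(FitsIn(kConfiguredCacheBytes), "fits in the 100 MB cache");
```

Since the data fits in the smaller cache with room to spare, raising the capacity cannot change how much of it is cached.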

What is the read performance issue? How much QPS are you trying to achieve?

Athi2019rit commented 10 months ago

> A block stays in block cache until evicted by LRU.

So even when expired entries are removed from the SST files during compaction, the cache still holds their values until LRU evicts them.

> What is the read performance issue? How much QPS are you trying to achieve?

In my use case I didn't mean QPS, but the collection rate before and after using RocksDB. Collection rate before RocksDB: 4000 per second (average). Collection rate after RocksDB: 3800 per second (average).

Yes, there may be a difference between the before and after collection rates because of some additional operations performed in the latter case. But reading from an unordered_map instead of RocksDB's LRU cache seems to improve the "after" collection rate by about 100.

> So the uncompressed data size is only 1561798+901728 =~ 2.5MB. Block cache should hold all the blocks no matter its capacity is 8MB or 100MB.

Even though the uncompressed data size is ~2.5 MB, does any operation or block use additional memory? While using RocksDB I've noticed that memory increases by ~15-20 MB.

Why does turning on cache_index_and_filter_blocks slow down performance?

ajkr commented 9 months ago

> So even when expired entries are removed from the SST files during compaction, the cache still holds their values until LRU evicts them.

Right, blocks from deleted files remain in block cache until LRU evicts them.

> But reading from an unordered_map instead of RocksDB's LRU cache seems to improve the "after" collection rate by about 100.

SST file lookups are not particularly fast. You could try row cache to bypass SST file lookups. It'll use more memory though.
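A minimal sketch of enabling the row cache, assuming a RocksDB build that exposes `Options::row_cache` and `NewLRUCache` (the LOG above prints `Options.row_cache: None`, so the option appears to exist in 4.11). The function name and the capacity parameter are arbitrary choices for illustration, and the snippet needs the RocksDB headers and library to build.

```cpp
#include <cstddef>

#include <rocksdb/cache.h>
#include <rocksdb/options.h>

// Returns a copy of `options` with a row cache of the given capacity attached.
// The row cache stores whole key/value rows, so a hit skips the SST block search.
rocksdb::Options WithRowCache(rocksdb::Options options, std::size_t capacity_bytes) {
  options.row_cache = rocksdb::NewLRUCache(capacity_bytes);
  return options;
}
```

The row cache is separate from the block cache, so its capacity adds to the process's memory footprint.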

> Even though the uncompressed data size is ~2.5 MB, does any operation or block use additional memory? While using RocksDB I've noticed that memory increases by ~15-20 MB.

Related to your earlier question, the 2.5 MB could be amplified if the data was compacted and the block cache still contains blocks from the deleted files. There are other uses, like the memtable. I would hope that the memtable does not consume much memory when it is empty, but I can't promise, especially considering the version is 4.11. A profile could tell us more definitively.
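Short of a full profile, a rough breakdown can be read from RocksDB itself. This sketch assumes the application kept the `std::shared_ptr<rocksdb::Cache>` it passed as `block_cache`. `ReportMemory` is a hypothetical helper, and whether both properties are populated on 4.11 is something to verify on that build. It needs the RocksDB headers and library.

```cpp
#include <iostream>
#include <memory>
#include <string>

#include <rocksdb/cache.h>
#include <rocksdb/db.h>

// Prints three memory figures; it does not replace a heap profile.
void ReportMemory(rocksdb::DB* db,
                  const std::shared_ptr<rocksdb::Cache>& block_cache) {
  std::string value;
  // Approximate size of active plus unflushed immutable memtables.
  if (db->GetProperty("rocksdb.cur-size-all-mem-tables", &value)) {
    std::cout << "memtables (bytes): " << value << "\n";
  }
  // Estimated memory held by table readers, including index and filter
  // blocks when cache_index_and_filter_blocks is false.
  if (db->GetProperty("rocksdb.estimate-table-readers-mem", &value)) {
    std::cout << "table readers (bytes): " << value << "\n";
  }
  // Bytes currently charged to the block cache.
  std::cout << "block cache usage (bytes): " << block_cache->GetUsage() << "\n";
}
```

If these three numbers add up to far less than the observed 15-20 MB, the remainder is coming from somewhere else, which is where a heap profile would help.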

> Why does turning on cache_index_and_filter_blocks slow down performance?

Yes, it's counterintuitive. It's because the index and filter blocks are held in memory either way. With cache_index_and_filter_blocks=false they are held in table reader memory. With cache_index_and_filter_blocks=true they are held in block cache memory. Accessing them in the table reader is cheaper because it doesn't require locking to manage an LRU list.
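For reference, the two configurations being compared look roughly like this. `MakeOptions` is a hypothetical helper, and the 100 MB capacity mirrors `block_cache_size: 104857600` from the LOG. The snippet needs the RocksDB headers and library to build.

```cpp
#include <rocksdb/cache.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

// Builds Options with a 100 MB block cache. When `cache_index_and_filter`
// is false (the value in the LOG), index and filter blocks live in table
// reader memory and are not charged to the block cache.
rocksdb::Options MakeOptions(bool cache_index_and_filter) {
  rocksdb::BlockBasedTableOptions table_options;
  table_options.block_cache = rocksdb::NewLRUCache(100 * 1024 * 1024);
  table_options.cache_index_and_filter_blocks = cache_index_and_filter;

  rocksdb::Options options;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}
```

Setting the flag to true mainly buys a bound on index and filter memory via the cache capacity, at the cost of the extra cache lookup described above.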

Athi2019rit commented 9 months ago

> SST file lookups are not particularly fast. You could try row cache to bypass SST file lookups. It'll use more memory though.

Yes, but the read happens from the block cache, right? Why is it slow?

> Related to your earlier question, the 2.5 MB could be amplified if the data was compacted and the block cache still contains blocks from the deleted files. There are other uses, like the memtable. I would hope that the memtable does not consume much memory when it is empty, but I can't promise, especially considering the version is 4.11. A profile could tell us more definitively.

Does this mean that the raw key and raw value sizes of the SST files differ from the key and value sizes in the block cache? If so, why? Does indexing happen in memory? In my case I run compaction every 30 minutes, and the memory increase described above shows up in less than 30 minutes.

ajkr commented 9 months ago

> Yes, but the read happens from the block cache, right? Why is it slow?

The layout of a sorted string table isn't the best for random lookups. Even if the sorted string table is entirely in memory, which it is in your case, an exact-match lookup usually won't be as fast as an std::unordered_map lookup. You can run a CPU profile on the lookup process to compare different approaches.
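As a loose analogy only, the difference resembles exact-match lookup in a key-sorted array versus a hash table. A real SST lookup involves more steps than this (searching the index block, then searching within a data block), so the sketch understates the gap. `FindSorted` and `FindHashed` are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::string>;

// Sorted layout: binary search, O(log n) string comparisons per lookup.
const std::string* FindSorted(const std::vector<Entry>& sorted,
                              const std::string& key) {
  auto it = std::lower_bound(
      sorted.begin(), sorted.end(), key,
      [](const Entry& e, const std::string& k) { return e.first < k; });
  if (it == sorted.end() || it->first != key) return nullptr;
  return &it->second;
}

// Hash layout: one hash computation plus (typically) one key comparison.
const std::string* FindHashed(
    const std::unordered_map<std::string, std::string>& map,
    const std::string& key) {
  auto it = map.find(key);
  return it == map.end() ? nullptr : &it->second;
}
```

Both functions return the same answers; timing them in a loop over the real key set, under a CPU profiler, would show where the per-lookup cost goes.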

Another feature you could consider to try closing the perf gap with std::unordered_map is data block hash index: https://github.com/facebook/rocksdb/blob/a036525809a7511ae119488973c953ef7151991b/include/rocksdb/table.h#L242
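A sketch of turning that option on, assuming a RocksDB release recent enough to have `data_block_index_type` in `BlockBasedTableOptions`; as far as I know the 4.11 build in the LOG predates it, so an upgrade would be needed. `WithDataBlockHashIndex` is a hypothetical helper and the 0.75 ratio is just an example value. The snippet needs the RocksDB headers and library to build.

```cpp
#include <rocksdb/options.h>
#include <rocksdb/table.h>

// Returns a copy of `options` whose new SST files carry a per-data-block
// hash index in addition to the usual binary-search restart array.
rocksdb::Options WithDataBlockHashIndex(rocksdb::Options options) {
  rocksdb::BlockBasedTableOptions table_options;
  table_options.data_block_index_type =
      rocksdb::BlockBasedTableOptions::kDataBlockBinaryAndHash;
  // Hash table load factor; example value, trades space for fewer collisions.
  table_options.data_block_hash_table_util_ratio = 0.75;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}
```

The setting applies to SST files written after the change; existing files keep their old format until they are rewritten by compaction.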

> Does this mean that the raw key and raw value sizes of the SST files differ from the key and value sizes in the block cache? If so, why? Does indexing happen in memory? In my case I run compaction every 30 minutes, and the memory increase described above shows up in less than 30 minutes.

Are you able to provide a heap profile? It sounds like we ruled out my last guess (data blocks surviving in block cache after their files are deleted). A profile might help us get to an answer faster than guess and check.