Wine93 opened 2 years ago
Hi, I think this is sort of expected, as the `Seek()` call goes through all the tombstones in the memtable to try to find the first entry for the iterator. As the perf context shows, most of the time is spent in `find_next_user_entry_time`. I think trying `DeleteRange` to combine all the deletions, or `CompactRange` to do a full compaction and remove all the tombstones, might help.
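For reference, a minimal sketch of those two suggestions (assuming a `rocksdb::DB* db` handle and the `/prefix/` key range from this thread; the function name is illustrative):

```cpp
#include <rocksdb/db.h>

// Sketch only: one range tombstone instead of N point tombstones, then an
// optional full compaction to physically drop the tombstones themselves.
void DropPrefixAndCompact(rocksdb::DB* db) {
  // DeleteRange covers [begin, end): "/prefix0" is the first key after
  // everything under "/prefix/" in byte order.
  db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(),
                  "/prefix/", "/prefix0");

  // CompactRange over the whole key space removes the tombstones; this is
  // expensive and usually done off-peak.
  rocksdb::CompactRangeOptions cro;
  db->CompactRange(cro, nullptr, nullptr);
}
```

(Requires linking against RocksDB; checking the returned `Status` values is omitted for brevity.)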
Yes, that's a problem. We have run into the same issue in our production environment.
@cbi42 Thank you for your reply, but we can't use `DeleteRange` or `SingleDelete` in our application, so we try to tune RocksDB to trigger compaction for deleted entries (point tombstones) as soon as possible, which reduces the range-scan slowdown in the SST files:
options.level_compaction_dynamic_level_bytes = true;  // level-0 SSTs are compacted directly into the bottom level
options.level0_file_num_compaction_trigger = 1;  // number of files to trigger level-0 compaction
options.max_bytes_for_level_base = 1073741824;  // bottom level total data size (1 GB)
options.table_properties_collector_factories.emplace_back(rocksdb::NewCompactOnDeletionCollectorFactory(10000, 1000));
The techniques above all target SSTs, but for memtables with many point tombstones that haven't been flushed to level 0, how can we speed up the prefix scan? @caipengbo Do you have any tuning experience or suggestions for this? :)

BTW: the effect of the tuning above may be the same as a manual compaction (`CompactRange`).
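One more knob that may be worth trying (my suggestion, not something discussed above): give the iterator an upper bound so a `Seek` can stop at the end of the prefix instead of walking tombstones beyond it. It does not help with tombstones *inside* the prefix, but it caps how far a seek can run. A sketch, assuming a `rocksdb::DB* db` handle:

```cpp
#include <memory>
#include <rocksdb/db.h>

void BoundedPrefixScan(rocksdb::DB* db) {
  rocksdb::Slice upper("/prefix0");  // first key after "/prefix/" in byte order
  rocksdb::ReadOptions ro;
  ro.iterate_upper_bound = &upper;   // the iterator becomes invalid at this bound
  std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
  for (it->Seek("/prefix/"); it->Valid(); it->Next()) {
    // process live keys under "/prefix/"
  }
}
```

(Again a RocksDB-linked fragment; `upper` must outlive the iterator.)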
> but we can't use DeleteRange or SingleDelete in our application

What's the reason why you can't use `DeleteRange`?
Actually, our application receives a delete request each time one is issued by a user; it's not controlled by us. In some scenarios a user deletes all the keys under one prefix one by one, and the order of deletion is not consecutive, so we can't convert the point deletes into a range delete.
So why not add a DELETE tag? That converts a user delete into a normal write.
I think the prefix scan would still be slow even if we added a DELETE tag to each deleted key, because we would have to check every key for the tag during the scan and skip it if present, the same as what RocksDB does now, so it would still cost a lot of time to find the first live key.
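The point above can be illustrated with a toy model (pure C++, not RocksDB; `ToyMemtable` and its members are made up for illustration): whether the marker is RocksDB's internal tombstone or a user-level DELETE tag, a seek still has to walk past every deleted entry before reaching the first live key, so the cost stays linear in the number of tombstones.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Toy model of an LSM memtable: a delete does not remove the key, it just
// flags the entry as a tombstone. A seek must skip every tombstone before
// it can return the first live key.
struct ToyMemtable {
  std::map<std::string, bool> entries;  // key -> is_tombstone

  void Put(const std::string& k) { entries[k] = false; }
  void Delete(const std::string& k) { entries[k] = true; }  // tombstone, not removal

  // Returns the first live key at or after `target`, plus how many
  // tombstones were skipped along the way.
  std::pair<std::string, std::size_t> Seek(const std::string& target) const {
    std::size_t skipped = 0;
    for (auto it = entries.lower_bound(target); it != entries.end(); ++it) {
      if (!it->second) return {it->first, skipped};
      ++skipped;
    }
    return {"", skipped};
  }
};
```

Deleting 99,999 of 100,000 keys under a prefix forces `Seek` to skip 99,999 entries before finding the survivor; renaming the flag changes nothing about that walk.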
> So why not add a DELETE tag? Converts a user delete to a normal write.
My bad, I forgot something.
Is there any progress on this? We are also running into this problem.
Hi guys, I found that `Iterator->Seek()` is very slow after deleting many keys, and I found a related issue #5265, but I am not sure my problem is the same as that one, so can you give me some advice? Or, if my problem is indeed the same as issue #5265, how can we solve it completely?

Iterator->Seek("/prefix/")  // costs about 20 us

for i = 1, 100000, 1 do  // delete 100000 keys
  DB->Delete("/prefix/" + str(i))
end

Iterator->Seek("/prefix/")  // costs about 13 ms

perf context:

user_key_comparison_count = 420038  block_cache_hit_count = 1  block_read_count = 0  block_read_byte = 0  block_read_time = 0
block_cache_index_hit_count = 0  index_block_read_count = 0  block_cache_filter_hit_count = 0  filter_block_read_count = 0
compression_dict_block_read_count = 0  secondary_cache_hit_count = 0  block_checksum_time = 0  block_decompress_time = 0
get_read_bytes = 0  multiget_read_bytes = 0  iter_read_bytes = 88
internal_key_skipped_count = 180000  internal_delete_skipped_count = 30000  internal_recent_skipped_count = 0  internal_merge_count = 0
write_wal_time = 0  get_snapshot_time = 0  get_from_memtable_time = 0  get_from_memtable_count = 0  get_post_process_time = 0  get_from_output_files_time = 0
seek_on_memtable_time = 1716  seek_on_memtable_count = 2  next_on_memtable_count = 210000  prev_on_memtable_count = 0
seek_child_seek_time = 8258  seek_child_seek_count = 3  seek_min_heap_time = 512  seek_internal_seek_time = 9508
find_next_user_entry_time = 12749803
write_pre_and_post_process_time = 0  write_memtable_time = 0  write_thread_wait_nanos = 0  write_scheduling_flushes_compactions_time = 0
db_mutex_lock_nanos = 0  db_condition_wait_nanos = 0  merge_operator_time_nanos = 0  write_delay_time = 0
read_index_block_nanos = 0  read_filter_block_nanos = 0  new_table_block_iter_nanos = 2013  new_table_iterator_nanos = 0
block_seek_nanos = 2651  find_table_nanos = 0
bloom_memtable_hit_count = 1  bloom_memtable_miss_count = 1  bloom_sst_hit_count = 1  bloom_sst_miss_count = 0
key_lock_wait_time = 0  key_lock_wait_count = 0
env_new_sequential_file_nanos = 0  env_new_random_access_file_nanos = 0  env_new_writable_file_nanos = 0  env_reuse_writable_file_nanos = 0
env_new_random_rw_file_nanos = 0  env_new_directory_nanos = 0  env_file_exists_nanos = 0  env_get_children_nanos = 0
env_get_children_file_attributes_nanos = 0  env_delete_file_nanos = 0  env_create_dir_nanos = 0  env_create_dir_if_missing_nanos = 0
env_delete_dir_nanos = 0  env_get_file_size_nanos = 0  env_get_file_modification_time_nanos = 0  env_rename_file_nanos = 0
env_link_file_nanos = 0  env_lock_file_nanos = 0  env_unlock_file_nanos = 0  env_new_logger_nanos = 0
get_cpu_nanos = 0  iter_next_cpu_nanos = 0  iter_prev_cpu_nanos = 0  iter_seek_cpu_nanos = 12756520

thread_pool_id = 4  bytes_read = 0  bytes_written = 0  open_nanos = 0  allocate_nanos = 0  write_nanos = 0  read_nanos = 0
range_sync_nanos = 0  fsync_nanos = 0  prepare_write_nanos = 0  logger_nanos = 0  cpu_write_nanos = 0  cpu_read_nanos = 0
file_io_stats_by_temperature.hot_file_bytes_read = 0  file_io_stats_by_temperature.warm_file_bytes_read = 0  file_io_stats_by_temperature.cold_file_bytes_read = 0
file_io_stats_by_temperature.hot_file_read_count = 0  file_io_stats_by_temperature.warm_file_read_count = 0  file_io_stats_by_temperature.cold_file_read_count = 0