Open mvandeberg opened 5 years ago
I wanted to provide some additional information on this thread. The application makes heavy use of DB::Iter
. Essentially, the implementation has lower_bound
, upper_bound
, find
, returning a wrapped DB::Iter
.
The most common usage is similar to this pattern:
auto lower = index.lower_bound( 5 );
auto upper = index.upper_bound( 5 );
for ( auto iter = lower; iter != upper; ++iter )
{
// do work
}
RocksDB is set up to have 1 database per boost::multi_index_container
.
Each one of the indices are represented as a column family, usually some where between 1 - 10 indices each.
The first column family key value pair is (Primary Key, Binary Serialized Value) The second to N column family is (Secondary Key, Primary Key)
Therefore, looking up an object on a secondary key retrieves its primary key. You can then pull the value out of the first column family.
Objects are constantly being added/modified/deleted throughout the run.
These tests are done on an SSD.
The total runtime of the test is 2354.92380300000013449 sec
Boost Multi Index can perform this workload in 6s
So far we've been able to get RocksDB to do this in 120s
The currently used options are as follows:
# This is a RocksDB option file.
#
# For detailed file format spec, please refer to the example file
# in examples/rocksdb_option_file_example.ini
#
[Version]
rocksdb_version=5.17.2
options_file_version=1.1
[DBOptions]
allow_mmap_writes=false
base_background_compactions=-1
new_table_reader_for_compaction_inputs=false
db_log_dir=
wal_recovery_mode=kPointInTimeRecovery
use_direct_reads=false
write_thread_max_yield_usec=100
max_manifest_file_size=1073741824
allow_2pc=false
allow_fallocate=true
fail_if_options_file_error=false
allow_ingest_behind=false
allow_mmap_reads=true
skip_log_error_on_recovery=false
recycle_log_file_num=0
delete_obsolete_files_period_micros=21600000000
compaction_readahead_size=0
use_direct_io_for_flush_and_compaction=false
log_file_time_to_roll=0
create_missing_column_families=false
advise_random_on_open=true
max_log_file_size=0
stats_dump_period_sec=20
enable_thread_tracking=false
use_adaptive_mutex=false
create_if_missing=false
is_fd_close_on_exec=true
max_background_flushes=1
manifest_preallocation_size=4194304
error_if_exists=false
skip_stats_update_on_db_open=false
max_open_files=-1
random_access_max_buffer_size=1048576
use_fsync=false
max_background_jobs=16
two_write_queues=false
max_background_compactions=8
max_file_opening_threads=16
table_cache_numshardbits=6
keep_log_file_num=1000
avoid_flush_during_shutdown=false
db_write_buffer_size=0
max_total_wal_size=0
wal_dir=/Users/sgerbino/.steemd/blockchain/rocksdb_account_object
max_subcompactions=1
WAL_size_limit_MB=0
paranoid_checks=true
allow_concurrent_memtable_write=true
writable_file_max_buffer_size=1048576
WAL_ttl_seconds=0
delayed_write_rate=16777216
bytes_per_sync=0
wal_bytes_per_sync=0
enable_pipelined_write=false
enable_write_thread_adaptive_yield=true
write_thread_slow_yield_usec=3
access_hint_on_compaction_start=NORMAL
info_log_level=INFO_LEVEL
dump_malloc_stats=false
avoid_flush_during_recovery=false
preserve_deletes=false
manual_wal_flush=false
[CFOptions "default"]
report_bg_io_stats=false
inplace_update_support=false
max_compaction_bytes=1677721600
disable_auto_compactions=false
write_buffer_size=67108864
bloom_locality=0
max_bytes_for_level_multiplier=10.000000
compaction_filter_factory=nullptr
optimize_filters_for_hits=false
target_file_size_base=67108864
max_write_buffer_number_to_maintain=0
hard_pending_compaction_bytes_limit=274877906944
paranoid_file_checks=false
memtable_prefix_bloom_size_ratio=0.000000
force_consistency_checks=false
max_write_buffer_number=2
ttl=0
max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1
level0_slowdown_writes_trigger=20
level_compaction_dynamic_level_bytes=false
compaction_options_fifo={allow_compaction=false;ttl=0;max_table_files_size=1073741824;}
inplace_update_num_locks=10000
level0_file_num_compaction_trigger=4
compression=kSnappyCompression
level0_stop_writes_trigger=36
num_levels=7
table_factory=BlockBasedTable
compression_per_level=
target_file_size_multiplier=1
min_write_buffer_number_to_merge=1
arena_block_size=8388608
max_successive_merges=0
memtable_huge_page_size=0
compaction_pri=kByCompensatedSize
soft_pending_compaction_bytes_limit=68719476736
max_bytes_for_level_base=268435456
comparator=leveldb.BytewiseComparator
max_sequential_skip_in_iterations=8
bottommost_compression=kDisableCompressionOption
prefix_extractor=nullptr
memtable_insert_with_hint_prefix_extractor=nullptr
memtable_factory=SkipListFactory
compaction_filter=nullptr
compaction_options_universal={allow_trivial_move=false;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;compression_size_percent=-1;max_size_amplification_percent=200;max_merge_width=4294967295;size_ratio=1;}
merge_operator=nullptr
compaction_style=kCompactionStyleLevel
[TableOptions/BlockBasedTable "default"]
format_version=2
whole_key_filtering=true
verify_compression=false
partition_filters=false
enable_index_compression=true
checksum=kCRC32c
index_block_restart_interval=1
pin_top_level_index_and_filter=true
block_align=false
block_size=4096
index_type=kBinarySearch
filter_policy=nullptr
metadata_block_size=4096
no_block_cache=false
block_size_deviation=10
data_block_index_type=kDataBlockBinarySearch
data_block_hash_table_util_ratio=0.750000
read_amp_bytes_per_bit=8589934592
cache_index_and_filter_blocks=false
block_restart_interval=16
pin_l0_filter_and_index_blocks_in_cache=false
hash_index_allow_collision=true
cache_index_and_filter_blocks_with_high_priority=false
flush_block_policy_factory=FlushBlockBySizePolicyFactory
[CFOptions "boost\:\:mpl\:\:v_item<steem\:\:chain\:\:by_id, boost\:\:mpl\:\:vector0<mpl_\:\:na>, 0>"]
report_bg_io_stats=false
inplace_update_support=false
max_compaction_bytes=1677721600
disable_auto_compactions=false
write_buffer_size=67108864
bloom_locality=0
max_bytes_for_level_multiplier=10.000000
compaction_filter_factory=nullptr
optimize_filters_for_hits=false
target_file_size_base=67108864
max_write_buffer_number_to_maintain=0
hard_pending_compaction_bytes_limit=274877906944
paranoid_file_checks=false
memtable_prefix_bloom_size_ratio=0.000000
force_consistency_checks=false
max_write_buffer_number=2
ttl=0
max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1
level0_slowdown_writes_trigger=20
level_compaction_dynamic_level_bytes=false
compaction_options_fifo={allow_compaction=false;ttl=0;max_table_files_size=1073741824;}
inplace_update_num_locks=10000
level0_file_num_compaction_trigger=4
compression=kSnappyCompression
level0_stop_writes_trigger=36
num_levels=7
table_factory=BlockBasedTable
compression_per_level=
target_file_size_multiplier=1
min_write_buffer_number_to_merge=1
arena_block_size=8388608
max_successive_merges=0
memtable_huge_page_size=0
compaction_pri=kByCompensatedSize
soft_pending_compaction_bytes_limit=68719476736
max_bytes_for_level_base=268435456
comparator=mira::multi_index::detail::abstract_slice_comparator<chainbase::oid<steem::chain::account_object>, std::__1::less<chainbase::oid<steem::chain::account_object> > > const*
max_sequential_skip_in_iterations=8
bottommost_compression=kDisableCompressionOption
prefix_extractor=nullptr
memtable_insert_with_hint_prefix_extractor=nullptr
memtable_factory=SkipListFactory
compaction_filter=nullptr
compaction_options_universal={allow_trivial_move=false;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;compression_size_percent=-1;max_size_amplification_percent=200;max_merge_width=4294967295;size_ratio=1;}
merge_operator=nullptr
compaction_style=kCompactionStyleLevel
[TableOptions/BlockBasedTable "boost\:\:mpl\:\:v_item<steem\:\:chain\:\:by_id, boost\:\:mpl\:\:vector0<mpl_\:\:na>, 0>"]
format_version=2
whole_key_filtering=true
verify_compression=false
partition_filters=false
enable_index_compression=true
checksum=kCRC32c
index_block_restart_interval=1
pin_top_level_index_and_filter=true
block_align=false
block_size=4096
index_type=kBinarySearch
filter_policy=nullptr
metadata_block_size=4096
no_block_cache=false
block_size_deviation=10
data_block_index_type=kDataBlockBinarySearch
data_block_hash_table_util_ratio=0.750000
read_amp_bytes_per_bit=8589934592
cache_index_and_filter_blocks=false
block_restart_interval=16
pin_l0_filter_and_index_blocks_in_cache=false
hash_index_allow_collision=true
cache_index_and_filter_blocks_with_high_priority=false
flush_block_policy_factory=FlushBlockBySizePolicyFactory
[CFOptions "boost\:\:mpl\:\:v_item<steem\:\:chain\:\:by_name, boost\:\:mpl\:\:vector0<mpl_\:\:na>, 0>"]
report_bg_io_stats=false
inplace_update_support=false
max_compaction_bytes=1677721600
disable_auto_compactions=false
write_buffer_size=67108864
bloom_locality=0
max_bytes_for_level_multiplier=10.000000
compaction_filter_factory=nullptr
optimize_filters_for_hits=false
target_file_size_base=67108864
max_write_buffer_number_to_maintain=0
hard_pending_compaction_bytes_limit=274877906944
paranoid_file_checks=false
memtable_prefix_bloom_size_ratio=0.000000
force_consistency_checks=false
max_write_buffer_number=2
ttl=0
max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1
level0_slowdown_writes_trigger=20
level_compaction_dynamic_level_bytes=false
compaction_options_fifo={allow_compaction=false;ttl=0;max_table_files_size=1073741824;}
inplace_update_num_locks=10000
level0_file_num_compaction_trigger=4
compression=kSnappyCompression
level0_stop_writes_trigger=36
num_levels=7
table_factory=BlockBasedTable
compression_per_level=
target_file_size_multiplier=1
min_write_buffer_number_to_merge=1
arena_block_size=8388608
max_successive_merges=0
memtable_huge_page_size=0
compaction_pri=kByCompensatedSize
soft_pending_compaction_bytes_limit=68719476736
max_bytes_for_level_base=268435456
comparator=mira::multi_index::detail::abstract_slice_comparator<steem::protocol::fixed_string_impl<fc::uint128>, std::__1::less<steem::protocol::fixed_string_impl<fc::uint128> > > const*
max_sequential_skip_in_iterations=8
bottommost_compression=kDisableCompressionOption
prefix_extractor=nullptr
memtable_insert_with_hint_prefix_extractor=nullptr
memtable_factory=SkipListFactory
compaction_filter=nullptr
compaction_options_universal={allow_trivial_move=false;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;compression_size_percent=-1;max_size_amplification_percent=200;max_merge_width=4294967295;size_ratio=1;}
merge_operator=nullptr
compaction_style=kCompactionStyleLevel
[TableOptions/BlockBasedTable "boost\:\:mpl\:\:v_item<steem\:\:chain\:\:by_name, boost\:\:mpl\:\:vector0<mpl_\:\:na>, 0>"]
format_version=2
whole_key_filtering=true
verify_compression=false
partition_filters=false
enable_index_compression=true
checksum=kCRC32c
index_block_restart_interval=1
pin_top_level_index_and_filter=true
block_align=false
block_size=4096
index_type=kBinarySearch
filter_policy=nullptr
metadata_block_size=4096
no_block_cache=false
block_size_deviation=10
data_block_index_type=kDataBlockBinarySearch
data_block_hash_table_util_ratio=0.750000
read_amp_bytes_per_bit=8589934592
cache_index_and_filter_blocks=false
block_restart_interval=16
pin_l0_filter_and_index_blocks_in_cache=false
hash_index_allow_collision=true
cache_index_and_filter_blocks_with_high_priority=false
flush_block_policy_factory=FlushBlockBySizePolicyFactory
[CFOptions "boost\:\:mpl\:\:v_item<steem\:\:chain\:\:by_proxy, boost\:\:mpl\:\:vector0<mpl_\:\:na>, 0>"]
report_bg_io_stats=false
inplace_update_support=false
max_compaction_bytes=1677721600
disable_auto_compactions=false
write_buffer_size=67108864
bloom_locality=0
max_bytes_for_level_multiplier=10.000000
compaction_filter_factory=nullptr
optimize_filters_for_hits=false
target_file_size_base=67108864
max_write_buffer_number_to_maintain=0
hard_pending_compaction_bytes_limit=274877906944
paranoid_file_checks=false
memtable_prefix_bloom_size_ratio=0.000000
force_consistency_checks=false
max_write_buffer_number=2
ttl=0
max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1
level0_slowdown_writes_trigger=20
level_compaction_dynamic_level_bytes=false
compaction_options_fifo={allow_compaction=false;ttl=0;max_table_files_size=1073741824;}
inplace_update_num_locks=10000
level0_file_num_compaction_trigger=4
compression=kSnappyCompression
level0_stop_writes_trigger=36
num_levels=7
table_factory=BlockBasedTable
compression_per_level=
target_file_size_multiplier=1
min_write_buffer_number_to_merge=1
arena_block_size=8388608
max_successive_merges=0
memtable_huge_page_size=0
compaction_pri=kByCompensatedSize
soft_pending_compaction_bytes_limit=68719476736
max_bytes_for_level_base=268435456
comparator=mira::multi_index::detail::abstract_slice_comparator<mira::multi_index::composite_key_result<mira::multi_index::composite_key<steem::chain::account_object, mira::multi_index::member<steem::chain::account_object, steem::protocol::fixed_string_impl<fc::uint128>, &(steem::chain::account_object::proxy)>, mira::multi_index::member<steem::chain::account_object, steem::protocol::fixed_string_impl<fc::uint128>, &(steem::chain::account_object::name)>, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type> >, std::__1::less<mira::multi_index::composite_key_result<mira::multi_index::composite_key<steem::chain::account_object, mira::multi_index::member<steem::chain::account_object, steem::protocol::fixed_string_impl<fc::uint128>, &(steem::chain::account_object::proxy)>, mira::multi_index::member<steem::chain::account_object, steem::protocol::fixed_string_impl<fc::uint128>, &(steem::chain::account_object::name)>, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type> > > > const*
max_sequential_skip_in_iterations=8
bottommost_compression=kDisableCompressionOption
prefix_extractor=nullptr
memtable_insert_with_hint_prefix_extractor=nullptr
memtable_factory=SkipListFactory
compaction_filter=nullptr
compaction_options_universal={allow_trivial_move=false;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;compression_size_percent=-1;max_size_amplification_percent=200;max_merge_width=4294967295;size_ratio=1;}
merge_operator=nullptr
compaction_style=kCompactionStyleLevel
[TableOptions/BlockBasedTable "boost\:\:mpl\:\:v_item<steem\:\:chain\:\:by_proxy, boost\:\:mpl\:\:vector0<mpl_\:\:na>, 0>"]
format_version=2
whole_key_filtering=true
verify_compression=false
partition_filters=false
enable_index_compression=true
checksum=kCRC32c
index_block_restart_interval=1
pin_top_level_index_and_filter=true
block_align=false
block_size=4096
index_type=kBinarySearch
filter_policy=nullptr
metadata_block_size=4096
no_block_cache=false
block_size_deviation=10
data_block_index_type=kDataBlockBinarySearch
data_block_hash_table_util_ratio=0.750000
read_amp_bytes_per_bit=8589934592
cache_index_and_filter_blocks=false
block_restart_interval=16
pin_l0_filter_and_index_blocks_in_cache=false
hash_index_allow_collision=true
cache_index_and_filter_blocks_with_high_priority=false
flush_block_policy_factory=FlushBlockBySizePolicyFactory
[CFOptions "boost\:\:mpl\:\:v_item<steem\:\:chain\:\:by_next_vesting_withdrawal, boost\:\:mpl\:\:vector0<mpl_\:\:na>, 0>"]
report_bg_io_stats=false
inplace_update_support=false
max_compaction_bytes=1677721600
disable_auto_compactions=false
write_buffer_size=67108864
bloom_locality=0
max_bytes_for_level_multiplier=10.000000
compaction_filter_factory=nullptr
optimize_filters_for_hits=false
target_file_size_base=67108864
max_write_buffer_number_to_maintain=0
hard_pending_compaction_bytes_limit=274877906944
paranoid_file_checks=false
memtable_prefix_bloom_size_ratio=0.000000
force_consistency_checks=false
max_write_buffer_number=2
ttl=0
max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1
level0_slowdown_writes_trigger=20
level_compaction_dynamic_level_bytes=false
compaction_options_fifo={allow_compaction=false;ttl=0;max_table_files_size=1073741824;}
inplace_update_num_locks=10000
level0_file_num_compaction_trigger=4
compression=kSnappyCompression
level0_stop_writes_trigger=36
num_levels=7
table_factory=BlockBasedTable
compression_per_level=
target_file_size_multiplier=1
min_write_buffer_number_to_merge=1
arena_block_size=8388608
max_successive_merges=0
memtable_huge_page_size=0
compaction_pri=kByCompensatedSize
soft_pending_compaction_bytes_limit=68719476736
max_bytes_for_level_base=268435456
comparator=mira::multi_index::detail::abstract_slice_comparator<mira::multi_index::composite_key_result<mira::multi_index::composite_key<steem::chain::account_object, mira::multi_index::member<steem::chain::account_object, fc::time_point_sec, &(steem::chain::account_object::next_vesting_withdrawal)>, mira::multi_index::member<steem::chain::account_object, steem::protocol::fixed_string_impl<fc::uint128>, &(steem::chain::account_object::name)>, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type> >, std::__1::less<mira::multi_index::composite_key_result<mira::multi_index::composite_key<steem::chain::account_object, mira::multi_index::member<steem::chain::account_object, fc::time_point_sec, &(steem::chain::account_object::next_vesting_withdrawal)>, mira::multi_index::member<steem::chain::account_object, steem::protocol::fixed_string_impl<fc::uint128>, &(steem::chain::account_object::name)>, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type> > > > const*
max_sequential_skip_in_iterations=8
bottommost_compression=kDisableCompressionOption
prefix_extractor=nullptr
memtable_insert_with_hint_prefix_extractor=nullptr
memtable_factory=SkipListFactory
compaction_filter=nullptr
compaction_options_universal={allow_trivial_move=false;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;compression_size_percent=-1;max_size_amplification_percent=200;max_merge_width=4294967295;size_ratio=1;}
merge_operator=nullptr
compaction_style=kCompactionStyleLevel
[TableOptions/BlockBasedTable "boost\:\:mpl\:\:v_item<steem\:\:chain\:\:by_next_vesting_withdrawal, boost\:\:mpl\:\:vector0<mpl_\:\:na>, 0>"]
format_version=2
whole_key_filtering=true
verify_compression=false
partition_filters=false
enable_index_compression=true
checksum=kCRC32c
index_block_restart_interval=1
pin_top_level_index_and_filter=true
block_align=false
block_size=4096
index_type=kBinarySearch
filter_policy=nullptr
metadata_block_size=4096
no_block_cache=false
block_size_deviation=10
data_block_index_type=kDataBlockBinarySearch
data_block_hash_table_util_ratio=0.750000
read_amp_bytes_per_bit=8589934592
cache_index_and_filter_blocks=false
block_restart_interval=16
pin_l0_filter_and_index_blocks_in_cache=false
hash_index_allow_collision=true
cache_index_and_filter_blocks_with_high_priority=false
flush_block_policy_factory=FlushBlockBySizePolicyFactory
The iostat
of the run looks like this:
5:14 $ iostat 30
disk0 cpu load average
KB/t tps MB/s us sy id 1m 5m 15m
11.03 185 2.00 10 3 87 2.81 4.21 3.41
27.45 64 1.72 6 8 86 2.12 3.90 3.33
63.93 24 1.53 6 8 86 1.95 3.69 3.27
44.70 21 0.92 6 8 86 2.06 3.54 3.23
14.46 84 1.19 6 8 86 1.96 3.37 3.18
29.19 36 1.02 6 8 86 1.96 3.24 3.14
44.97 28 1.25 6 8 86 1.88 3.09 3.09
16.01 76 1.19 6 8 86 1.65 2.92 3.02
39.67 23 0.88 6 8 86 1.62 2.81 2.98
40.27 22 0.87 6 8 86 2.08 2.80 2.97
34.95 20 0.67 6 8 87 1.93 2.69 2.92
41.84 35 1.45 8 8 84 2.43 2.74 2.93
29.08 51 1.46 6 8 86 2.66 2.76 2.93
36.97 32 1.15 6 8 86 2.11 2.62 2.87
40.21 20 0.79 6 8 86 2.19 2.59 2.85
31.53 25 0.78 6 8 86 2.46 2.61 2.85
41.65 17 0.68 6 8 87 2.10 2.52 2.81
43.97 26 1.14 6 8 86 1.94 2.45 2.77
37.64 30 1.12 6 8 86 1.81 2.37 2.73
11.76 113 1.29 9 9 82 1.81 2.32 2.70
disk0 cpu load average
KB/t tps MB/s us sy id 1m 5m 15m
29.20 30 0.86 6 8 86 1.67 2.24 2.65
18.90 47 0.87 6 8 86 1.47 2.13 2.60
51.24 23 1.15 6 8 86 1.73 2.13 2.58
47.54 30 1.41 6 8 86 1.92 2.14 2.57
27.02 36 0.96 6 8 86 1.72 2.07 2.53
31.07 31 0.95 6 8 86 1.66 2.02 2.49
35.40 24 0.83 6 8 86 1.79 2.01 2.48
38.23 18 0.67 6 8 86 1.94 2.03 2.47
36.63 26 0.93 6 8 86 1.88 2.01 2.44
72.41 16 1.16 6 8 86 1.87 1.99 2.42
78.75 12 0.91 6 8 86 2.02 2.01 2.41
46.11 19 0.86 6 8 86 2.02 2.02 2.40
22.92 40 0.90 7 8 85 1.99 2.00 2.38
54.48 16 0.84 6 8 86 1.74 1.94 2.34
37.90 28 1.05 6 8 86 1.90 1.96 2.33
26.85 76 2.00 7 8 85 1.86 1.94 2.32
19.26 81 1.53 6 8 86 1.77 1.91 2.29
43.15 19 0.81 6 8 86 1.90 1.93 2.29
31.28 26 0.80 6 8 86 1.86 1.92 2.27
35.78 35 1.21 7 8 85 2.12 1.98 2.28
disk0 cpu load average
KB/t tps MB/s us sy id 1m 5m 15m
51.52 29 1.44 6 8 86 2.14 1.99 2.27
49.08 23 1.12 6 8 86 2.40 2.07 2.29
38.73 25 0.96 6 8 86 2.35 2.09 2.29
28.98 31 0.87 7 8 85 2.43 2.13 2.29
31.63 29 0.91 10 10 81 2.43 2.16 2.30
62.04 19 1.13 12 9 79 2.70 2.26 2.33
49.89 26 1.25 7 8 84 2.74 2.32 2.35
55.20 22 1.17 6 8 86 2.62 2.33 2.35
44.99 21 0.92 6 8 86 2.28 2.27 2.33
31.31 29 0.90 6 8 86 1.98 2.19 2.30
45.46 15 0.67 6 8 86 2.29 2.24 2.31
36.21 35 1.23 6 8 86 2.01 2.18 2.28
56.98 24 1.34 6 8 86 2.45 2.27 2.31
36.71 52 1.86 10 9 81 10.83 4.41 3.09
37.39 35 1.27 7 8 85 7.34 4.18 3.05
36.37 33 1.16 7 8 85 5.27 3.97 3.01
26.33 40 1.02 10 8 81 4.34 3.87 3.01
36.68 23 0.83 9 8 83 4.36 3.93 3.06
54.02 25 1.34 8 8 84 3.54 3.77 3.03
48.15 22 1.01 6 8 86 2.87 3.59 2.99
disk0 cpu load average
KB/t tps MB/s us sy id 1m 5m 15m
43.97 20 0.88 6 8 86 2.58 3.46 2.96
77.58 14 1.06 6 8 86 2.15 3.27 2.91
36.08 21 0.73 6 8 86 2.21 3.18 2.89
40.75 22 0.86 7 8 85 2.30 3.11 2.87
40.29 33 1.30 14 10 76 2.23 3.02 2.85
40.92 26 1.04 8 8 84 2.31 2.97 2.83
31.50 31 0.95 7 8 84 2.29 2.90 2.81
37.62 25 0.90 7 8 85 2.14 2.80 2.78
31.09 26 0.79 7 8 85 2.08 2.72 2.75
32.22 23 0.72 7 8 85 1.85 2.60 2.71
40.29 30 1.19 6 8 86 1.86 2.53 2.67
41.81 35 1.43 7 8 84 2.12 2.54 2.67
47.47 30 1.39 7 8 84 2.18 2.52 2.66
37.46 22 0.82 7 8 85 2.50 2.57 2.67
24.91 35 0.85 7 8 85 2.49 2.57 2.66
34.53 119 4.02 7 9 84 2.47 2.56 2.66
25.26 38 0.93 7 8 85 2.43 2.53 2.65
The DB statistics dump is as follows:
rocksdb.block.cache.miss COUNT : 41039
rocksdb.block.cache.hit COUNT : 25743973
rocksdb.block.cache.add COUNT : 14649
rocksdb.block.cache.add.failures COUNT : 0
rocksdb.block.cache.index.miss COUNT : 0
rocksdb.block.cache.index.hit COUNT : 0
rocksdb.block.cache.index.add COUNT : 0
rocksdb.block.cache.index.bytes.insert COUNT : 0
rocksdb.block.cache.index.bytes.evict COUNT : 0
rocksdb.block.cache.filter.miss COUNT : 0
rocksdb.block.cache.filter.hit COUNT : 0
rocksdb.block.cache.filter.add COUNT : 0
rocksdb.block.cache.filter.bytes.insert COUNT : 0
rocksdb.block.cache.filter.bytes.evict COUNT : 0
rocksdb.block.cache.data.miss COUNT : 41039
rocksdb.block.cache.data.hit COUNT : 25743973
rocksdb.block.cache.data.add COUNT : 14649
rocksdb.block.cache.data.bytes.insert COUNT : 56293044
rocksdb.block.cache.bytes.read COUNT : 37899054421
rocksdb.block.cache.bytes.write COUNT : 56293044
rocksdb.bloom.filter.useful COUNT : 0
rocksdb.bloom.filter.full.positive COUNT : 0
rocksdb.bloom.filter.full.true.positive COUNT : 0
rocksdb.persistent.cache.hit COUNT : 0
rocksdb.persistent.cache.miss COUNT : 0
rocksdb.sim.block.cache.hit COUNT : 0
rocksdb.sim.block.cache.miss COUNT : 0
rocksdb.memtable.hit COUNT : 103
rocksdb.memtable.miss COUNT : 197546
rocksdb.l0.hit COUNT : 72
rocksdb.l1.hit COUNT : 0
rocksdb.l2andup.hit COUNT : 0
rocksdb.compaction.key.drop.new COUNT : 573996
rocksdb.compaction.key.drop.obsolete COUNT : 90103
rocksdb.compaction.key.drop.range_del COUNT : 0
rocksdb.compaction.key.drop.user COUNT : 0
rocksdb.compaction.range_del.drop.obsolete COUNT : 0
rocksdb.compaction.optimized.del.drop.obsolete COUNT : 0
rocksdb.compaction.cancelled COUNT : 0
rocksdb.number.keys.written COUNT : 5658204
rocksdb.number.keys.read COUNT : 197649
rocksdb.number.keys.updated COUNT : 0
rocksdb.bytes.written COUNT : 1852349515
rocksdb.bytes.read COUNT : 576
rocksdb.number.db.seek COUNT : 17445996
rocksdb.number.db.next COUNT : 491811
rocksdb.number.db.prev COUNT : 0
rocksdb.number.db.seek.found COUNT : 11443646
rocksdb.number.db.next.found COUNT : 491759
rocksdb.number.db.prev.found COUNT : 0
rocksdb.db.iter.bytes.read COUNT : 3535551376
rocksdb.no.file.closes COUNT : 0
rocksdb.no.file.opens COUNT : 2692
rocksdb.no.file.errors COUNT : 0
rocksdb.l0.slowdown.micros COUNT : 0
rocksdb.memtable.compaction.micros COUNT : 0
rocksdb.l0.num.files.stall.micros COUNT : 0
rocksdb.stall.micros COUNT : 0
rocksdb.db.mutex.wait.micros COUNT : 0
rocksdb.rate.limit.delay.millis COUNT : 0
rocksdb.num.iterators COUNT : 0
rocksdb.number.multiget.get COUNT : 0
rocksdb.number.multiget.keys.read COUNT : 0
rocksdb.number.multiget.bytes.read COUNT : 0
rocksdb.number.deletes.filtered COUNT : 0
rocksdb.number.merge.failures COUNT : 0
rocksdb.bloom.filter.prefix.checked COUNT : 0
rocksdb.bloom.filter.prefix.useful COUNT : 0
rocksdb.number.reseeks.iteration COUNT : 46
rocksdb.getupdatessince.calls COUNT : 0
rocksdb.block.cachecompressed.miss COUNT : 0
rocksdb.block.cachecompressed.hit COUNT : 0
rocksdb.block.cachecompressed.add COUNT : 0
rocksdb.block.cachecompressed.add.failures COUNT : 0
rocksdb.wal.synced COUNT : 0
rocksdb.wal.bytes COUNT : 1852349515
rocksdb.write.self COUNT : 5433359
rocksdb.write.other COUNT : 0
rocksdb.write.timeout COUNT : 0
rocksdb.write.wal COUNT : 10866718
rocksdb.compact.read.bytes COUNT : 32188300
rocksdb.compact.write.bytes COUNT : 36010174
rocksdb.flush.write.bytes COUNT : 28697420
rocksdb.number.direct.load.table.properties COUNT : 0
rocksdb.number.superversion_acquires COUNT : 2403
rocksdb.number.superversion_releases COUNT : 4
rocksdb.number.superversion_cleanups COUNT : 4
rocksdb.number.block.compressed COUNT : 31234
rocksdb.number.block.decompressed COUNT : 30189
rocksdb.number.block.not_compressed COUNT : 0
rocksdb.merge.operation.time.nanos COUNT : 0
rocksdb.filter.operation.time.nanos COUNT : 0
rocksdb.row.cache.hit COUNT : 0
rocksdb.row.cache.miss COUNT : 0
rocksdb.read.amp.estimate.useful.bytes COUNT : 0
rocksdb.read.amp.total.read.bytes COUNT : 0
rocksdb.number.rate_limiter.drains COUNT : 0
rocksdb.number.iter.skip COUNT : 9209598
rocksdb.blobdb.num.put COUNT : 0
rocksdb.blobdb.num.write COUNT : 0
rocksdb.blobdb.num.get COUNT : 0
rocksdb.blobdb.num.multiget COUNT : 0
rocksdb.blobdb.num.seek COUNT : 0
rocksdb.blobdb.num.next COUNT : 0
rocksdb.blobdb.num.prev COUNT : 0
rocksdb.blobdb.num.keys.written COUNT : 0
rocksdb.blobdb.num.keys.read COUNT : 0
rocksdb.blobdb.bytes.written COUNT : 0
rocksdb.blobdb.bytes.read COUNT : 0
rocksdb.blobdb.write.inlined COUNT : 0
rocksdb.blobdb.write.inlined.ttl COUNT : 0
rocksdb.blobdb.write.blob COUNT : 0
rocksdb.blobdb.write.blob.ttl COUNT : 0
rocksdb.blobdb.blob.file.bytes.written COUNT : 0
rocksdb.blobdb.blob.file.bytes.read COUNT : 0
rocksdb.blobdb.blob.file.synced COUNT : 0
rocksdb.blobdb.blob.index.expired.count COUNT : 0
rocksdb.blobdb.blob.index.expired.size COUNT : 0
rocksdb.blobdb.blob.index.evicted.count COUNT : 0
rocksdb.blobdb.blob.index.evicted.size COUNT : 0
rocksdb.blobdb.gc.num.files COUNT : 0
rocksdb.blobdb.gc.num.new.files COUNT : 0
rocksdb.blobdb.gc.failures COUNT : 0
rocksdb.blobdb.gc.num.keys.overwritten COUNT : 0
rocksdb.blobdb.gc.num.keys.expired COUNT : 0
rocksdb.blobdb.gc.num.keys.relocated COUNT : 0
rocksdb.blobdb.gc.bytes.overwritten COUNT : 0
rocksdb.blobdb.gc.bytes.expired COUNT : 0
rocksdb.blobdb.gc.bytes.relocated COUNT : 0
rocksdb.blobdb.fifo.num.files.evicted COUNT : 0
rocksdb.blobdb.fifo.num.keys.evicted COUNT : 0
rocksdb.blobdb.fifo.bytes.evicted COUNT : 0
rocksdb.txn.overhead.mutex.prepare COUNT : 0
rocksdb.txn.overhead.mutex.old.commit.map COUNT : 0
rocksdb.txn.overhead.duplicate.key COUNT : 0
rocksdb.txn.overhead.mutex.snapshot COUNT : 0
rocksdb.number.multiget.keys.found COUNT : 0
rocksdb.db.get.micros P50 : 33.986355 P95 : 141.925679 P99 : 168.492893 P100 : 471.000000 COUNT : 197649 SUM : 11604336
rocksdb.db.write.micros P50 : 93.719573 P95 : 109.667189 P99 : 162.118001 P100 : 41259.000000 COUNT : 5433359 SUM : 460807078
rocksdb.compaction.times.micros P50 : 717.178423 P95 : 1801.428571 P99 : 28728.333333 P100 : 29357.000000 COUNT : 466 SUM : 663898
rocksdb.subcompaction.setup.times.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.table.sync.micros P50 : 262.180180 P95 : 379.496396 P99 : 541.409524 P100 : 803.000000 COUNT : 2226 SUM : 574859
rocksdb.compaction.outfile.sync.micros P50 : 211.748634 P95 : 365.666667 P99 : 1145.861538 P100 : 1228.000000 COUNT : 466 SUM : 107895
rocksdb.wal.file.sync.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.manifest.file.sync.micros P50 : 196.200194 P95 : 358.348865 P99 : 377.533572 P100 : 458.000000 COUNT : 3088 SUM : 643963
rocksdb.table.open.io.micros P50 : 38.232462 P95 : 67.864407 P99 : 135.389474 P100 : 320.000000 COUNT : 2692 SUM : 110195
rocksdb.db.multiget.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.read.block.compaction.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.read.block.get.micros P50 : 27.616458 P95 : 49.246669 P99 : 67.087727 P100 : 715.000000 COUNT : 41039 SUM : 1021421
rocksdb.write.raw.block.micros P50 : 0.574314 P95 : 2.013621 P99 : 9.188148 P100 : 1047.000000 COUNT : 37644 SUM : 55849
rocksdb.l0.slowdown.count P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.memtable.compaction.count P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.num.files.stall.count P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.hard.rate.limit.delay.count P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.soft.rate.limit.delay.count P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.numfiles.in.singlecompaction P50 : 3.149336 P95 : 3.914934 P99 : 3.982987 P100 : 4.000000 COUNT : 769 SUM : 2141
rocksdb.db.seek.micros P50 : 132.277891 P95 : 236.921757 P99 : 292.886370 P100 : 4392.000000 COUNT : 11445478 SUM : 1541520122
rocksdb.db.write.stall P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.sst.read.micros P50 : 6.210812 P95 : 9.682873 P99 : 9.991500 P100 : 118.000000 COUNT : 51807 SUM : 220440
rocksdb.num.subcompactions.scheduled P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.bytes.per.read P50 : 0.500182 P95 : 0.950346 P99 : 0.990361 P100 : 8.000000 COUNT : 197649 SUM : 576
rocksdb.bytes.per.write P50 : 317.056439 P95 : 503.002409 P99 : 565.006971 P100 : 1507.000000 COUNT : 5433359 SUM : 1852349515
rocksdb.bytes.per.multiget P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.bytes.compressed P50 : 3550.034512 P95 : 4315.935264 P99 : 4384.015330 P100 : 23237.000000 COUNT : 31234 SUM : 113465756
rocksdb.bytes.decompressed P50 : 3552.458678 P95 : 4316.207226 P99 : 4384.095986 P100 : 23237.000000 COUNT : 30189 SUM : 109857823
rocksdb.compression.times.nanos P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.decompression.times.nanos P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.read.num.merge_operands P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.key.size P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.value.size P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.write.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.get.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.multiget.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.seek.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.next.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.prev.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.blob.file.write.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.blob.file.read.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.blob.file.sync.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.gc.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.compression.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.blobdb.decompression.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.db.flush.micros P50 : 615.480283 P95 : 2258.695652 P99 : 5642.083333 P100 : 12961.000000 COUNT : 2190 SUM : 1776866
The performance context looks like this:
3504961ms chain_plugin.cpp:54 print ] seek_internal_seek_time = 1444019131278
3504961ms chain_plugin.cpp:54 print ] seek_child_seek_time = 886064929659
3504961ms chain_plugin.cpp:54 print ] block_seek_nanos = 193897643954
3504961ms chain_plugin.cpp:54 print ] write_wal_time = 183087924036
3504961ms chain_plugin.cpp:54 print ] seek_min_heap_time = 166805290539
3504961ms chain_plugin.cpp:54 print ] write_pre_and_post_process_time = 129182084292
3504961ms chain_plugin.cpp:54 print ] new_table_iterator_nanos = 112597122008
3504961ms chain_plugin.cpp:54 print ] new_table_block_iter_nanos = 111169581617
3504961ms chain_plugin.cpp:54 print ] seek_on_memtable_time = 48624293410
3504961ms chain_plugin.cpp:54 print ] find_next_user_entry_time = 46519947537
3504961ms chain_plugin.cpp:54 print ] write_memtable_time = 27617621463
3504961ms chain_plugin.cpp:54 print ] write_scheduling_flushes_compactions_time = 20428029104
3504961ms chain_plugin.cpp:54 print ] db_mutex_lock_nanos = 20383042189
3504961ms chain_plugin.cpp:54 print ] db_condition_wait_nanos = 9059632274
3504961ms chain_plugin.cpp:54 print ] get_from_output_files_time = 6048413301
3504961ms chain_plugin.cpp:54 print ] iter_read_bytes = 3535551376
3504961ms chain_plugin.cpp:54 print ] get_from_memtable_time = 869118071
3504961ms chain_plugin.cpp:54 print ] get_post_process_time = 823704917
3504961ms chain_plugin.cpp:54 print ] get_snapshot_time = 803259405
3504961ms chain_plugin.cpp:54 print ] user_key_comparison_count = 383236575
3504961ms chain_plugin.cpp:54 print ] block_read_time = 337752723
3504961ms chain_plugin.cpp:54 print ] block_decompress_time = 157082435
3504961ms chain_plugin.cpp:54 print ] block_checksum_time = 135537361
3504961ms chain_plugin.cpp:54 print ] seek_child_seek_count = 36704276
3504961ms chain_plugin.cpp:54 print ] block_read_byte = 35672856
3504961ms chain_plugin.cpp:54 print ] block_cache_hit_count = 25729990
3504961ms chain_plugin.cpp:54 print ] seek_on_memtable_count = 11445524
3504961ms chain_plugin.cpp:54 print ] find_table_nanos = 7506530
3504961ms chain_plugin.cpp:54 print ] internal_key_skipped_count = 5077437
3504961ms chain_plugin.cpp:54 print ] internal_delete_skipped_count = 4623972
3504961ms chain_plugin.cpp:54 print ] next_on_memtable_count = 1230803
3504961ms chain_plugin.cpp:54 print ] get_from_memtable_count = 196625
3504961ms chain_plugin.cpp:54 print ] block_read_count = 27276
3504961ms chain_plugin.cpp:54 print ] get_read_bytes = 576
Attached is the runs LOG.
@mvandeberg Michael, have you came up with the solution for this problem?
I am developing a drop in replacement for Boost multi-index containers that instead uses rocksdb as the backing data store. My implementation is correct and yields correct results for our existing workload and tests, but I am observing significant performance degradations at scale. Most of the runtime of the application is being spent in
DBIter::SeekToFirst
andDBIter::Next
.My application frequently uses one index as a queue. My theory is that because these keys change frequently, I am experiencing a higher number of iter skips than would normally be expected.
Below are statistics from a slice of a run.
rocksdb.number.iter.skip COUNT : 4592785156
seem to me to be the most concerning metric.I noticed in #3760 a small discussion which seems to be related.
Changing iteration results is not an option for my application.
The automated tuning tool has not yielded any results for me (the output is always zero).
Expected behavior
Because keys are sorted, I expect that common iteration tasks, such as querying the first value and iterating to the next would be done in constant time.
Actual behavior
While writes are still in the memtable, constant time behavior is not preserved.
Steps to reproduce the behavior