huor closed this issue 3 years ago
You can either do a manual flush by calling Flush(), or close the DB by calling Close().
See https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h for all DB related APIs
You also mentioned that even with 1 GB written there is still no flush happening, which is unexpected: with the default write_buffer_size of 64 MB, the automatic flush should be triggered well before that. Do you mind posting your OPTIONS file? We can check whether any configuration caused this.
I did some investigation with about 600 MB of raw data, which becomes roughly 300 MB of SST files after being written to RocksDB with my compression settings. Here are my findings:
Would you please shed some light on how to ensure data is persisted to SST files while still getting the best performance? Thanks in advance.
RocksDB is configured with write_buffer_size=67108864 and db_write_buffer_size=0 in my environment. Please find the detailed OPTIONS below for reference.
$ cat /tmp/data/OPTIONS-000005
# This is a RocksDB option file.
#
# For detailed file format spec, please refer to the example file
# in examples/rocksdb_option_file_example.ini
#
[Version]
rocksdb_version=5.10.3
options_file_version=1.1
[DBOptions]
allow_mmap_writes=false
base_background_compactions=-1
new_table_reader_for_compaction_inputs=false
db_log_dir=
wal_recovery_mode=kPointInTimeRecovery
use_direct_reads=false
write_thread_max_yield_usec=100
max_manifest_file_size=18446744073709551615
allow_2pc=false
allow_fallocate=true
fail_if_options_file_error=false
allow_ingest_behind=false
allow_mmap_reads=false
skip_log_error_on_recovery=false
recycle_log_file_num=0
delete_obsolete_files_period_micros=21600000000
compaction_readahead_size=0
use_direct_io_for_flush_and_compaction=false
log_file_time_to_roll=0
create_missing_column_families=false
advise_random_on_open=true
max_log_file_size=0
stats_dump_period_sec=600
enable_thread_tracking=false
use_adaptive_mutex=false
create_if_missing=true
is_fd_close_on_exec=true
max_background_flushes=-1
manifest_preallocation_size=4194304
error_if_exists=false
skip_stats_update_on_db_open=false
max_open_files=-1
random_access_max_buffer_size=1048576
use_fsync=false
max_background_jobs=16
two_write_queues=false
max_background_compactions=-1
max_file_opening_threads=16
table_cache_numshardbits=6
keep_log_file_num=1000
avoid_flush_during_shutdown=false
db_write_buffer_size=0
max_total_wal_size=0
wal_dir=/tmp/data
max_subcompactions=1
WAL_size_limit_MB=0
paranoid_checks=true
allow_concurrent_memtable_write=true
writable_file_max_buffer_size=1048576
WAL_ttl_seconds=0
delayed_write_rate=16777216
bytes_per_sync=0
wal_bytes_per_sync=0
enable_pipelined_write=false
enable_write_thread_adaptive_yield=true
write_thread_slow_yield_usec=3
access_hint_on_compaction_start=NORMAL
info_log_level=INFO_LEVEL
dump_malloc_stats=false
avoid_flush_during_recovery=false
preserve_deletes=false
manual_wal_flush=false
[CFOptions "default"]
report_bg_io_stats=false
inplace_update_support=false
max_compaction_bytes=1677721600
disable_auto_compactions=false
write_buffer_size=67108864
bloom_locality=0
max_bytes_for_level_multiplier=10.000000
compaction_filter_factory=nullptr
optimize_filters_for_hits=false
target_file_size_base=67108864
max_write_buffer_number_to_maintain=0
hard_pending_compaction_bytes_limit=274877906944
paranoid_file_checks=false
memtable_prefix_bloom_size_ratio=0.000000
force_consistency_checks=false
max_write_buffer_number=2
max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1
level0_slowdown_writes_trigger=20
level_compaction_dynamic_level_bytes=false
compaction_options_fifo={allow_compaction=false;ttl=0;max_table_files_size=1073741824;}
inplace_update_num_locks=10000
level0_file_num_compaction_trigger=4
compression=kSnappyCompression
level0_stop_writes_trigger=36
num_levels=7
table_factory=BlockBasedTable
compression_per_level=kLZ4Compression:kLZ4Compression:kLZ4Compression:kLZ4Compression:kLZ4Compression:kLZ4Compression:kLZ4Compression
target_file_size_multiplier=1
min_write_buffer_number_to_merge=1
arena_block_size=8388608
max_successive_merges=0
memtable_huge_page_size=0
compaction_pri=kByCompensatedSize
soft_pending_compaction_bytes_limit=68719476736
max_bytes_for_level_base=268435456
comparator=leveldb.BytewiseComparator
max_sequential_skip_in_iterations=8
bottommost_compression=kDisableCompressionOption
prefix_extractor=nullptr
memtable_insert_with_hint_prefix_extractor=nullptr
memtable_factory=SkipListFactory
compaction_filter=nullptr
compaction_options_universal={allow_trivial_move=false;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;compression_size_percent=-1;max_size_amplification_percent=200;max_merge_width=4294967295;size_ratio=1;}
merge_operator=nullptr
compaction_style=kCompactionStyleLevel
[TableOptions/BlockBasedTable "default"]
format_version=2
whole_key_filtering=true
verify_compression=false
partition_filters=false
index_block_restart_interval=1
block_size_deviation=10
block_size=32768
pin_l0_filter_and_index_blocks_in_cache=false
block_restart_interval=16
filter_policy=nullptr
metadata_block_size=4096
no_block_cache=false
checksum=kCRC32c
read_amp_bytes_per_bit=8589934592
cache_index_and_filter_blocks=false
index_type=kBinarySearch
hash_index_allow_collision=true
cache_index_and_filter_blocks_with_high_priority=false
flush_block_policy_factory=FlushBlockBySizePolicyFactory
I have the same issue. When calling db->Put(), no SST files are generated in the folder; I can only list LOG, MANIFEST, CURRENT, IDENTITY, LOCK, and OPTIONS.
It only happens in a Linux VM (Ubuntu 18.04, using Multipass) on my Mac. When I run the application directly on macOS 10.15, I see the SST files.
I tried with and without WAL, and with sync and no sync on each Put().
The results were the same in the VM and on macOS.
What is the value of avoid_flush_during_shutdown in your test?
If you reopen the db, can you find all the data previously written?
@riversand963 Thank you for the quick reply.
I tried again.
As avoid_flush_during_shutdown is not explicitly set in my application, the default value should be false.
I tried with WAL enabled and disabled (the write option sync is true for each Put()). The results are the same: only LOG files (every log record shows 0.0 for write and compaction), and no SST files.
There are two test setups: one is the Ubuntu VM on the Mac, the other is Docker inside the Ubuntu VM. The results are the same as above. The Ubuntu VM runs under Multipass.
Only when I run my application on macOS do I see the SST files instantly. I do not have a bare-metal Ubuntu machine, so I cannot tell what would happen there.
BTW:
If I run my application for a long time, continuously inserting new keys into RocksDB, it runs out of memory after a couple of minutes in the VM environment, and I can hear a sound like an HDD spinning (but my Mac has only an SSD, no HDD).
Even when I safely close the database by deleting the database object, there are no SST files at all.
@riversand963 I found it is related to Jemalloc. When I changed the memory allocation library from Jemalloc to libc, the .sst files started to show up in the RocksDB folder for the Linux VM on macOS. Because my Makefile does not use Jemalloc on macOS, the problem did not occur there, as I described above.
@szstonelee Can you provide more details on why Jemalloc triggers this issue?
@riversand963 I do not know why. I only know that in the Multipass Ubuntu VM running on my macOS host, when my code in RedRock (https://github.com/szstonelee/RedRock) calls the RocksDB API with Jemalloc, no SST files are generated. But with nothing else changed, only switching to libc (by modifying src/makefile), the SST files show up. It may be an issue in Multipass, macOS, Jemalloc, or RocksDB; I do not know which one is the cause, so I am reporting the bug here.
@szstonelee I see. Do you plan to continue investigation and share more info?
Sorry, right now I have no time to investigate this issue. In the future, if I find something new or fix the bug, I will let you know.
There are a number of variables here, and depending on the current available information, I do not have a good theory about the cause. Since nobody is actively investigating, I'll close this for now. Feel free to reopen if it still affects you.
Thanks @miasantreble for the suggestion!
Say we have three solutions here:
- Solution 1: open rocksdb, write, flush, read, close rocksdb
- Solution 2: open rocksdb, write without flush, close rocksdb, open rocksdb, read, close rocksdb
- Solution 3: open rocksdb, write with flush, read, close rocksdb
Hi, has this problem been resolved? The same thing happens in my environment: I exceeded write_buffer_size, but still no SST file was generated.
Expected behavior
If it follows the steps open rocksdb, put data, get data, close rocksdb, an SST file should be generated. However, it is not. The RocksDB version is 5.10.3.
If it follows the steps open rocksdb, put data, close rocksdb, then open rocksdb again, get data, close rocksdb, an SST file is generated as expected. But it can take a long time to recover when opening rocksdb to get the data, especially if the data volume put into rocksdb is relatively large.
Actual behavior
No SST file is generated after putting data into rocksdb. Only the log file is present.
However, if it follows the steps open rocksdb, put data, close rocksdb, open it again, get data, close rocksdb, the SST file is generated as expected.
Steps to reproduce the behavior
Here is the C++ code for reproduction. There is no SST file even if the data volume put into rocksdb is much larger, e.g., about 1 GB.
The SST file is generated if it follows the steps open rocksdb, put data, close rocksdb, then open rocksdb again, get data, close rocksdb.