Open gaowayne opened 1 year ago
From your example above, uiLsize isn't 3T, it is 1024*3T
uint64_t uiL0size = 1024 * 1024 * 1024;
uiL0size = 512 * uiL0size; //512G
uint64_t uiLsize = 1024 * 1024 * 1024;
uiLsize = 3*1024 * 1024 * uiLsize; // 3T
I can't reproduce this after building from main.
rm -rf p1; rm -rf p2; mkdir p1; mkdir p2; ./db_bench --benchmarks=fillrandom,stats --num=1000000 --value_size=1000 --compression_type=none ; du -hs p1; du -hs p2 ; du -hs /tmp/rocksdbtest-2260
Output is:
628M p1
193M p2
40M /tmp/rocksdbtest-2260
diff --git a/tools/db_bench_tool.cc b/tools/db_bench_tool.cc
index 19ca1b4c0..2ac4c7842 100644
--- a/tools/db_bench_tool.cc
+++ b/tools/db_bench_tool.cc
@@ -4844,6 +4844,13 @@ class Benchmark {
FLAGS_secondary_update_interval, db));
}
} else {
+ std::cout << "Open here\n";
+ std::string p1 = "./p1";
+ std::string p2 = "./p2";
+ uint64_t s1 = 1024ULL * 1024 * 1024 * 1;
+ uint64_t s2 = 1024ULL * 1024 * 1024 * 500;
+ options.db_paths.push_back({ p1, s1 });
+ options.db_paths.push_back({ p2, s2 });
s = DB::Open(options, db_name, &db->db);
}
if (FLAGS_report_open_timing) {
@mdcallag buddy, could you please try putting p1 and p2 on two NVMe SSDs? Or, if you do not have two, you can create two partitions. Also, I see many unit tests use small sizes; can you try with my sizes? I am guessing RocksDB does not arrange levels correctly after the size reaches the 3T level.
Also, do you mean the latest code already fixes this issue? Here is the last git log entry I am testing with.
commit 760b773f58277f9ce449389c0773a1eee2d14363 (HEAD -> main, origin/main, origin/HEAD)
Author: Andrew Kryczka <andrewkr@fb.com>
Date: Mon Apr 10 13:59:44 2023 -0700
fix optimization-disabled test builds with platform010 (#11361)
Summary:
Fixed the following failure:
third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc: In function ‘bool testing::internal::StackGrowsDown()’:
third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:8681:24: error: ‘dummy’ may be used uninitialized [-Werror=maybe-uninitialized]
8681 | StackLowerThanAddress(&dummy, &result);
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:8671:13: note: by argument 1 of type ‘const void*’ to ‘void testing::internal::StackLowerThanAddress(const void*, bool*)’ declared here
8671 | static void StackLowerThanAddress(const void* ptr, bool* result) {
| ^~~~~~~~~~~~~~~~~~~~~
third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:8679:7: note: ‘dummy’ declared here
8679 | int dummy;
| ^~~~~
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11361
Reviewed By: cbi42
Differential Revision: D44838033
Pulled By: ajkr
fbshipit-source-id: 27d68b5a24a15723bbaaa7de45ccd70a60fe259e
My tests used the commit below. Assuming there is a bug, I doubt it has been fixed between the versions we used.
commit a5909f88641a1222865839e62c91e43e6ee36c03 (HEAD -> main, origin/main, origin/HEAD)
Author: Peter Dillinger <peterd@fb.com>
Date: Thu May 4 12:41:28 2023 -0700
Clarify io_activity (#11427)
I don't have a spare server with more than 1 SSD and I am not willing to create partitions on the single-SSD servers. What happens with your setup/test if you use two directories on the same partition?
Also, what happens if you fix the math so that uiLsize is really 3T? The current code shouldn't overflow uint64_t, but the value is much larger than 3T. I am suggesting that you remove one of the multiply-by-1024 terms: uiLsize = 3*1024 * 1024 * uiLsize; // 3T -> uiLsize = 3*1024 * uiLsize; // 3T
uint64_t uiL0size = 1024 * 1024 * 1024;
uiL0size = 512 * uiL0size; //512G
uint64_t uiLsize = 1024 * 1024 * 1024;
uiLsize = 3*1024 * 1024 * uiLsize; // 3T
options.db_paths.push_back({ kDBPath, (uint64_t)uiL0size }); ------------- 512G
options.db_paths.push_back({ kDBPath_1, uiLsize }); ----------------------- 3T
I will take the AR to verify this on my side and update you. :) I will also verify the two-folders-on-the-same-partition case and give an update. :)
Ah yes, good catch, my poor math, that is far more than 3T. :) Let me try this first; I will give an update.
@mdcallag buddy, you are right, this is my bad math. After I corrected the 3T issue, iostat now shows bandwidth on both NVMe SSDs; the 2nd NVMe is used correctly. But the BW is super slow: with one NVMe SSD we can reach 1200MB/s, but with two, each only gets 350MB/s.
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
nvme7n1p4 0.00 0.00 0.00 0.00 0.00 0.00 3149.00 357.74 0.00 0.00 10.27 116.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 32.34 11.44
nvme3n1 5507.40 343.86 0.00 0.00 0.06 63.93 3187.60 358.08 0.00 0.00 9.51 115.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 30.67 51.60
nvme6n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
I will download the latest code to double-check this BW perf issue.
./db_bench --num=60000000 --db=/mnt/nvme7n1p4/test1 --histogram=1 --key_size=4096 --value_size=8192 --compression_type=none --benchmarks="fillrandom,stats" --statistics --stats_per_interval=1 --stats_interval_seconds=240 --threads=1 --target_file_size_multiplier=10 --write_buffer_size=134217728 --use_existing_db=0 --disable_wal=false --cache_size=536870912 --bloom_bits=10 --bloom_locality=1 --compaction_style=0 --universal_max_size_amplification_percent=500 --max_write_buffer_number=16 --max_background_flushes=16 --level0_file_num_compaction_trigger=32 --level0_slowdown_writes_trigger=160 --level0_stop_writes_trigger=288 --soft_pending_compaction_bytes_limit=549755813888 --hard_pending_compaction_bytes_limit=1099511627776 --max_background_jobs=4 --max_background_compactions=4 --subcompactions=20
Sorry man, stay tuned. I found I should test fillrandom to reproduce this; fillseq always works fine.
@mdcallag hello man, I confirmed: even after I fixed the 3T math problem and used the latest code, the 2nd drive is still not used when I run fillrandom!
If I configure two NVMe SSD devices with db_paths, RocksDB should start using the 2nd NVMe SSD after the 1st one is full.
That is not quite the expected behavior. How much space does "/mnt/nvme3n1/test1/" consume when you get an out-of-space error? It looks like the expected size of the DB (FileSize: 468750.0 MB (estimated)) is a bit less than the size limit you are giving to "/mnt/nvme3n1/test1" (512GB), in which case we would expect the DB to reside in that one directory.
Hello, here is the actual consumed size on nvme3n1:
/dev/nvme3n1 745G 475G 271G 64% /mnt/nvme3n1
/dev/nvme7n1p4 3.5T 25G 3.5T 1% /mnt/nvme7n1p4
[root@phobos mnt]# cd nvme3n1
[root@phobos nvme3n1]# ls
test1
[root@phobos nvme3n1]# du test1
491708712 test1
[root@phobos nvme3n1]# du test1 -h
469G test1
[root@phobos nvme3n1]#
I am pretty sure that with the same db_bench command line I can ingest at least 2T of data if I do not hack db_paths. I also feel the estimate is not correct given the two db_paths I set.
du test1 -h
469G test1
This is still below the configured target (512GB) though. So RocksDB has respected the config, at least up until this point.
What does it look like when you run out of space? It looks like you have 271GB space available, so this doesn't look like the problematic scenario described earlier.
@ajkr I tested again, this time increasing the number of k-v pairs:
./db_bench --num=120000000 --db=/mnt/nvme7n1p4/test1 --histogram=1 --key_size=4096 --value_size=8192 --compression_type=none --benchmarks="fillrandom,stats" --statistics --stats_per_interval=1 --stats_interval_seconds=60 --threads=1 --target_file_size_multiplier=10 --write_buffer_size=134217728 --use_existing_db=0 --disable_wal=false --cache_size=536870912 --bloom_bits=10 --bloom_locality=1 --compaction_style=0 --universal_max_size_amplification_percent=500 --max_write_buffer_number=16 --max_background_flushes=16 --level0_file_num_compaction_trigger=32 --level0_slowdown_writes_trigger=160 --level0_stop_writes_trigger=288 --soft_pending_compaction_bytes_limit=549755813888 --hard_pending_compaction_bytes_limit=1099511627776 --max_background_jobs=4 --max_background_compactions=4 --subcompactions=20
It is now above the 512G I specified in the code.
[root@phobos nvme3n1]# du -h test1/
688G test1/
[root@phobos nvme3n1]# ls
test1
[root@phobos nvme3n1]# cd /mnt/nvme7n1p4
[root@phobos nvme7n1p4]# ls
test1
[root@phobos nvme7n1p4]# cd test1
[root@phobos test1]# ls
[root@phobos test1]# du -h test1/
du: cannot access 'test1/': No such file or directory
[root@phobos test1]# cd ..
[root@phobos nvme7n1p4]# du -h test1/
0 test1/
[root@phobos nvme7n1p4]#
I still see it report no space on nvme3n1, and nvme7n1 is empty. Why can't it automatically start using the 2nd NVMe drive I specified under a random workload? The sequential workload works fine.
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 3/0 383.20 MB 1.5 104.0 0.0 104.0 667.7 563.6 0.0 1.2 78.1 501.0 1364.65 610.06 4659 0.293 9055K 2811 0.0 0.0
L1 37/0 2.57 GB 10.3 1018.7 563.3 455.5 1009.6 554.2 0.0 1.8 3084.4 3056.8 338.22 1442.45 52 6.504 88M 795K 0.0 0.0
L2 141/0 26.73 GB 10.7 1351.5 498.4 853.1 1333.4 480.3 53.2 2.7 490.5 484.0 2821.24 1686.42 5419 0.521 117M 1578K 0.0 0.0
L3 161/4 279.64 GB 10.5 1370.4 500.6 869.8 1310.0 440.2 6.1 2.6 372.0 355.6 3772.55 1758.80 351 10.748 119M 5259K 0.0 0.0
L4 78/0 166.71 GB 0.7 143.3 143.3 0.0 143.3 143.3 23.4 1.0 421.0 421.0 348.63 189.05 27 12.912 12M 0 0.0 0.0
Sum 420/4 476.02 GB 0.0 3988.0 1705.7 2282.4 4464.1 2181.7 82.7 7.9 472.4 528.8 8645.30 5686.78 10508 0.823 347M 7636K 0.0 0.0
Int 0/0 0.00 KB 0.0 425.3 221.6 203.7 431.9 228.2 25.7 28.9 457.9 465.0 951.14 572.52 473 2.011 37M 725K 0.0 0.0
** Compaction Stats [default] **
Priority Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Low 0/0 0.00 KB 0.0 3988.0 1705.7 2282.4 3900.4 1618.0 0.0 0.0 548.6 536.5 7444.23 5193.13 5989 1.243 347M 7636K 0.0 0.0
High 0/0 0.00 KB 0.0 0.0 0.0 0.0 563.7 563.7 0.0 0.0 0.0 480.6 1201.06 493.65 4519 0.266 0 0 0.0 0.0
Blob file count: 0, total size: 0.0 GB, garbage size: 0.0 GB, space amp: 0.0
Uptime(secs): 2036.4 total, 233.3 interval
Flush(GB): cumulative 563.667, interval 14.968
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 4464.07 GB write, 2244.80 MB/s write, 3988.05 GB read, 2005.43 MB/s read, 8645.3 seconds
Interval compaction: 431.92 GB write, 1895.52 MB/s write, 425.28 GB read, 1866.38 MB/s read, 951.1 seconds
Write Stall (count): cf-l0-file-count-limit-delays-with-ongoing-compaction: 277, cf-l0-file-count-limit-stops-with-ongoing-compaction: 0, l0-file-count-limit-delays: 1298, l0-file-count-limit-stops: 0, memtable-limit-delays: 10, memtable-limit-stops: 0, pending-compaction-bytes-delays: 3353, pending-compaction-bytes-stops: 0, total-delays: 4661, total-stops: 0, interval: 459 total count
Block cache LRUCache@0x29b3a40#925622 capacity: 512.00 MB usage: 85.48 KB table_size: 8192 occupancy: 8 collections: 4 last_copies: 0 last_secs: 0.0003 secs_since: 233
** Level 1 read latency histogram (micros):
Count: 87444911 Average: 2.7148 StdDev: 20.32
Min: 0 Median: 1.8997 Max: 30068
Percentiles: P50: 1.90 P75: 2.47 P99: 3.82 P99.9: 5.75 P99.99: 6.64
------------------------------------------------------
[ 0, 1 ] 918170 1.050% 1.050%
( 1, 2 ] 47575913 54.407% 55.457% ###########
( 2, 3 ] 36373723 41.596% 97.053% ########
( 3, 4 ] 2079374 2.378% 99.431%
( 4, 6 ] 469800 0.537% 99.968%
( 6, 10 ] 120701 0.138% 100.106%
( 10, 15 ] 7193 0.008% 100.114%
( 15, 22 ] 7498 0.009% 100.123%
( 22, 34 ] 11944 0.014% 100.137%
( 34, 51 ] 5981 0.007% 100.143%
( 51, 76 ] 5388 0.006% 100.150%
( 76, 110 ] 2889 0.003% 100.153%
( 110, 170 ] 3336 0.004% 100.157%
( 170, 250 ] 2878 0.003% 100.160%
( 250, 380 ] 3887 0.004% 100.164%
( 380, 580 ] 5375 0.006% 100.171%
( 580, 870 ] 6682 0.008% 100.178%
( 870, 1300 ] 4750 0.005% 100.184%
( 1300, 1900 ] 1552 0.002% 100.185%
( 1900, 2900 ] 446 0.001% 100.186%
( 2900, 4400 ] 130 0.000% 100.186%
( 4400, 6600 ] 50 0.000% 100.186%
( 6600, 9900 ] 16 0.000% 100.186%
( 9900, 14000 ] 20 0.000% 100.186%
( 14000, 22000 ] 32 0.000% 100.186%
( 22000, 33000 ] 5 0.000% 100.186%
** Level 2 read latency histogram (micros):
Count: 113597193 Average: 8.4339 StdDev: 111.07
Min: 0 Median: 1.8722 Max: 415425
Percentiles: P50: 1.87 P75: 2.52 P99: 111.74 P99.9: 735.87 P99.99: 872.61
------------------------------------------------------
[ 0, 1 ] 7279171 6.408% 6.408% #
( 1, 2 ] 56775984 49.980% 56.388% ##########
( 2, 3 ] 40709097 35.836% 92.224% #######
( 3, 4 ] 3596922 3.166% 95.391% #
( 4, 6 ] 1089665 0.959% 96.350%
( 6, 10 ] 256388 0.226% 96.576%
( 10, 15 ] 31991 0.028% 96.604%
( 15, 22 ] 612694 0.539% 97.143%
( 22, 34 ] 1019097 0.897% 98.040%
( 34, 51 ] 341445 0.301% 98.341%
( 51, 76 ] 335042 0.295% 98.636%
( 76, 110 ] 403352 0.355% 98.991%
( 110, 170 ] 358376 0.315% 99.306%
( 170, 250 ] 214442 0.189% 99.495%
( 250, 380 ] 156578 0.138% 99.633%
( 380, 580 ] 185371 0.163% 99.796%
( 580, 870 ] 219503
( 4, 6 ] 918044 1.037% 91.307%
( 6, 10 ] 147145 0.166% 91.473%
( 10, 15 ] 18311 0.021% 91.494%
( 15, 22 ] 1019198 1.151% 92.645%
( 22, 34 ] 1850674 2.090% 94.736%
( 34, 51 ] 551989 0.624% 95.359%
( 51, 76 ] 592078 0.669% 96.028%
( 76, 110 ] 913079 1.031% 97.059%
( 110, 170 ] 852107 0.963% 98.022%
( 170, 250 ] 438942 0.496% 98.518%
( 250, 380 ] 291378 0.329% 98.847%
( 380, 580 ] 363353 0.410% 99.257%
( 580, 870 ] 429572 0.485% 99.742%
( 870, 1300 ] 236889 0.268% 100.010%
( 1300, 1900 ] 55465 0.063% 100.073%
( 1900, 2900 ] 9113 0.010% 100.083%
( 2900, 4400 ] 1826 0.002% 100.085%
( 4400, 6600 ] 866 0.001% 100.086%
( 6600, 9900 ] 708 0.001% 100.087%
( 9900, 14000 ] 823 0.001% 100.088%
( 14000, 22000 ] 1923 0.002% 100.090%
( 22000, 33000 ] 124 0.000% 100.090%
( 33000, 50000 ] 6 0.000% 100.090%
( 50000, 75000 ] 38 0.000% 100.090%
( 75000, 110000 ] 27 0.000% 100.090%
( 110000, 170000 ] 22 0.000% 100.090%
( 170000, 250000 ] 1 0.000% 100.090%
** Level 4 read latency histogram (micros):
Count: 135 Average: 488.1185 StdDev: 1009.79
Min: 0 Median: 4.3571 Max: 6165
Percentiles: P50: 4.36 P75: 158.16 P99: 3875.00 P99.9: 6165.00 P99.99: 6165.00
------------------------------------------------------
[ 0, 1 ] 47 34.815% 34.815% #######
( 1, 2 ] 7 5.185% 40.000% #
( 2, 3 ] 1 0.741% 40.741%
( 3, 4 ] 10 7.407% 48.148% #
( 4, 6 ] 14 10.370% 58.519% ##
( 6, 10 ] 1 0.741% 59.259%
( 10, 15 ] 1 0.741% 60.000%
( 34, 51 ] 1 0.741% 60.741%
( 51, 76 ] 4 2.963% 63.704% #
( 110, 170 ] 19 14.074% 77.778% ###
( 50000, 75000 ] 57 0.000% 100.065%
( 75000, 110000 ] 54 0.000% 100.065%
( 110000, 170000 ] 34 0.000% 100.065%
( 170000, 250000 ] 5 0.000% 100.065%
( 250000, 380000 ] 1 0.000% 100.065%
** Level 4 read latency histogram (micros):
Count: 66253371 Average: 19.2299 StdDev: 205.26
Min: 0 Median: 1.4686 Max: 287704
Percentiles: P50: 1.47 P75: 1.87 P99: 530.47 P99.9: 1221.65 P99.99: 1718.20
------------------------------------------------------
[ 0, 1 ] 13621811 20.560% 20.560% ####
( 1, 2 ] 41625941 62.828% 83.389% #############
( 2, 3 ] 3053053 4.608% 87.997% #
( 3, 4 ] 818512 1.235% 89.232%
( 4, 6 ] 796751 1.203% 90.435%
( 6, 10 ] 100794 0.152% 90.587%
( 10, 15 ] 12957 0.020% 90.606%
( 15, 22 ] 915991 1.383% 91.989%
( 22, 34 ] 1423245 2.148% 94.137%
( 34, 51 ] 440064 0.664% 94.801%
( 51, 76 ] 547668 0.827% 95.628%
( 76, 110 ] 740871 1.118% 96.746%
( 110, 170 ] 591639 0.893% 97.639%
( 170, 250 ] 370863 0.560% 98.199%
( 250, 380 ] 290045 0.438% 98.637%
( 380, 580 ] 319832 0.483% 99.120%
( 580, 870 ] 363842 0.549% 99.669%
( 870, 1300 ] 187383 0.283% 99.952%
( 1300, 1900 ] 36562 0.055% 100.007%
( 1900, 2900 ] 4626 0.007% 100.014%
( 2900, 4400 ] 1041 0.002% 100.015%
( 4400, 6600 ] 441 0.001% 100.016%
( 6600, 9900 ] 391 0.001% 100.017%
( 9900, 14000 ] 695 0.001% 100.018%
( 14000, 22000 ] 1733 0.003% 100.020%
( 22000, 33000 ] 252 0.000% 100.021%
( 33000, 50000 ] 54 0.000% 100.021%
( 50000, 75000 ] 43 0.000% 100.021%
( 75000, 110000 ] 16 0.000% 100.021%
( 110000, 170000 ] 17 0.000% 100.021%
( 170000, 250000 ] 7 0.000% 100.021%
( 250000, 380000 ] 5 0.000% 100.021%
** DB Stats **
Uptime(secs): 4466.6 total, 60.0 interval
Cumulative writes: 76M writes, 76M keys, 76M commit groups, 1.0 writes per commit group, ingest: 877.86 GB, 201.25 MB/s
Cumulative WAL: 76M writes, 0 syncs, 76603000.00 writes per sync, written: 877.86 GB, 201.25 MB/s
Cumulative stall: 01:00:55.317 H:M:S, 81.8 percent
Interval writes: 629K writes, 629K keys, 629K commit groups, 1.0 writes per commit group, ingest: 7381.29 MB, 123.01 MB/s
Interval WAL: 629K writes, 0 syncs, 629000.00 writes per sync, written: 7.21 GB, 123.01 MB/s
Interval stall: 00:00:52.258 H:M:S, 87.1 percent
Write Stall (count): write-buffer-manager-limit-stops: 0, num-running-compactions: 4
num-running-flushes: 2
put error: IO error: No space left on device: While appending to file: /mnt/nvme3n1/test1/062914.log: No space left on device
Sorry I have not had time to look more. Have you checked whether the feature is completely broken for you? For example, if you configure options.db_paths[0] to be very small, say 1GB, will fillrandom make use of both drives?
thank you so much, I will try this and update you. :)
@ajkr I tried making db_paths[0] 1GB, and it is interesting: the SLC NVMe device and the QLC NVMe device both reach 1100MB/s BW. The SST/log files shown below are all written to the SLC drive, and the other SST files are saved on the QLC NVMe SSD. This follows my expectation. But why doesn't it work when db_paths[0] has a bigger size? SLC files snapshot:
[root@phobos test1]# ls -l
total 2236468
-rw-r--r-- 1 root root 133949409 May 26 01:43 028243.sst
-rw-r--r-- 1 root root 133949409 May 26 01:43 028246.sst
-rw-r--r-- 1 root root 133949408 May 26 01:43 028249.sst
-rw-r--r-- 1 root root 133949408 May 26 01:43 028252.sst
-rw-r--r-- 1 root root 133949408 May 26 01:43 028255.sst
-rw-r--r-- 1 root root 133949409 May 26 01:43 028258.sst
-rw-r--r-- 1 root root 133949409 May 26 01:43 028261.sst
-rw-r--r-- 1 root root 133724568 May 26 01:43 028263.log
-rw-r--r-- 1 root root 133949408 May 26 01:43 028264.sst
-rw-r--r-- 1 root root 133724568 May 26 01:43 028266.log
-rw-r--r-- 1 root root 133949409 May 26 01:43 028267.sst
-rw-r--r-- 1 root root 34579481 May 26 01:43 028269.log
-rw-r--r-- 1 root root 58614640 May 26 01:43 028270.sst
-rw-r--r-- 1 root root 16 May 26 01:25 CURRENT
-rw-r--r-- 1 root root 36 May 26 01:25 IDENTITY
-rw-r--r-- 1 root root 0 May 26 01:25 LOCK
-rw-r--r-- 1 root root 97012782 May 26 01:43 LOG
-rw-r--r-- 1 root root 36449 May 26 01:25 LOG.old.1685035525334064
-rw-r--r-- 1 root root 438525404 May 26 01:43 MANIFEST-000009
-rw-r--r-- 1 root root 7090 May 26 01:25 OPTIONS-000007
-rw-r--r-- 1 root root 7111 May 26 01:25 OPTIONS-000011
[root@phobos test1]#
iostat (nvme3n1 is SLC, nvme7n1 is QLC):
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
nvme7n1 0.00 0.00 0.00 0.00 0.00 0.00 8851.20 1022.08 0.00 0.00 10.28 118.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 91.00 30.94
nvme7n1p4 0.00 0.00 0.00 0.00 0.00 0.00 8851.20 1022.08 0.00 0.00 10.28 118.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 91.00 30.94
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 8905.00 1022.68 0.00 0.00 9.72 117.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 86.52 30.30
I see. Then we probably just do not adhere strictly enough to the configured limits. We should take a look and see if we can improve it for users who set limits close to their available space. I don't think it's something we'll get to in the near-term, but if you are interested, please feel free to see if there's any way to improve the db_paths/cf_paths heuristics.
Expected behavior
if I configure two NVMe SSD devices with db_paths, RocksDB should start use 2nd NVMe SSD after 1st one is full.
Actual behavior
Currently, RocksDB reports a no-space-left error when the 1st NVMe SSD is full.
Steps to reproduce the behavior
Prepare two NVMe SSDs, create one small partition on the 1st NVMe SSD, then configure db_paths as below.
[root@phobos rocksdb]# ./db_bench --num=60000000 --db=/mnt/nvme7n1p4/test1 --histogram=1 --key_size=4096 --value_size=8192 --compression_type=none --benchmarks="fillrandom,stats" --statistics --stats_per_interval=1 --stats_interval_seconds=240 --threads=1 --target_file_size_multiplier=10 --write_buffer_size=134217728 --use_existing_db=0 --disable_wal=false --cache_size=536870912 --bloom_bits=10 --bloom_locality=1 --compaction_style=0 --universal_max_size_amplification_percent=500 --max_write_buffer_number=16 --max_background_flushes=16 --level0_file_num_compaction_trigger=32 --level0_slowdown_writes_trigger=160 --level0_stop_writes_trigger=288 --soft_pending_compaction_bytes_limit=549755813888 --hard_pending_compaction_bytes_limit=1099511627776 --max_background_jobs=4 --max_background_compactions=4 --subcompactions=20