compaction gets stuck forever on L0->base during write burst

ajkr commented 5 years ago


RocksDB gets stuck in L0->L0 and L0->base compactions in very write-heavy benchmarks. base->base+1 compaction almost never happens, and the base level's score is usually reported as zero due to the pending L0->base compaction.

It is caused by interactions between:

(1) intra-L0 compaction, (2) Siying's optimization to use L0 size as the base level's target size in write-heavy scenarios, and (3) an existing workaround that disables base->base+1 compaction when L0 is eligible for compaction but not scheduled.

(3) has existed the longest but does not seem particularly relevant now that we can keep doing intra-L0 compaction while the base level is contended. I saw more benefit than expected from removing it, though I cannot explain why yet.
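To make the effect of (3) concrete, here is a toy model of the scheduling decision as described above. It is only a sketch: `next_compaction` and its parameters are hypothetical names, not RocksDB code, and it assumes intra-L0 compaction is always available as a fallback, which is a simplification.

```python
def next_compaction(l0_eligible, l0_to_base_possible, base_score, keep_workaround):
    """Toy picker: return a label for the next compaction, or None.

    Hypothetical model of the behavior described in this issue,
    not the actual RocksDB compaction picker.
    """
    if l0_eligible and l0_to_base_possible:
        return "L0->base"
    if l0_eligible:
        # L0 wants to compact but cannot reach the base level right now,
        # for example because an earlier L0->base compaction holds it.
        if keep_workaround:
            # (3): base->base+1 is disabled in this state,
            # so only intra-L0 work remains (assumed always possible).
            return "intra-L0"
        if base_score >= 1.0:
            return "base->base+1"
        return "intra-L0"
    if base_score >= 1.0:
        return "base->base+1"
    return None


# L0 is eligible but blocked, and the base level is over its target.
print(next_compaction(True, False, 2.5, keep_workaround=True))   # intra-L0
print(next_compaction(True, False, 2.5, keep_workaround=False))  # base->base+1
```

In this model, keeping (3) means the base level never drains while L0 stays eligible, which matches the "stuck" behavior reported here. Whether draining it is actually better for the overall tree shape is a separate question.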

Benchmark command:

$ TEST_TMPDIR=/data/compaction_bench ./db_bench -benchmarks=filluniquerandom -num=50000000 -max_write_buffer_number=4 -rate_limiter_bytes_per_sec=41943040 -write_buffer_size=2097152 -target_file_size_base=262072 -max_bytes_for_level_base=4194304 -compression_type=none -max_background_jobs=3 -level_compaction_dynamic_level_bytes=true -target_file_size_multiplier=2 -level0_file_num_compaction_trigger=2 -stats_per_interval=1 -stats_interval_seconds=10

Results with (3):

filluniquerandom :      24.812 micros/op 40303 ops/sec;    4.5 MB/s

Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0     23/8   347.12 MB  53.6     12.9     0.0     12.9      18.6      5.7       0.0   3.3     16.8     24.1    789.66             82.13      3775    0.209    112M      0
  L4   5835/5835  1.44 GB   0.0     16.2     3.8     12.4      16.2      3.8       0.0   4.3     21.9     21.9    755.57            225.09        11   68.689    141M      0
  L5  11715/0    2.91 GB   2.9      5.4     1.8      3.7       5.4      1.8       1.9   3.1     23.6     23.6    235.62             61.19       542    0.435     47M      0
  L6   2336/0   986.33 MB   0.0      1.2     1.0      0.2       1.2      1.0       0.0   1.2     14.7     14.7     81.88              9.90       534    0.153     10M      0
 Sum  19909/5843  5.65 GB   0.0     35.7     6.5     29.2      41.3     12.1       1.9   7.3     19.6     22.7   1862.73            378.31      4862    0.383    312M      0
 Int      0/0    0.00 KB   0.0      0.3     0.0      0.3       0.4      0.1       0.0   4.1     15.5     20.4     18.96              1.89        62    0.306   2500K      0

Results without (3):

filluniquerandom :      22.113 micros/op 45222 ops/sec;    5.0 MB/s

Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0     22/13  930.28 MB 184.6     13.9     0.0     13.9      19.5      5.6       0.0   3.4     17.3     24.4    819.79             85.65      3825    0.214    121M      0
  L4  15106/15106  3.75 GB   0.0     17.7     3.8     13.9      17.7      3.8       0.0   4.6     22.4     22.4    806.52            238.72        13   62.040    154M      0
  L5   1865/0   472.58 MB   0.3      2.2     0.8      1.4       2.2      0.8       0.1   2.9     21.9     21.9    102.76             23.15         6   17.126     19M      0
  L6   1533/0   535.67 MB   0.0      0.7     0.5      0.2       0.7      0.5       0.0   1.4     15.4     15.4     48.60              6.37       423    0.115   6395K      0
 Sum  18526/15119  5.64 GB   0.0     34.5     5.1     29.4      40.1     10.7       0.1   7.1     19.9     23.1   1777.67            353.89      4267    0.417    301M      0
 Int      0/0    0.00 KB   0.0      0.2     0.0      0.2       0.3      0.1       0.0   3.1     14.6     21.4     14.65              1.46        57    0.257   1821K      0

I'd also speculate that when (2) is active, we should calculate the L0 compaction score using file count only, i.e., without taking L0 size into account.

siying commented 5 years ago

I didn't know we have (3). When did we introduce it?

ajkr commented 5 years ago

Looks like 235b162be13910a5f5b72cf0b30bd3255de14d67.

The more concerning part to me is using my L0 compaction scoring based on file size together with your base-level compaction scoring based on comparison to L0 size. I think that makes the base level's score unfairly low.
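A rough sketch of that interaction, under stated assumptions: the L0 score is modeled as the larger of (file count / trigger) and (L0 bytes / `max_bytes_for_level_base`), and with (2) the base level's target is modeled as the larger of `max_bytes_for_level_base` and the L0 size. The function names are hypothetical and the formulas are a simplification of what is discussed in this thread, not the exact RocksDB implementation. The trigger and level base match the benchmark flags; the file count and sizes are made up for illustration.

```python
MB = 1 << 20


def l0_score(num_files, l0_bytes, trigger, level_base_bytes, use_size=True):
    """Simplified L0 score: file-count ratio, optionally raised by a size ratio."""
    score = num_files / trigger
    if use_size:
        score = max(score, l0_bytes / level_base_bytes)
    return score


def base_level_score(base_bytes, l0_bytes, level_base_bytes, l0_as_target=True):
    """Simplified base-level score: size divided by target size.

    With l0_as_target=True, the target grows with L0 size, as in (2).
    """
    target = max(level_base_bytes, l0_bytes) if l0_as_target else level_base_bytes
    return base_bytes / target


trigger = 2              # -level0_file_num_compaction_trigger=2
level_base = 4 * MB      # -max_bytes_for_level_base=4194304
l0_files = 20            # illustrative
l0_bytes = 400 * MB      # illustrative
base_bytes = 1000 * MB   # illustrative

print(l0_score(l0_files, l0_bytes, trigger, level_base))                  # 100.0
print(l0_score(l0_files, l0_bytes, trigger, level_base, use_size=False))  # 10.0
print(base_level_score(base_bytes, l0_bytes, level_base))                 # 2.5
```

With these numbers, a large L0 pushes its own score up through the size term (100 instead of 10) while also inflating the base level's target, so the base level scores only 2.5 despite holding 250 times `max_bytes_for_level_base`. Scoring L0 by file count alone would narrow that gap, which is the change suggested in the description.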

siying commented 5 years ago

Oh I did it. Oops.

siying commented 5 years ago

The LSM-tree shape looks much better with (3) than without. Is it a side effect?

ajkr commented 5 years ago

Yeah, you're right, the experiments in the description do not prove (3) should be removed. I should've also measured removing (3) together with the L0 scoring change. Both strategies shown in the description end up top-heavy and do not really produce the nice smooth shape we want.

siying commented 5 years ago

@ajkr I agree. We should rethink it more.

facebook / rocksdb, issue #4991