cockroachdb / pebble

RocksDB/LevelDB inspired key-value database in Go

db: unbounded L0 garbage accumulation #2549

Open · jbowens opened this issue 1 year ago

jbowens commented 1 year ago

Special level scoring logic applies to L0, and none of it accounts for potentially high compensated sizes (PointDeletionBytesEstimate and RangeDeletionBytesEstimate). As a result, tombstones that would delete an unbounded volume of data lower in the LSM can accumulate in L0, preventing space reclamation.
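For context, here is a minimal sketch of what "compensated size" means in this discussion, assuming hypothetical Go types and field names (Pebble's real table-stats plumbing differs):

```go
// tableStats is a hypothetical stand-in for per-sstable statistics.
type tableStats struct {
	Size                       uint64 // physical size of the sstable in bytes
	PointDeletionBytesEstimate uint64 // estimated bytes lower in the LSM deleted by point tombstones
	RangeDeletionBytesEstimate uint64 // estimated bytes lower in the LSM deleted by range tombstones
}

// compensatedSize inflates a table's size by the bytes its tombstones are
// expected to reclaim, so garbage-heavy tables look more urgent to the
// compaction picker.
func compensatedSize(s tableStats) uint64 {
	return s.Size + s.PointDeletionBytesEstimate + s.RangeDeletionBytesEstimate
}
```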

TODO(jackson): Add a detailed outline of L0's existing scoring logic.

Jira issue: PEBBLE-49

Epic CRDB-40361

sumeerbhola commented 1 year ago

Could we do something simple: if the current L0 score is < 1.0 but the total bytes L0 will delete exceed 10% of the LSM size, bump the score up to 1.0 (see the sketch below)? For a small LSM with Lbase = L6, this should have the desired aggressive effect. For very large LSMs it will trigger extremely rarely, but 10% space amp isn't bad.
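A rough sketch of that bump, with hypothetical names (l0Score, l0DeleteBytesEstimate, and totalLSMSize are illustrative, not Pebble identifiers):

```go
// adjustedL0Score forces L0's score to the compaction threshold when the
// garbage it would reclaim exceeds 10% of the LSM, even if the usual
// sub-level-based scoring wouldn't trigger a compaction.
func adjustedL0Score(l0Score float64, l0DeleteBytesEstimate, totalLSMSize uint64) float64 {
	if l0Score >= 1.0 {
		// L0 already warrants a compaction; leave the score alone.
		return l0Score
	}
	if float64(l0DeleteBytesEstimate) > 0.10*float64(totalLSMSize) {
		// L0's tombstones would delete more than 10% of the LSM: bump
		// the score to 1.0 so the space gets reclaimed.
		return 1.0
	}
	return l0Score
}
```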

jbowens commented 1 year ago

Yeah, I think that's reasonable.

Is there a downside to incorporating (compensated) level size into L0 scoring more generally, e.g., as a multiple of LBaseMaxBytes, and taking the MAX of the computed scores?

sumeerbhola commented 1 year ago

> Is there a downside to incorporating (compensated) level size into L0 scoring more generally, e.g., as a multiple of LBaseMaxBytes, and taking the MAX of the computed scores?

Do you mean something like `score = L0_compensated_size / (k * LbaseMaxBytes)`? That should be OK too, and more principled. I was being paranoid that there may be some existing workloads with a sub-level count of 1 and > 64MB in L0 that manage to delay L0 => Lbase compactions (and hence reduce write amp), so I was trying to scope the re-scoring logic narrowly to the garbage case.
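A sketch of that more general scoring, under the same hypothetical naming (k is an illustrative constant, not a tuned value):

```go
import "math"

// generalL0Score combines the existing sub-level-based score with a
// size-based score derived from L0's compensated size, taking the MAX
// of the two as suggested above.
func generalL0Score(subLevelScore float64, l0CompensatedSize, lbaseMaxBytes uint64) float64 {
	const k = 4 // illustrative multiple of LBaseMaxBytes
	sizeScore := float64(l0CompensatedSize) / float64(k*lbaseMaxBytes)
	return math.Max(subLevelScore, sizeScore)
}
```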