eBay / Jungle

An embedded key-value store library specialized for building state machine and log store
Apache License 2.0

Why is the space amplification of Jungle bigger than that of tiering? #147

Closed: zjs1224522500 closed this issue 1 year ago

zjs1224522500 commented 1 year ago

[image attachment]

greensky00 commented 1 year ago

Please refer to the 4th paragraph of Section 4 in the paper. The sizes of sorted runs (SSTables) are not even, so if there are 10 SSTables for the same key range, their aggregated size is not 10x that of leveling; it averages out to ~3.3x.

The same thing happens in Jungle: batch sizes are not even either. However, Jungle triggers an inter-level merge based on the actual accumulated size, not on the number of batches, so the size it reaches is closer to the expected size (10x for C=10) than with tiering.

Pages 57 and 58 of this slide deck visualize it.
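
The difference between the two triggers can also be sketched with a toy simulation. This is only an illustration under stated assumptions, not Jungle's code and not the model used in the paper: run sizes are drawn uniformly from (0, `MAX_RUN`], and `peak_sizes`, `MAX_RUN`, and both trigger lambdas are hypothetical names introduced here. A count-based trigger (tiering-like) merges once `C` runs have piled up, while a size-based trigger (Jungle-like) merges once the accumulated size reaches `C * MAX_RUN`.

```python
import random


def peak_sizes(trigger, run_sizes):
    """Return the accumulated size at each moment a merge fires."""
    peaks, total, count = [], 0.0, 0
    for size in run_sizes:
        total += size
        count += 1
        if trigger(total, count):
            peaks.append(total)    # size piled up right before the merge
            total, count = 0.0, 0  # the merge empties the level
    return peaks


C = 10         # size ratio between levels, as in the discussion above
MAX_RUN = 1.0  # hypothetical largest possible run/batch size

rng = random.Random(0)
# Assumption: uneven run sizes, uniform in (0, MAX_RUN].
runs = [MAX_RUN * (1.0 - rng.random()) for _ in range(100_000)]

# Tiering-like: merge when the number of runs reaches C.
by_count = peak_sizes(lambda total, count: count >= C, runs)
# Jungle-like: merge when the accumulated size reaches C * MAX_RUN.
by_size = peak_sizes(lambda total, count: total >= C * MAX_RUN, runs)

mean_by_count = sum(by_count) / len(by_count)
mean_by_size = sum(by_size) / len(by_size)
print(f"count-based trigger: mean peak = {mean_by_count:.2f} x MAX_RUN")
print(f"size-based trigger:  mean peak = {mean_by_size:.2f} x MAX_RUN")
```

In this toy model the count-based peak can never exceed `C * MAX_RUN`, because it is the sum of `C` runs that are each at most `MAX_RUN`, while the size-based peak is never below `C * MAX_RUN`. The exact averages depend entirely on the assumed size distribution, so the numbers it prints should not be read as reproducing the ~3.3x figure quoted above.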

zjs1224522500 commented 1 year ago

Got it~ Thanks for your reply!