`IndexCompact`: store an additional 16 bits in the primary vector

jorisdral commented 1 month ago

This PR addresses the worst-case behaviour that we have seen in the new micro-benchmark from #295. Also, this PR resolves #204.

With the fix, average search time and allocation for construction and search go up a bit, but the worst-case scenario that appsWithNearDups exercises drastically improves. search is back to allocating zero bytes, and allocations for construction are more than halved. Moreover, it is not clearly visible from the benchmark results, but the max RSS of the program goes down too.

Benchmark                                                                                                     before    after-after
Bench.Database.LSMTree.Internal.IndexCompact/construction/construction with 0-bit  rfprec and chunk size 100  0.408e-4  0.427e-4  +4.69%
Bench.Database.LSMTree.Internal.IndexCompact/construction/construction with 16-bit rfprec and chunk size 100  1.080e-4  1.145e-4  +6.02%
Bench.Database.LSMTree.Internal.IndexCompact/construction/unsafeWriteRange-10k                                0.580e-6  0.581e-6  +0.07%
Bench.Database.LSMTree.Internal.IndexCompact/construction/unsafeWriteRange-1k                                 0.937e-7  0.945e-7  +0.81%
Bench.Database.LSMTree.Internal.IndexCompact/non-uniformity/construct appsWithNearDups                        2.098e-4  0.441e-4 -78.99%
Bench.Database.LSMTree.Internal.IndexCompact/non-uniformity/construct appsWithoutNearDups                     2.002e-5  2.080e-5  +3.86%
Bench.Database.LSMTree.Internal.IndexCompact/non-uniformity/search appsWithNearDups                           0.939e-3  0.042e-3 -95.55%
Bench.Database.LSMTree.Internal.IndexCompact/non-uniformity/search appsWithoutNearDups                        2.492e-5  3.759e-5 +50.87%
Bench.Database.LSMTree.Internal.IndexCompact/searches/searches with 0-bit  rfprec                             0.386e-4  0.558e-4 +44.58%
Bench.Database.LSMTree.Internal.IndexCompact/searches/searches with 16-bit rfprec                             0.980e-5  1.660e-5 +69.38%
Geometric mean                                                                                                1.972e-5  1.430e-5 -27.50%

RE allocations, compare with https://github.com/IntersectMBO/lsm-tree/pull/295#issue-2409218713

benchmarking Bench.Database.LSMTree.Internal.IndexCompact/non-uniformity/construct appsWithNearDups
time                 44.18 μs   (44.00 μs .. 44.41 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 44.08 μs   (44.00 μs .. 44.21 μs)
std dev              342.8 ns   (251.9 ns .. 455.8 ns)
allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
  iters              349723.776 (349721.941 .. 349725.564)
  y                  3602.303   (2724.377 .. 4358.583)

benchmarking Bench.Database.LSMTree.Internal.IndexCompact/non-uniformity/construct appsWithoutNearDups
time                 20.84 μs   (20.74 μs .. 20.96 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 20.80 μs   (20.73 μs .. 20.88 μs)
std dev              241.7 ns   (174.6 ns .. 326.1 ns)
allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
  iters              167387.990 (167387.899 .. 167388.090)
  y                  3483.016   (3224.158 .. 3735.575)

benchmarking Bench.Database.LSMTree.Internal.IndexCompact/non-uniformity/search appsWithNearDups
time                 41.98 μs   (41.58 μs .. 42.33 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 41.76 μs   (41.49 μs .. 42.04 μs)
std dev              916.8 ns   (707.5 ns .. 1.225 μs)
allocated:           0.999 R²   (0.998 R² .. 0.999 R²)
  iters              31.975     (31.807 .. 32.200)
  y                  3062.478   (2743.924 .. 3368.551)
variance introduced by outliers: 19% (moderately inflated)

benchmarking Bench.Database.LSMTree.Internal.IndexCompact/non-uniformity/search appsWithoutNearDups
time                 37.88 μs   (37.50 μs .. 38.36 μs)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 37.59 μs   (37.18 μs .. 38.29 μs)
std dev              1.664 μs   (1.074 μs .. 2.842 μs)
allocated:           0.999 R²   (0.998 R² .. 0.999 R²)
  iters              32.025     (31.854 .. 32.224)
  y                  3035.105   (2714.894 .. 3341.617)
variance introduced by outliers: 50% (moderately inflated)

jorisdral commented 1 month ago

I have yet to fix the tests, and to check how the macro-benchmarks are affected. Will update on that later

jorisdral commented 1 month ago

Closed, since we're refactoring the compact index.

IntersectMBO / lsm-tree

`IndexCompact`: store an additional 16 bits in the primary vector #298