IntersectMBO / lsm-tree

A Haskell library for on-disk tables based on LSM-Trees
Apache License 2.0
28 stars 7 forks source link

Adding functionality to initialise RunReaders at offset #346

Closed recursion-ninja closed 1 month ago

recursion-ninja commented 2 months ago

Description

First part of Work Product 12, adding the functionality to start RunReaders at an offset.

The new rawPageFindKey function has the same performance characteristics as rawPageLookup but facilitates starting the RunReaders at an give page offset from the returned index. RunReader.new now takes an optional offset as an argument.

mheinzel commented 2 months ago

There is a build failure for GHC versions before 9.6 which seems related to UnboxedSums (it's available since 8.2.1, but maybe something changed?).

Also, there is a docspec-related error, seems to happen on all Ubuntu builds.

recursion-ninja commented 1 month ago

Regression Fix Report

I have isolated and corrected the allocation regression in rawPageLookup that the PR contained. The excess allocation came from the "linear scanning" portion of the bisectPageToKey function.

Previously, when the bench-marking showed identical allocations and comparable run-times, there was a defect in the bisectPageToKey function. To correct the defect, an additional check was required to determine of the binary search ended at the last entry in the page. If so, the binary search result needed to be changed from "found closest key at index i" to "key not found in page." This additional comparison work, which existed in the terminal "linear scan" portion of the binary search routine, was inexorably linked to the additional allocation.

I attempted to alter the code while retaining the linear scan portion of the binary search routine. After repeated an persistent failure to remove the additional allocation, I removed the linear scan segment of the binary search routine. To my pleasant surprise, the allocations decreased to the same or less than the allocations reported on main!

Below is a benchmark table I collated describing the change in bytes allocated (B) and run-time measured (ns) between main and this PR across recent versions of GHC. In the Δ column, a negative number represents an improvement when measuring the performance of this PR (relative to main).

Results

rawPageLookup GHC main (B, ns) PR-#346 (B, ns) Diif-Δ (B, ns)
missing 9.10.1 (544, 101.50) (544, 104.90) (± 0, + 3.40)
existing-head 9.10.1 (560, 98.72) (560, 101.70) (± 0, + 2.98)
existing-last 9.10.1 (640, 96.91) (576, 88.27) (-64, - 8.64)
missing 9.8.2 (544, 102.90) (544, 104.90) (± 0, + 2.00)
existing-head 9.8.2 (560, 100.40) (560, 103.00) (± 0, + 2.60)
existing-last 9.8.2 (640, 98.92) (576, 91.17) (-64, - 7.75)
missing 9.6.6 (544, 104.20) (544, 103.50) (± 0, - 0.70)
existing-head 9.6.6 (560, 99.82) (560, 90.30) (± 0, - 9.52)
existing-last 9.6.6 (640, 97.21) (576, 88.48) (-64, - 9.73)
missing 9.4.8 (544, 102.90) (544, 105.40) (± 0, + 2.50)
existing-head 9.4.8 (560, 100.00) (560, 103.10) (± 0, + 3.10)
existing-last 9.4.8 (640, 99.06) (576, 88.84) (-64, -10.22)

Conclusion

My reading of the tabulated measurements is that the PR, with the recent regression fix, now constitutes:

recursion-ninja commented 1 month ago

I believe that this resolves the last critical technical issue of the PR. I'm going to give the PR a final review to ensure that all comments have been addressed.