IntersectMBO / lsm-tree

A Haskell library for on-disk tables based on LSM-trees
Apache License 2.0
24 stars 7 forks source link

Add binary search function for vectors #256

Closed jeltsch closed 2 months ago

jeltsch commented 2 months ago

This will ultimately resolve #252.

jeltsch commented 2 months ago

There has been some discussion already regarding whether to use a custom implementation of such a search function or rather rely on vector-algorithms and unsafeThaw. Here are the relevant bits:

@jeltsch:

The reason for me not using vector-algorithms is precisely that it only works with mutable vectors. I guess that it would indeed be possible to use it by employing unsafeThaw. However, I find it questionable to use an unsafe operation where there is a safe alternative, just because of limitations of current libraries.

@jorisdral:

In this case, it is a safe use of unsafeThaw: it is only thawed within a local scope where no alterations to the vector are made. Maybe it's a shortcoming of the library that they don't have a version on immutable vectors, but I think it's sensible to use here. IMO we care more about performance than strict safety in the lsm-tree library, and it's still safe to use unsafeThaw, because of reasoning stated above. Moreover, vector-algorithms is to some degree tested/benchmarked/battle-tested, so that's also an advantage

jeltsch commented 2 months ago

@jorisdral, I perfectly understand your reasoning. However, the documentation of unsafeThaw makes it very clear that unsafeThaw is a really dangerous function. It should be safe to use in the way that you describe if we can make sure that the binarySearchL function of vector-algorithms doesn’t mutate the vectors it operates on, but can we guarantee this? Yes, it seems weird that a binary search should make the given vectors different temporarily. However, by assuming it doesn’t, we rely on what is essentially an implementation detail of the vector-algorithms package. I’d really feel uncomfortable about this.

jeltsch commented 2 months ago

By the way, I guess that the reason for vector-algorithms providing search only for mutable vectors is that it is generally focused on creating sorted vectors, where modifying in place makes a lot of sense, and provides using sorted vectors, in the form of search, only as a secondary feature.

jeltsch commented 2 months ago

Given that @dcoutts mentioned outside of GitHub that he also finds it acceptable to use unsafeThaw in order to reuse vector-algorithms and given that we have such a use already in the implementation of the compact index, I’ve changed the implementation accordingly.