blamario / monoid-subclasses

Subclasses of Monoid with a solid theoretical foundation and practical purposes
BSD 3-Clause "New" or "Revised" License
33 stars 9 forks source link

`Lexicographic` class would be useful for radix trees #30

Open tysonzero opened 3 years ago

tysonzero commented 3 years ago

Something along the lines of:

class (Ord m, LeftGCDMonoid m, MonoidNull m) => Lexicographic m where

To rule out cases where the ordering changes over the course of the comparison, we should have the following law:

prop :: Lexicographic m => m -> m -> m -> Bool
prop x y z = compare x y == compare (z <> x) (z <> y)

It seems like we may also need a law that any value in between two other values should share a equal or longer prefix with both of them than the two do with each other, to rule out cases like bar < foo < baz, as it's unclear whether or not that follows from the above law:

prop :: Lexicographic m => m -> m -> m -> Bool
prop x y z = (x <= y && y <= z)
          <= ( commonPrefix x z `isPrefixOf` commonPrefix x y
            && commonPrefix x z `isPrefixOf` commonPrefix y z
             )

However it's worth analyzing these laws further to make sure they are correct.

Given the above it would be possible to build an efficient radix-tree library built on top of Map that works for any arbitrary Lexicographic key.

Using the three classes separately has a variety of downsides:

You can't use Map.lookupLE/GT to search for shared prefixes, you have to exhaustively search the entire Map when inserting/editing.

You can't expose efficient minimum/lookupLT/etc. functions that run in logarithmic/key-length time and instead they all have to traverse the entire structure.

blamario commented 3 years ago

Can you think of any application of the Lexicographic class other than a radix tree? If not, the best place for the class would probably be within the radix tree library.

I think the name Lexicographic is underselling the generality of the class you're proposing. One of its instances, for example, could be Product Natural which I wouldn't call lexicographic. Every LeftReductive instance gives rise to a partial order:

(a <= b) <= (a `isPrefixOf` b)

What you're after is a guarantee that the total order defined by the type's Ord instance is an extension of this partial order, and I'd probably name this property something like PrefixOrd. There's no need to bring MonoidNull and LeftGCDMonoid into it, unless you need their methods for a more efficient radix tree. If you do, that would be another sign that the class belongs with the implementation of the tree.

tysonzero commented 3 years ago

Hmm. Radix trees are the main use-case that come to mind.

Product Natural doesn't obey the second law. For example 4, 5, 6. This makes sense as it'd not work well to build a radix tree off of it.

Sum Natural does follow the laws, but I don't think it's crazy to call it a lexicographic ordering, as you could think of it as a repeated tally of the same character.

Did you mean (a `isPrefixOf` b) <= (a <= b)? The other way rules out the Text instance via bar <= foo. I'm not sure what that gets you though, in terms of new functions or data structures you can write.

If we stick to the laws I gave, you need LeftGCDMonoid and Ord to express them, so you can't really drop those out.

MonoidNull isn't required for the efficient radix tree (just use == mempty), nor is it required for the laws. However every Lexicographic instance is by definition a MonoidNull instance, so you may as well have it to minimize constraint proliferation (e.g. FTP proposal).

Implementing it with the radix tree library isn't a crazy idea for sure. My motivation for bringing it into this library is to make it more likely that third party library types will implement it, as it's probably easier to get them to depend on this library than a random radix tree library. Particularly since it doesn't bring in any additional dependencies or even functions lol.

blamario commented 3 years ago

You make some good points. Yes, I mistakenly swapped the (a `isPrefixOf` b) <= (a <= b) law. It seems to me that lookup in the radix tree could still rely only on that law (and Ord and LeftReductive instances), but insertion would require more.