Tessil / hat-trie

C++ implementation of a fast and memory efficient HAT-trie
MIT License
792 stars 114 forks source link

Feature suggestion: obtaining a path to the longest prefix #32

Open gh-andre opened 4 years ago

gh-andre commented 4 years ago

It would be quite useful if it was possible not just get an iterator to the longest prefix, but also a path leading to that prefix from the shortest prefix. Such a method would return a vector of iterators pointing to values that matched the first characters of the longest prefix argument. In other words, if a map contained these values:

/A
/A/B/C
/A/B/D
/A/B/E

, and the longest prefix would be requested as /A/B/D/E/F, then in a loop similar to the ones in equal_prefix_range_impl and filter_prefix it would test if keys of nodes that have values would match the beginning of the prefix string, so in the example above, it would return a vector of iterators pointing to /A and /A/B/D.

It would allow many useful operations for parent paths, which now can only be done via multiple calls with prefixes of different lengths.

A special "parent iterator" hiding the parent path vector and overloading -- would probably fit nicely in the interface.

Sorry if there is a way to achieve what I described and I missed it.

Tessil commented 4 years ago

It should be possible to adapt the longest_prefix method into a new one that return all the encountered prefixes instead of just the longest one.

It may take some time before I start working on it as I have been lacking the time recently. Any PR is welcome though.

gh-andre commented 4 years ago

@Tessil Thanks for responding. I am test-driving some other trie libraries and, depending on how testing goes, may revisit hat-trie later on. I quite like how the interface is structured and how it is performance-focused.

ecorm commented 1 year ago

I am test-driving some other trie libraries

@gh-andre I also need the feature you propose. If you don't mind me asking, which one did you end up using which supports this feature?

gh-andre commented 1 year ago

@ecorm I ended up using Akamai Radix Tree. It allows searches, tree traversal and longest prefix searches. I did need to write a few wrappers around some of the components, but in general it worked out very well for me.

https://github.com/neufeldm/akamai-radix-tree

ecorm commented 1 year ago

@gh-andre , @Tessil , submitted PR #52 to add this functionality.

ecorm commented 1 year ago

@gh-andre BTW, I checked out Akamai Radix Tree and didn't like that I had to provide things like the desired max depth and the maximum edge length. Choosing a very large max depth is not feasible, because there's a reserve() in there that would allocate a huge buffer needlessly for small sets of inputs.

Tessil commented 1 year ago

@ecorm Thank you very much for the PR! I haven't had much time to implement this. I'll try to review the PR this weekend.

gh-andre commented 1 year ago

@ecorm Akamai Radix Tree may be a bit tricky indeed to work with. The maximum depth isn't a problem for many areas, such as maintaining IP addresses or region codes. It would present some challenge, as you describe, for arbitrary dictionaries.

It's been too long for me being away from this code to provide meaningful feedback for the changes, but it's good to know that this new functionality will be available in this library. Thank you for the heads-up and your work.