Tessil / hat-trie

C++ implementation of a fast and memory efficient HAT-trie
MIT License
795 stars 114 forks source link

Feature question / suggestion: Insertion via iterators #33

Open jamescooper-blis opened 4 years ago

jamescooper-blis commented 4 years ago

Hi, currently all insert() / emplace() functions takeconst CharT* key, size_type key_size parameters.

I have a use-case to allow inserting via iterators instead (I need to insert a std::string / std::string_view backwards) and have started to add these functions. However, it's quickly snowballing so before I continue I'm wondering if this is something you have tried yourself? If it's ultimately not possible I'd like to know! If it's worth pursuing, and contributing back to you, it will most likely end up with the engine of the library being written to use iterators and supplying std::string / std::string_view, {const Char*T, size_type} wrappers over these new functions. What do you think about this? Would you accept a PR for such a large refactor?

Tessil commented 4 years ago

Hi,

Thank you for the suggestion. It isn't something I considered before but it may be an interesting addition.

After looking a bit how feasible it is, I think the main problem will be with the Hash and KeyEqual template parameters used in array_hash. Ideally I'd like to avoid any breaking change but the API of these two function objects takes a contiguous buffer and a size. This makes it easy to call most hash libraries functions which usually have a similar API.

We could change it to a pair of iterators but this would mean that we can't use std::hash<std::string_view> anymore and I'd like to continue to rely on it as integrating a good and well-optimized hash function in the library is troublesome. We'd also have to see how to take care of the backward-compatibility.

Thibaut

jamescooper-blis commented 4 years ago

Good point about 3rd party hash functions. That's probably a deal breaker unless insert/emplace have additional overloads, or new insert_reversed/emplace_reversed functions, that take the hash code directly. It could get a bit messy. I will experiement with it a bit.