Tessil / hat-trie

C++ implementation of a fast and memory efficient HAT-trie
MIT License

How to serialize/deserialize a map? #18

Closed: theconnectionist closed this issue 5 years ago

theconnectionist commented 5 years ago

Thanks for the excellent implementation. I'm thinking of using the hat-trie in my project, where I need to serialize the built map and later deserialize it. I'm working with tens of millions of strings, so I would prefer to deserialize from disk rather than rebuild the map from raw data each time. What's a good way to serialize/deserialize to/from disk?

Tessil commented 5 years ago

Hello,

Currently the only way to do it is to iterate over the hat-trie and serialize each string one at a time. On deserialization, each string is read back and inserted with insert.
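
The approach described above could be sketched roughly as follows. This is only an illustration, not code from the library: the record layout (4-byte key length, key bytes, 8-byte value) and the helper names write_entry, read_entry, save_map and load_map are assumptions made up for the example, and std::map is used as a stand-in so the snippet compiles without the hat-trie headers. With tsl::htrie_map the loop would read it.key() and it.value() instead of it->first and it->second, and insert takes the key and value as two separate arguments.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record layout: 4-byte key length, key bytes, 8-byte value.
// Integers are written in host byte order, so the file is not portable
// between machines with different endianness.
void write_entry(std::ostream& out, const std::string& key, std::int64_t value) {
    const std::uint32_t len = static_cast<std::uint32_t>(key.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof(len));
    out.write(key.data(), len);
    out.write(reinterpret_cast<const char*>(&value), sizeof(value));
}

// Reads one record; returns false at end of stream or on a truncated record.
bool read_entry(std::istream& in, std::string& key, std::int64_t& value) {
    std::uint32_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof(len))) return false;
    key.resize(len);
    if (len > 0 && !in.read(&key[0], len)) return false;
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&value), sizeof(value)));
}

// Serialize every (key, value) pair by iterating over the container.
template <class Map>
void save_map(std::ostream& out, const Map& m) {
    for (auto it = m.begin(); it != m.end(); ++it) {
        // std::map stand-in; for tsl::htrie_map use it.key() and it.value().
        write_entry(out, it->first, it->second);
    }
}

// Rebuild the container by reading each record back and inserting it.
template <class Map>
void load_map(std::istream& in, Map& m) {
    std::string key;
    std::int64_t value = 0;
    while (read_entry(in, key, value)) {
        // std::map stand-in; for tsl::htrie_map use m.insert(key, value).
        m.insert({key, value});
    }
}
```

For files on disk, the streams would be a std::ofstream and std::ifstream opened with std::ios::binary. Every key is still re-inserted on load, so the cost of rebuilding the trie structure remains.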

Using the internals of the structure, it's possible to do it more efficiently, and I have wanted to add efficient serialize/deserialize functions for some time.

I'll see if I can find some time to work on it in the next few weeks.

theconnectionist commented 5 years ago

Thanks Tessil. I'm using that technique at the moment. When you implement your serialization, I'd recommend using memory mapping.

Tessil commented 5 years ago

Hi @theconnectionist

If you are interested, I made a serialization branch that I'll probably merge in a couple of days. I still need to test it a bit more, but it can already give you an overview of the API and of the performance.

theconnectionist commented 5 years ago

Thank you @Tessil. Much appreciated. I'll give it a try.