armon / libart

Adaptive Radix Trees implemented in C
Other
770 stars 136 forks source link

avoid storing full keys in leafs #10

Open wastl opened 10 years ago

wastl commented 10 years ago

We have an application where we store lots of URIs and map them to integer IDs. Since almost all URIs start with the same prefix, we decided to use the ART trie to do this mapping. Unfortunately, I realised that ART still keeps a complete copy of the key in its leaf nodes - wouldn't it be possible to dynamically reconstruct keys by traversing the trie from the root when needed (e.g. for the iteration)? In our case this would safe tremendous amounts of main memory.

armon commented 10 years ago

I'm not sure if there is a good way to do this. The internal nodes will only store up to 10 bytes of the key, at which point they actually rely on the leaf to store the key.

The problem is this, suppose you insert a few hundred keys with prefix "aaa....aaaa", and then you come along and insert "aaa{36}b.....bbbb". To determine the split point, we need the full key to determine the 37th character diverges. Since you can pick any arbitrary number for that, there will always be some limit to how much can be stored in the art nodes directly. That said, there is probably some rather complex de-duplication logic that can be used, but I don't currently have time to tackle that.

LouHubiao commented 8 years ago

I need insert about 1,000,000,000 node, and the size of ART will like B Tree and the height of tree is much more than B Tree.

KeeProMise commented 2 years ago

We have an application where we store lots of URIs and map them to integer IDs. Since almost all URIs start with the same prefix, we decided to use the ART trie to do this mapping. Unfortunately, I realised that ART still keeps a complete copy of the key in its leaf nodes - wouldn't it be possible to dynamically reconstruct keys by traversing the trie from the root when needed (e.g. for the iteration)? In our case this would safe tremendous amounts of main memory.

How many urls do you store and how much memory is used?