Currently, the Trie suffers a performance penalty because it does not handle the case where the hashes of two distinct keys share a common prefix, and thus map to the same path in the Trie. In particular, the Trie has a global constant, MAX_PATH_LEN, and currently this constant must be set high enough to avoid these collisions in our tests.
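To make the failure mode concrete, here is a minimal sketch of how a fixed-length prefix of a key's hash could serve as a Trie path; `trie_path` and the top-bits convention are illustrative assumptions, not the actual implementation:

```rust
// Hypothetical sketch: take the top MAX_PATH_LEN bits of a 64-bit hash
// as the Trie path. Two distinct hashes can still share this prefix,
// which is exactly the collision case the Trie does not yet handle.
const MAX_PATH_LEN: u32 = 32;

fn trie_path(hash: u64) -> u64 {
    hash >> (64 - MAX_PATH_LEN)
}
```

With 32-bit paths, any two hashes agreeing in their top 32 bits collide, e.g. `trie_path(0xDEAD_BEEF_0000_0001) == trie_path(0xDEAD_BEEF_0000_0002)` even though the hashes themselves are distinct.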
For example, consider a set of uniformly random 64-bit keys, where we'd expect this situation to be least likely. If we run with 32-bit paths (using half of the 64 bits available), we can accommodate relatively modest input sizes (10k, 20k, ..., 100k), but at 200k and above it becomes very likely that collisions will cause the Trie implementation to panic. (These numbers come from preliminary experimentation, not a proof or a comprehensive empirical evaluation.)
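A rough birthday-bound estimate is consistent with this observation. Assuming the path is a b-bit prefix of a uniform hash, the probability that some pair of n keys collides is approximately 1 - exp(-n(n-1)/2^(b+1)). This is only a back-of-the-envelope check; the Trie's actual panic condition may differ from a raw pairwise prefix collision:

```rust
// Birthday-bound approximation for prefix collisions among n uniform
// b-bit paths: P(collision) ~= 1 - exp(-n(n-1) / 2^(b+1)).
fn collision_prob(n: u64, bits: i32) -> f64 {
    let pairs = (n as f64) * ((n - 1) as f64) / 2.0;
    1.0 - (-pairs / 2f64.powi(bits)).exp()
}
```

For b = 32 this gives roughly a 1% chance at 10k keys but over 99% at 200k, matching the observed panics at larger sizes.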
That we cannot handle collisions gracefully means that we cannot handle them efficiently: we are currently using very long paths in the Trie, probably much longer than we actually need. By comparison, the native Rust HashMap implementation uses Robin Hood hashing, a clever way of handling collisions within the table.
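For contrast, here is a minimal sketch of Robin Hood open addressing (insert and lookup only, fixed capacity, no resizing, identity "hash"); the names and structure are illustrative, not the std HashMap internals:

```rust
// Robin Hood hashing sketch: on a probe collision, the entry that is
// farther from its "home" slot keeps the slot, and the closer ("richer")
// entry is displaced and continues probing. This bounds probe-length
// variance, which is how the table stays efficient under collisions.
struct RobinHood {
    slots: Vec<Option<(u64, u64)>>, // (key, value); None = empty
    mask: usize,                    // capacity - 1, capacity a power of two
}

impl RobinHood {
    fn new(capacity_pow2: usize) -> Self {
        Self { slots: vec![None; capacity_pow2], mask: capacity_pow2 - 1 }
    }

    // Distance of the entry stored at `slot` from its home slot.
    fn probe_len(&self, key: u64, slot: usize) -> usize {
        let home = (key as usize) & self.mask; // identity hash, for the sketch
        slot.wrapping_sub(home) & self.mask
    }

    fn insert(&mut self, mut key: u64, mut val: u64) {
        let mut slot = (key as usize) & self.mask;
        let mut dist = 0;
        loop {
            match self.slots[slot] {
                None => {
                    self.slots[slot] = Some((key, val));
                    return;
                }
                Some((k, v)) => {
                    if k == key {
                        self.slots[slot] = Some((key, val)); // overwrite
                        return;
                    }
                    let their = self.probe_len(k, slot);
                    if their < dist {
                        // "Steal from the rich": displace the entry that is
                        // closer to home, and keep probing on its behalf.
                        self.slots[slot] = Some((key, val));
                        key = k;
                        val = v;
                        dist = their;
                    }
                }
            }
            slot = (slot + 1) & self.mask;
            dist += 1;
        }
    }

    fn get(&self, key: u64) -> Option<u64> {
        let mut slot = (key as usize) & self.mask;
        let mut dist = 0;
        loop {
            match self.slots[slot] {
                None => return None,
                Some((k, v)) => {
                    if k == key {
                        return Some(v);
                    }
                    // If the stored entry is closer to home than our probe
                    // distance, our key cannot be further along: stop early.
                    if self.probe_len(k, slot) < dist {
                        return None;
                    }
                }
            }
            slot = (slot + 1) & self.mask;
            dist += 1;
        }
    }
}
```

The point of the comparison: a table can tolerate many colliding keys because collisions are resolved locally at insert time, whereas our Trie currently avoids collisions globally by inflating MAX_PATH_LEN.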
I've addressed this issue, as well as the problem of imprecise dependencies in the trie (aka skiplist) logic, in commit 3af267833884435aa718a2c4efdb25956ca020a5.