cuplv / iodyn.rust

Collections Library for Adapton, in Rust
13 stars 0 forks source link

Incremental, Gauged Trie: Handle collisions of hash-prefixes gracefully, and efficiently #1

Closed matthewhammer closed 7 years ago

matthewhammer commented 7 years ago

Currently, the Trie suffers a performance penalty for not handling the case where the hash prefixes of two distinct keys (with distinct hashes) have the same common prefix, and thus, the same path in the Trie. In particular, the Trie has a global constant for MAX_PATH_LEN, and currently, this constant must be set high enough to avoid these collisions in our tests.

For example, on a set of uniformly random 64-bit keys, where we'd expect this situation to be least likely; if we run with 32-bit long paths (using half of the 64 bits we have available), we can accommodate relatively modest input sizes (10k, 20k, ..100k), but at 200k and above, it becomes very likely that collisions will cause the Trie implementation to panic. (These numbers have come from preliminary experimentation, not a proof, nor a comprehensive empirical evaluation).

That we cannot handle collisions gracefully means that we cannot handle them efficiently: We are currently using very long paths in the Trie, probably much longer than we actually need. By comparison, the native Rust HashMap implementation uses "robin hood" hashing, which is a clever way of handling collisions in the table.

matthewhammer commented 7 years ago

I've addressed this issue, as well as the problem of imprecise dependencies in the trie (aka, skiplist) logic, with commit 3af267833884435aa718a2c4efdb25956ca020a5