Open xitology opened 5 months ago
I will need to think about what the right answer is here. Before #52193 we indeed used the key itself here.
But yes, an objectid collision does cause a perfect hash collision. I am kinda curious how IdDict survives that.
Even with the pre-#52193 implementation, rehashing was ineffective for some key types, e.g. Symbols. That's because hash(s::Symbol) is defined as objectid(s) and hash(s, h) depends only on objectid(s):
https://github.com/JuliaLang/julia/blob/master/base/hashing.jl#L38-L40
So looking at IdDict, it also "just" uses objectid and then uses the typical probe + egal check, growing the table on conflict.
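For readers unfamiliar with that strategy, here is a minimal sketch of "probe + egal check + grow" in plain Julia. It is not IdDict's actual implementation; `TinyIdTable`, `findslot`, `tiny_set!`, `tiny_get`, and the load factor of 1/2 are all hypothetical names and choices made for illustration.

```julia
# Hypothetical identity-keyed table: the slot comes from objectid, conflicts are
# resolved by linear probing, and `===` (egal) confirms a match.
mutable struct TinyIdTable
    keys::Vector{Any}
    vals::Vector{Any}
    used::BitVector
    count::Int
end

TinyIdTable(n::Int = 8) =
    TinyIdTable(Vector{Any}(undef, n), Vector{Any}(undef, n), falses(n), 0)

# Return the slot holding `key`, or the first free slot on its probe sequence.
# Terminates because the table is never allowed to fill up completely.
function findslot(t::TinyIdTable, @nospecialize(key))
    n = length(t.keys)
    i = Int(objectid(key) % UInt(n)) + 1
    while t.used[i] && t.keys[i] !== key
        i = i % n + 1                     # linear probe, wrapping around
    end
    return i
end

# Double the capacity and reinsert every entry.
function grow!(t::TinyIdTable)
    bigger = TinyIdTable(2 * length(t.keys))
    for i in eachindex(t.keys)
        t.used[i] && tiny_set!(bigger, t.keys[i], t.vals[i])
    end
    t.keys, t.vals, t.used, t.count = bigger.keys, bigger.vals, bigger.used, bigger.count
    return t
end

function tiny_set!(t::TinyIdTable, @nospecialize(key), @nospecialize(val))
    i = findslot(t, key)
    if !t.used[i]
        if 2 * (t.count + 1) > length(t.keys)   # keep the load factor at most 1/2
            grow!(t)
            i = findslot(t, key)
        end
        t.used[i] = true
        t.keys[i] = key
        t.count += 1
    end
    t.vals[i] = val
    return t
end

function tiny_get(t::TinyIdTable, @nospecialize(key), default)
    i = findslot(t, key)
    return t.used[i] ? t.vals[i] : default
end

t = TinyIdTable()
a, b = [1], [1]          # equal contents, different identity
tiny_set!(t, a, "a")
tiny_set!(t, b, "b")
for i in 1:100
    tiny_set!(t, Symbol("k", i), i)
end
```

Because a match is confirmed with `===`, two keys that land in the same slot (even with identical objectid values) still end up in separate slots. That is the property a trie indexed purely by hash bits does not get for free.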
For HAMT that strategy wouldn't work. We would probably need to introduce a "PerfectConflict" node with a linear probe, but as you noted with Symbols, this is a general property of using objectid as the source of the hash.
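A rough sketch of what such a conflict node could look like, assuming a persistent (copy-on-write) leaf that stores all entries sharing one full hash and scans them linearly with an egal check. The names `CollisionNode`, `lookup`, and `assoc` are hypothetical and do not exist in base/hamt.jl.

```julia
# Hypothetical leaf for keys whose hashes collide on every bit.
struct CollisionNode{K,V}
    hash::UInt                      # the hash shared by all entries
    entries::Vector{Pair{K,V}}
end

# Linear probe over the bucket; `===` distinguishes keys with equal hashes.
function lookup(node::CollisionNode, key, default)
    for (k, v) in node.entries
        k === key && return v
    end
    return default
end

# Persistent update: copy the bucket, then replace or append the entry.
function assoc(node::CollisionNode{K,V}, key, val) where {K,V}
    entries = copy(node.entries)
    i = findfirst(p -> p.first === key, entries)
    if i === nothing
        push!(entries, key => val)
    else
        entries[i] = key => val
    end
    return CollisionNode{K,V}(node.hash, entries)
end

n0 = CollisionNode{Any,Int}(UInt(42), Pair{Any,Int}[])
n1 = assoc(n0, :a, 1)
n2 = assoc(n1, :b, 2)
n3 = assoc(n2, :a, 3)   # n1 and n2 are left untouched
```

Lookup cost in such a node is linear in the number of colliding keys, which is acceptable only if perfect collisions stay rare.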
I'm pretty sure it is a bug if objectid has collisions. It seems like something that could be provably made to cause miscompiles.
I'm looking at the implementation of PersistentDict and HAMT in https://github.com/JuliaLang/julia/blob/master/base/hamt.jl. This implementation attempts to avoid hash collisions by using rehashing, but I believe it does not provide the intended effect.
A comment in the code claims: "Perfect hash collisions should not occur in practice since we perform rehashing after using 55 bits (MAX_SHIFT) of hash."
Here's how rehashing is explained in Bagwell's paper: "The algorithm requires that the hash can be extended to an arbitrary number of bits. This was accomplished by rehashing the key combined with an integer representing the trie level, zero being the root. Hence if two keys do give the same initial hash then the rehash has a probability of 1 in 2^32 of a further collision."
However, the Julia implementation rehashes not the original key, but the previous hash value. If two hashes collide, so will the rehashed hashes.
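The difference between the two rehashing strategies can be sketched without touching the HAMT internals. In the snippet below, `CollidingKey`, `rehash_previous`, and `rehash_key` are hypothetical names, and the key type is deliberately rigged so that every key gets the same hash for seed 0; it illustrates the argument and is not the code in base/hamt.jl.

```julia
# Hypothetical key type: for seed 0 every key hashes to the same value.
struct CollidingKey
    id::Int
end

# Assumption for this sketch: seed 0 plays the role of the trie root;
# any other seed mixes in the `id` field as usual.
Base.hash(k::CollidingKey, h::UInt) = iszero(h) ? UInt(42) : hash(k.id, h)

# Strategy A: rehash the previous hash value, seeded with the level.
rehash_previous(prev::UInt, level::Integer) = hash(prev, UInt(level))

# Strategy B (as in the Bagwell quote): rehash the key itself, seeded with the level.
rehash_key(key, level::Integer) = hash(key, UInt(level))

a, b = CollidingKey(1), CollidingKey(2)
ha, hb = hash(a, UInt(0)), hash(b, UInt(0))

println(ha == hb)                                          # the initial hashes collide
println(rehash_previous(ha, 1) == rehash_previous(hb, 1))  # same input, so same output
println(rehash_key(a, 1) == rehash_key(b, 1))              # the keys differ, so the hashes can too
```

Strategy A is a deterministic function of a value that is already identical for both keys, so no number of levels can separate them. Strategy B only helps when the key's two-argument hash really depends on the key and not just on the colliding value.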
Here's a test case:
Note that IdDict can handle such keys: