Open mikemccand opened 1 year ago
@dweiss will probably say more than me about the awesome BitMixer#PHI_C64 constant!
I borrowed that constant in BitMixer from Sebastiano Vigna, I believe. Here is a nice overview of its origin/rationale:
I can only confirm that a good hash redistribution function, combined with linear probing, gives very good results in most hash/index redistribution problems I've seen.
I opened a PR to make use of this constant: #12716
Also, I was wondering whether this constant could be used in other hash function implementations in the codebase as well?
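To make the "redistribution function plus linear probing" combination concrete, here is a minimal sketch of a fixed-capacity set of longs. The class name, its methods, and the fixed power-of-two capacity are made up for illustration; they are not Lucene or HPPC code. The multiplier is the PHI_C64 value from HPPC's BitMixer.

```java
/** Hypothetical fixed-capacity set of longs; illustration only, not Lucene or HPPC code. */
public final class LinearProbingLongSet {
    // 2^64 divided by the golden ratio; same value as PHI_C64 in HPPC's BitMixer.
    private static final long PHI_C64 = 0x9e3779b97f4a7c15L;

    private final long[] keys;
    private final boolean[] used;
    private final int mask;
    private int size;

    public LinearProbingLongSet(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        keys = new long[capacity];
        used = new boolean[capacity];
        mask = capacity - 1;
    }

    /** Scatter the key's bits, then reduce to a slot index. */
    private int slotOf(long key) {
        long h = key * PHI_C64;
        return (int) (h ^ (h >>> 32)) & mask;
    }

    /** Returns true if the key was added, false if it was already present. */
    public boolean add(long key) {
        int slot = slotOf(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (!used[slot]) {
                used[slot] = true;
                keys[slot] = key;
                size++;
                return true;
            }
            if (keys[slot] == key) {
                return false;
            }
            slot = (slot + 1) & mask; // linear probing: try the next slot, wrapping around
        }
        throw new IllegalStateException("set is full");
    }

    public boolean contains(long key) {
        int slot = slotOf(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (!used[slot]) {
                return false; // hit an empty slot, so the key was never inserted
            }
            if (keys[slot] == key) {
                return true;
            }
            slot = (slot + 1) & mask;
        }
        return false;
    }

    public int size() {
        return size;
    }
}
```

Without the mixing step, keys that are multiples of the capacity would all start at slot 0 and form one long probe chain; the multiply and xor-shift lets the high bits of the key influence the slot as well.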
If you'd like to do so, I'd suggest moving such a "scattering remix" utility to a separate class and reusing it elsewhere, much like here: https://github.com/carrotsearch/hppc/blob/master/hppc/src/main/java/com/carrotsearch/hppc/BitMixer.java#L96-L99
Makes it easier to change the remixing strategy everywhere at once.
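As a rough sketch of what such a shared utility could look like (the class and method names below are hypothetical, not an existing Lucene API; the multiply and xor-shift step mirrors mixPhi in HPPC's BitMixer):

```java
/** Hypothetical shared "scattering remix" helper; names are illustrative only. */
public final class HashRemix {
    // 2^64 divided by the golden ratio; same value as PHI_C64 in HPPC's BitMixer.
    private static final long PHI_C64 = 0x9e3779b97f4a7c15L;

    private HashRemix() {}

    /** Multiply by the constant, then fold the high 32 bits into the low 32 bits. */
    public static long mix64(long hash) {
        long h = hash * PHI_C64;
        return h ^ (h >>> 32);
    }

    /** Slot index for a table whose size is a power of two (mask = size - 1). */
    public static int slot(long hash, int mask) {
        return (int) mix64(hash) & mask;
    }
}
```

Call sites would then use HashRemix.slot(hash, mask) instead of hash & mask, so swapping the mixing strategy later only touches this one class.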
Description
Spinoff from this cool comment, thanks to hashing guru @bruno-roustant:
This is a simple change, we just need to test on some real FST building cases to confirm good mixing "in practice". The new IndexToFST tool in luceneutil is helpful for this.