Open avaneev opened 2 years ago
Could you point at the Perlin Noise test you refer to ?
The only one I could find in rurban's smhasher
is : https://github.com/rurban/smhasher/commit/64b7d10ea92701e707ecba5f6b0bacbdb3d98b8a
I'm not sure but this issue may suggest the following new result of 'PerlinNoiseAV' test:
https://github.com/rurban/smhasher/commit/9ca488454eda4dae9ac795147696135e5cdf7dbe
In these new results, 'Perlin Noise'
test is traditional perlin noise test.
https://github.com/rurban/smhasher/blob/9ca488454eda4dae9ac795147696135e5cdf7dbe/doc/xxh128.txt#L816
But 'PerlinNoise'
test contains new PerlinNoiseAV (AV variant
) test.
https://github.com/rurban/smhasher/blob/9ca488454eda4dae9ac795147696135e5cdf7dbe/doc/xxh128.txt#L859
I dug into this and found a few things:
Testing collisions (128-bit) - Expected 0.0, actual 24116 (1711423593602593105674976100352.00x) (24116)
Testing collisions (high 64-bit) - Expected 0.0, actual 24116 (92776458911.34x) (24116) !!!!!
Testing collisions (high 32-bit) - Expected 1116.2, actual 25260 (22.63x) (24144) !!!!!
Testing collisions (high 25-37 bits) - Worst is 37 bits: 24162/34 (692.56x) !!!!!
Testing collisions (low 64-bit) - Expected 0.0, actual 24116 (92776458911.34x) (24116) !!!!!
Testing collisions (low 32-bit) - Expected 1116.2, actual 25187 (22.57x) (24071) !!!!!
Testing collisions (low 25-37 bits) - Worst is 37 bits: 24153/34 (692.30x) !!!!!
hash=11111111111101000100000010101111001000111110101100110000001100001000010010111000001100101010110000111110000101100001111101110011
seed=00000000000000000000000100010000 key=00000001000000010000000000010000
seed=00000000000000000000000100010001 key=00000001000000010000000000010001
seed=00000001000000000000000100010000 key=00000000000000010000000000010000
seed=00000001000000000000000100010001 key=00000000000000010000000000010001
seed = XXH3_mul128_fold64(seed, PRIME64_3);
With this line added at the beginning of XXH3_len_9to16_128b, PerlinNoiseAV passes:
Testing AV variant, 128 count with 4 spacing, 4-12:
Testing collisions (128-bit) - Expected 0.0, actual 0 (0.00x)
Testing collisions (high 64-bit) - Expected 0.0, actual 0 (0.00x)
Testing collisions (high 32-bit) - Expected 1116.2, actual 1137 (1.02x) (21)
Testing collisions (high 25-37 bits) - Worst is 37 bits: 48/34 (1.38x)
Testing collisions (low 64-bit) - Expected 0.0, actual 0 (0.00x)
Testing collisions (low 32-bit) - Expected 1116.2, actual 1109 (0.99x) (-7)
Testing collisions (low 25-37 bits) - Worst is 35 bits: 140/139 (1.00x)
XXH3_mul128_fold64(seed, PRIME64_3)
all way up to XXH128 without (apparently) impacting performance on small messages or performance on SMHasher (apart from fixing this bug, of course). And, of course, callers can simply do this on their side without changing XXH128 at all.Yes, a reasonable fix (wyhash did that as well recently) is to add an additional "input seed" shuffling round.
Sorry to write this directly as I'm an "interested person" having my own hash functions, but recently the claim "xxhash passes all SMHasher tests" became imprecise: I've invented a new Perlin Noise test that finds "holes" in several hashes, not just xxhash. Maybe you can fix this as xxhash is too popular to miss that fact. I'm not expecting to gain anything from my post; my
komihash
hash function is relatively new, so needs a lot more time to gain traction in this very competitive "hash function" field.Maybe you can also add
komihash
to your tests, it's probably not as fast as xxhash3, though, but useful as a comparison to the "state-of-the-art" hash functions. (wyhash
comparison would be great as well)https://github.com/avaneev/komihash https://github.com/rurban/smhasher