CAD97 closed this issue 2 years ago
Uhm the link you posted says nothing about stability:
> Like aHash, t1ha and XXHash are targeted at hashmaps and uses hardware instructions including AES for different platforms rather than having a single standard. Both are fast, but AHash is faster than either one, both with and without AES. This is particularly true of smaller inputs such as integers. T1ha's hashes do not pass the full of the SMHasher test suite. Neither XXHash nor T1ha explicitly claim DOS resistance, but both are keyed hashes, and do not have any obvious way to force collisions. As of this writing there doesn't appear to be a maintained crate implementing the latest version of t1ha.
xxh3 should be stable across architectures. If it is not, then it is probably a mistake in my port, but as long as you use the correct version of the xxh algorithm, the output should be equal on any platform.
The implication comes from
> uses hardware instructions [...] rather than having a single standard

and that the prior section notes that MurmurHash, CityHash, MetroHash, FarmHash, and HighwayHash
> provide consistent output [which] prevents them from taking advantage of different hardware capabilities on different CPUs.
Taken together, the implication is that t1ha and XXHash, like aHash, may provide different results across platforms (e.g. providing a fallback when hardware acceleration for the main algorithm isn't available, or being endianness dependent).
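As a minimal illustration of the endianness point (a sketch only, not taken from XXHash, t1ha, or aHash; both function names are made up for this example): a hash that reads its input lanes in a fixed byte order sees the same value on every host, while one that reads them in native byte order does not.

```rust
/// Read a 64-bit lane in a fixed (little-endian) byte order.
/// The result is identical on every platform.
fn read_lane_portable(bytes: [u8; 8]) -> u64 {
    u64::from_le_bytes(bytes)
}

/// Read a 64-bit lane in the host's native byte order.
/// The result differs between little- and big-endian hosts.
fn read_lane_native(bytes: [u8; 8]) -> u64 {
    u64::from_ne_bytes(bytes)
}

fn main() {
    let input = [1u8, 2, 3, 4, 5, 6, 7, 8];
    // Fixed byte order: always 0x0807060504030201.
    println!("portable: {:#018x}", read_lane_portable(input));
    // Native byte order: depends on the host's endianness.
    println!("native:   {:#018x}", read_lane_native(input));
}
```

Any mixing applied after `read_lane_native` would inherit the host dependence, which is the kind of cross-platform difference being asked about here.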
To be clear: I'm approaching this as someone who doesn't know what XXHash is, beyond the fact that it is a high quality hashing algorithm. Maybe aHash is the only hash provider which doesn't guarantee a specific stable result between platforms, but I don't know that, especially coming from aHash and std DefaultHasher, both of which provide a stable hash function for a given run only, and reserve the right to use different algorithms which give different output hashes in the future.
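To make the "stable for a given run only" case concrete, here is a small sketch using only the standard library's `RandomState`. The `hash_with` helper is hypothetical, and `BuildHasher::hash_one` assumes Rust 1.71 or newer. It shows that hashes are only comparable relative to one hasher state, not across states or processes.

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;

/// Hash `key` with the given randomly keyed state (hypothetical helper).
fn hash_with(state: &RandomState, key: &str) -> u64 {
    state.hash_one(key)
}

fn main() {
    let a = RandomState::new();
    let b = RandomState::new();

    // The same state always agrees with itself.
    assert_eq!(hash_with(&a, "key"), hash_with(&a, "key"));

    // Two independently created states are keyed differently, so these
    // values will almost certainly differ, and both change between runs.
    println!("a: {:#x}", hash_with(&a, "key"));
    println!("b: {:#x}", hash_with(&b, "key"));
}
```

A hash meant to be persisted or compared across machines needs the opposite property: the same input and key must always give the same output.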
No, you misunderstand: hardware acceleration is used to achieve better speed, but as far as I know it doesn't change the output. It needs to be tested, but afaik the logic of the hardware-accelerated path is the same as the plain scalar one.
P.S. Just a side note: accuracy loss would only be possible with floats, or if your hardware is buggy.
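A toy sketch of that argument (this is not XXH3's actual mixing; `K`, `mix_scalar`, and `mix_lanes` are invented for illustration): wrapping integer arithmetic is exact, so splitting the work across independent lanes, as a SIMD path would, cannot change the result. Floating-point addition is the counterexample, since regrouping it can change the rounding.

```rust
/// Arbitrary odd multiplier, chosen only for this illustration.
const K: u64 = 0x9E37_79B9_7F4A_7C15;

/// Accumulate `x * K` one element at a time (the "scalar" path).
fn mix_scalar(data: &[u64]) -> u64 {
    data.iter()
        .fold(0u64, |acc, &x| acc.wrapping_add(x.wrapping_mul(K)))
}

/// Accumulate the same terms in four independent lanes, then combine them,
/// mimicking how a vectorized path regroups the work.
fn mix_lanes(data: &[u64]) -> u64 {
    let mut lanes = [0u64; 4];
    for (i, &x) in data.iter().enumerate() {
        lanes[i % 4] = lanes[i % 4].wrapping_add(x.wrapping_mul(K));
    }
    lanes.iter().fold(0u64, |acc, &lane| acc.wrapping_add(lane))
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    // Wrapping addition is associative and commutative, so both paths agree.
    assert_eq!(mix_scalar(&data), mix_lanes(&data));

    // Floating-point addition is not associative: regrouping changes rounding.
    let left = (0.1_f64 + 0.2) + 0.3;
    let right = 0.1_f64 + (0.2 + 0.3);
    assert_ne!(left, right);
    println!("integer paths agree; float sums: {left} vs {right}");
}
```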
@CAD97 Btw, I think if you want to be sure, you'd better test it yourself :)
@DoumanAsh can you help identify what specific parts "need to be tested" on different platforms? or what platforms in particular are of concern? this has come up in discussion about the use of xxhash-rust in sequoia
@dkg I've answered in the GitLab issue, but basically verification was already done in https://gitlab.com/sequoia-pgp/sequoia/-/issues/801#note_817729847, at least for the platforms that I would consider relevant to the issue. To be fair, I'm not sure whether there are more platforms to check.
The repo now includes a cross-platform GitHub CI workflow that verifies the code on some non-standard platforms: https://github.com/DoumanAsh/xxhash-rust/actions/workflows/cross-rust.yml
So please open a new issue if you discover a problem on some other platform.
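For anyone wanting to check another platform, the usual shape of such a check is a known-answer test: pin a few (input, expected hash) pairs and run them on each target. The sketch below uses 64-bit FNV-1a purely as a small self-contained stand-in, so `fnv1a_64`, `VECTORS`, and `check_vectors` are illustrative names and not part of xxhash-rust. For a real check you would substitute the hash under test and vectors taken from its reference implementation.

```rust
/// 64-bit FNV-1a, used here only as a stand-in for the hash under test.
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Pinned (input, expected) pairs; these are published FNV-1a test values.
const VECTORS: &[(&str, u64)] = &[
    ("", 0xcbf2_9ce4_8422_2325),
    ("a", 0xaf63_dc4c_8601_ec8c),
];

/// Returns true if every pinned vector matches on the current platform.
fn check_vectors() -> bool {
    VECTORS
        .iter()
        .all(|&(input, expected)| fnv1a_64(input.as_bytes()) == expected)
}

fn main() {
    assert!(check_vectors(), "hash output differs from pinned vectors");
    println!("all pinned vectors match on this platform");
}
```

Running the same test binary natively and under emulated or cross-compiled targets (big-endian, 32-bit, with and without SIMD features) is what turns "should be stable" into something verified.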
aHash's hash comparison implies that XXH3 isn't stable across architectures.
Currently, my go-to default hashers are aHash when hash stability isn't required, and HighwayHash when stability is desired.
XXH3 appears to be faster than Highway on keys <= 32 bytes while retaining the properties I care about for a default-use stable hash function (namely, SMHasher[^1] quality, use of a key, and quality of keying / lack of bad keys). If it is in fact a properly stable and architecture-independent hash, I'll be able to seriously consider changing my recommended stable hash default from Highway to XXH3.
[^1]: Well, hash quality in general, not necessarily one specific test.