DoumanAsh / xxhash-rust

Rust implementation of xxhash
Boost Software License 1.0

Clarification: is XXH3 a stable hash across architectures? #15

Closed CAD97 closed 2 years ago

CAD97 commented 2 years ago

aHash's hash comparison implies that it isn't.

Currently, my go-to default hashers are aHash when hash stability isn't required, and HighwayHash when stability is desired.

XXH3 appears to be faster than Highway on keys <= 32 bytes while retaining the properties I care about for a default stable hash function (namely, SMHasher[^1] quality, use of a key, and quality of keying / lack of bad keys). If it is in fact a properly stable and architecture-independent hash, I'll be able to seriously consider changing my recommended stable hash default from Highway to XXH3.

[^1]: Well, hash quality in general, not necessarily one specific test.

DoumanAsh commented 2 years ago

Uhm the link you posted says nothing about stability:

> Like aHash, t1ha and XXHash are targeted at hashmaps and uses hardware instructions including AES for different platforms rather than having a single standard. Both are fast, but AHash is faster than either one, both with and without AES. This is particularly true of smaller inputs such as integers. T1ha's hashes do not pass the full of the SMHasher test suite. Neither XXHash nor T1ha explicitly claim DOS resistance, but both are keyed hashes, and do not have any obvious way to force collisions. As of this writing there doesn't appear to be a maintained crate implementing the latest version of t1ha.

xxh3 should be stable across architectures. If it is not, then it is probably a mistake in my port, but as long as you use the correct version of the xxh algorithm, the output should be equal on any platform.

CAD97 commented 2 years ago

The implication comes from

> uses hardware instructions [...] rather than having a single standard

and that the prior section notes that MurmurHash, CityHash, MetroHash, FarmHash, and HighwayHash

> provide consistent output [which] prevents them from taking advantage of different hardware capabilities on different CPUs.

Taken together, the implication is that t1ha and XXHash, like aHash, may provide different results across platforms (e.g. providing a fallback when hardware acceleration for the main algorithm isn't available, or being endianness dependent).
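To make the endianness concern concrete, here is a minimal sketch. It is a toy mixer, not XXH3 and not xxhash-rust's code; the names `toy_hash_portable` and `toy_hash_native` and the constants are invented for illustration. The point is only that reading input words with `u64::from_le_bytes` pins the byte order, so the result is the same on every host, whereas `u64::from_ne_bytes` would tie the result to the host's byte order.

```rust
/// Toy 64-bit mixer (an illustration only, NOT XXH3).
/// Folds the input into the state 8 bytes at a time using exact,
/// wrapping integer arithmetic. `read_word` decides the byte order.
fn toy_hash_with(input: &[u8], read_word: fn([u8; 8]) -> u64) -> u64 {
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for chunk in input.chunks(8) {
        // Zero-pad a short final chunk to a full 8-byte word.
        let mut word = [0u8; 8];
        word[..chunk.len()].copy_from_slice(chunk);
        state = (state ^ read_word(word))
            .wrapping_mul(0x0000_0100_0000_01B3)
            .rotate_left(27);
    }
    state
}

/// Fixed little-endian reads: same output on every architecture.
pub fn toy_hash_portable(input: &[u8]) -> u64 {
    toy_hash_with(input, u64::from_le_bytes)
}

/// Native-endian reads: output differs between little- and big-endian hosts.
pub fn toy_hash_native(input: &[u8]) -> u64 {
    toy_hash_with(input, u64::from_ne_bytes)
}
```

On a little-endian host both functions agree; on a big-endian host only `toy_hash_portable` would still match values computed elsewhere. A hash that promises stable output has to take the first approach.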


To be clear: I'm approaching this as someone who doesn't know what XXHash is, beyond the fact that it is a high quality hashing algorithm. Maybe aHash is the only hash provider which doesn't guarantee a specific stable result between platforms, but I don't know that, especially coming from aHash and std DefaultHasher, both of which provide a stable hash function for a given run only, and reserve the right to use different algorithms which give different output hashes in the future.

DoumanAsh commented 2 years ago

No, you misunderstand: hardware acceleration is used to achieve better speed, but it doesn't change the output as far as I know. This needs to be tested, but afaik the hardware-accelerated logic is the same as the plain scalar logic.

P.S. Just a side note: accuracy loss would only be possible with floats, or if your hardware is buggy.
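A rough sketch of why that holds for integer-only code. This is a toy striped accumulator, not the actual XXH3 round function; `LANES`, `PRIME`, `accumulate_scalar` and `accumulate_striped` are names made up for the example. The scalar version walks the input one word at a time, while the striped version handles a whole stripe per step, the way a SIMD path would. Because wrapping integer addition and multiplication are exact, both must produce identical accumulators.

```rust
const LANES: usize = 4;
const PRIME: u64 = 0x9E37_79B1_85EB_CA87;

/// Scalar reference: one word at a time, word `i` goes to lane `i % LANES`.
pub fn accumulate_scalar(words: &[u64]) -> [u64; LANES] {
    let mut acc = [0u64; LANES];
    for (i, &w) in words.iter().enumerate() {
        let lane = i % LANES;
        acc[lane] = acc[lane].wrapping_add(w.wrapping_mul(PRIME));
    }
    acc
}

/// Stripe-at-a-time version: updates all lanes per step, mimicking what a
/// SIMD implementation does with a single vector instruction.
pub fn accumulate_striped(words: &[u64]) -> [u64; LANES] {
    let mut acc = [0u64; LANES];
    let mut stripes = words.chunks_exact(LANES);
    for stripe in stripes.by_ref() {
        for lane in 0..LANES {
            acc[lane] = acc[lane].wrapping_add(stripe[lane].wrapping_mul(PRIME));
        }
    }
    // Leftover words (fewer than LANES) land in lanes 0.. in order,
    // exactly as in the scalar version.
    for (lane, &w) in stripes.remainder().iter().enumerate() {
        acc[lane] = acc[lane].wrapping_add(w.wrapping_mul(PRIME));
    }
    acc
}
```

The two functions differ only in how the work is scheduled, not in what is computed, which is the sense in which an accelerated path can be faster without changing the hash.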

DoumanAsh commented 2 years ago

@CAD97 Btw, if you want to be sure, you'd better test it yourself :)

dkg commented 2 years ago

@DoumanAsh can you help identify what specific parts "need to be tested" on different platforms? or what platforms in particular are of concern? this has come up in discussion about the use of xxhash-rust in sequoia

DoumanAsh commented 2 years ago

@dkg I've answered in the GitLab issue, but basically verification was already done in https://gitlab.com/sequoia-pgp/sequoia/-/issues/801#note_817729847, at least for the platforms that I would consider relevant to the issue. To be fair, I'm not sure if there are more platforms to check.

DoumanAsh commented 2 years ago

The repo now includes a cross-platform workflow to verify the code on some non-standard platforms in GitHub CI: https://github.com/DoumanAsh/xxhash-rust/actions/workflows/cross-rust.yml

So please report a new issue if you discover a problem on some other platform.
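For anyone who wants to check another platform themselves, one common approach is a golden-vector test: hash a few fixed inputs and compare against values recorded once on a reference machine. The sketch below shows the shape of such a check. To keep it self-contained it uses FNV-1a (64-bit) as a stand-in hash with its published test vectors; it does not call xxhash-rust, and `GOLDEN` and `mismatches` are hypothetical names. In practice you would pass the hash function under test and vectors taken from the reference implementation.

```rust
/// Stand-in hash for the sketch: FNV-1a, 64-bit (NOT XXH3).
pub fn fnv1a_64(input: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in input {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// (input, expected hash) pairs recorded on a reference platform.
pub const GOLDEN: &[(&str, u64)] = &[
    ("", 0xcbf2_9ce4_8422_2325),
    ("a", 0xaf63_dc4c_8601_ec8c),
    ("abc", 0xe71f_a219_0541_574b),
];

/// Returns the inputs whose hash on this platform differs from the
/// recorded value. An empty result means the platform agrees.
pub fn mismatches(hash: fn(&[u8]) -> u64, golden: &[(&str, u64)]) -> Vec<String> {
    golden
        .iter()
        .filter(|(input, expected)| hash(input.as_bytes()) != *expected)
        .map(|(input, _)| String::from(*input))
        .collect()
}
```

Running `mismatches(fnv1a_64, GOLDEN)` in a test on each CI target and asserting the result is empty turns "should be stable" into something that fails loudly on a platform where it is not.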