Cyan4973 / xxHash

Extremely fast non-cryptographic hash algorithm
http://www.xxhash.com/
Other
9.14k stars 777 forks source link

Clarification: XXH3_64bits_withSecret/XXH3_128bits_withSecret for hash tables? #294

Closed koraa closed 4 years ago

koraa commented 4 years ago

I am using the functions (in the title) for a hash table; the secret is 64bits long and generated using a proper CSPRNG (/dev/urandom) and is kept secret. What level of protection does that offer against the usual hash based hash table attacks?

Thanks for your great work. I've been using xxhash for years to impress my colleagues and do performance optimization. It has never disappointed me :)

easyaspi314 commented 4 years ago

Well currently, XXH3's 64-bit variant has a known seed-dependent collision problem (#261), which requires knowing 8 consecutive bytes of the secret. This is a rather large problem with the unseeded variant as the "secret" is public, but pretty minor for the seeded one.

The 128bits variant has been patched, but we haven't worked out a solution yet for the 64bits variant.

There is a slight performance drop on the seeded variant and the 128bits variant, but they are highly unlikely to cause a collision.

koraa commented 4 years ago

Well currently, XXH3's 64-bit variant has a known seed-dependent collision problem (#261), which requires knowing 8 consecutive bytes of the secret. This is a rather large problem with the unseeded variant as the "secret" is public, but pretty minor for the seeded one.

I mean, thanks for the info…8 bit would be the entirety of my secret and…I do in fact intent to keep my secret…well secret…maybe I should upgrade to an 128bit secret anyways…

How would you compare xxh3 to…let's say…siphash when used with a secret?

easyaspi314 commented 4 years ago

I mean I'm not that good at cryptography, but I'd say XXH3 is in general more viable for a hash table.

However, it is important to note that SipHash and XXH3 are different types of functions.

SipHash XXH3
Crypto pseudorandom keyed MAC Non-crypto hash function
Aims to be unpredictable, not collision resistant Aims to reasonably collision resistant but not cryptographic
Rather slow, esp. on short keys Multiple times faster on short and long keys
Very small binary size Rather large binary size
64-bit only with weaker 32-bit variant Full strength and speed on 32-bit and 64-bit
Scalar design with mediocre SSE3 implementation Optimized for both scalar and many SIMD implementations

SipHash is somewhat overrated IMO.

It is very good for what it does (generate a strong random output), and it is written by respected cryptographers who know their shit.

However, it is too slow to compete with non-cryptographic hash functions, and it lacks the collision resistance guarantee of stronger cryptographic hash functions.

It also has a 32 byte state (like XXH64), but each round processes only 8 bytes at a time.

As a matter of fact, there are cryptographic hashes like the BLAKE family which are fast and strong.

Besides, the main reason for the "hashpocalypse" was not that non-cryptographic is too weak for a hash table. It was because of a fundamental flaw in the Murmur algorithm which resulted in an easy seed-independent collision in the popular MurmurHash and CityHash (which used Murmur on short keys) functions.

koraa commented 4 years ago

Thank you!