Cyan4973 / xxHash

Extremely fast non-cryptographic hash algorithm
http://www.xxhash.com/
Other
8.96k stars 768 forks source link

Why only xxHash SSE version is tested in benchmark section? #445

Closed VasiliPupkin256 closed 4 years ago

VasiliPupkin256 commented 4 years ago

HighwayHash is specifically optimized for SSE4.1 and AVX2, why you even bother to test its pure C implementation? Without a further explanation why an arbitrary SSE2 limit is chosen it looks like an intentional downgrade of a cryptographically strong competitor which otherwise would show around 12 GB/s bandwidth.

IMO the benchmark table should also include AVX2 versions of xxHash and HighwayHash for the sake of fairness.

Also the SipTreeHash version of the SipHash can be chosen for the AVX2 benchmark, which is basically the same as migration from XXH32 to incompatible but parallel XXH3.

easyaspi314 commented 4 years ago

Well it was intended to be a "base feature" thing.

SSE2 is available on every x86_64 CPU and basically every i686 CPU that is still running.

AVX2 (and even SSE4.1 except on OS X) is not safe to use without a check.

VasiliPupkin256 commented 4 years ago

I've thought of this reference. Hence I've wrote that this limit should be explained in the benchmark section and the current test of the compatible but slow HighwayHash implementation looks like a taunt, it should be removed from the table, it wasn't projected for this benchmark, just like XXH3 is not intended to run on 8 bit ATmega microcontroller.

I still think that if is far-fetched. SSE4.1 is around for decade and most users of the xxHash have a compatible CPU. So.. why not include it, the SSE2 requirement is still noted in brackets anyway.

VasiliPupkin256 commented 4 years ago

A typical modern machine similar to the one used in the test has memory bandwidth around 30 GB/s, SSD bandwidth around 5 GB/s ( probably 6-7 GB/s for the next ssd generation ), the network 1 GB/s on 10Gb ethernet and an AVX2 compatible multicore CPU.

10GB/s is an important landmark because it already covers the SSD. So the HighwayHash or SipTreeHash is enough to hash single SSD full speed utilising single core and get cryptographically secure results. xxHash family only become handy when you need to run hash sequentially in a poorly parallelised algorithm or when the data is generated on the fly or you want to be energy efficient... hash on mobile phones.. etc. Being SSE2 compatible is not an important requirement at all.

easyaspi314 commented 4 years ago

the HighwayHash or SipTreeHash is enough to hash single SSD full speed utilising single core and get cryptographically secure results.

I don't believe HighwayHash has been proven to be cryptographically secure, and SipTreeHash is likely secure but I don't believe it has the same guarantees as SipHash.

Besides, if you wanted cryptographically secure hashes, you would use something like SHA512 or BLAKE2.

xxHash family only become handy when you need to run hash sequentially in a poorly parallelised algorithm or when the data is generated on the fly or you want to be energy efficient...hash on mobile phones.. etc. Being SSE2 compatible is not an important requirement at all.

  1. I believe you will find that most programs are single threaded. Compilers like GCC and Clang come to mind — they are single threaded since the traditional pattern is to let make handle the threading. Threading is infamous for being easy to mess up and many developers avoid it.
  2. In hash tables, data is often generated on-the-fly. That is what xxHash was often used for.
  3. XXH3 uses a minimalist design, designed to be reasonably fast on most of the CPUs that run XXH32 (the only additional requirement is a long multiply) while also scaling up nicely to modern hardware. This has the side effect of being very energy efficient.
  4. The consumer market is a very important target, and for quite a few years, consumers have been using smartphones more than desktops. As a matter of fact, a lot of my coding is done from my phone in Termux.
  5. SSE2 happens to be enough to accelerate XXH3, so we target it.

Either way, @Cyan4973 do you mind adding the SSE4.1 and AVX2 implementations to the benchmark? I think this is a fair point. The benchmark graphs need to be updated anyways.

VasiliPupkin256 commented 4 years ago

No hash is proven to be secure even the SHA512. HighwayHash is probably less secure than the SipHash, but the SipTreeHash is a Merkle Tree on SipHash so it should be as secure as SipHash if they didn't mess an implementation.

I agree with all the statements above, xxHash family hashes are useful where you need more bandwidth or being energy efficient. But in case of a parallel benchmark which is targeted on x86 CPU, cryptographically secure parallel implementations are also very interesting especially if they can achieve the speed of SSD. So yes... @Cyan4973 please

Cyan4973 commented 4 years ago

I'm happy to remove HighwayHash from the homepage's benchmark table if it can help

VasiliPupkin256 commented 4 years ago

No answer to the question though. Why only xxHash SSE version is tested in benchmark section? SipHash can be run in merkle tree parallel implementation calculate 4-8 hashes at a time showing bandwidth of around 10GB/s. Not binary compatible but parallel secure and fast. Even the MD5/SHA1 can achieve the speeds of 3 GB/s if run on SSE2 in parallel splitting the input in buckets the same way you do in XXH3 to make it SSE friendly over XXH32.

The idea is simple. XXH32 is not parallel friendly - making a binary incompatible put parallel hash XXH3 and test it on SSE2. SipHash is not parallel friendly - no... we wouldn't test SipTreeHash parallel friendly implementation of it or any other parallel friendly hashes, sorry.

Cyan4973 commented 4 years ago

Why only xxHash SSE version is tested in benchmark section?

As explained at the top of the table :

The open source benchmark program is compiled with clang v10.0 using -O3 flag.

This is the default configuration of gcc and clang for x64. This configuration enables SSE2, but not much more.

VasiliPupkin256 commented 4 years ago

You are asking for SSE2 implementation of secure hashes? Here it is https://github.com/krisprice/simd_md5/blob/master/simd_md5/md5_sse.c Most of the secure hashes can be roughly parallelized the same way, just calculate 4-8 instances of it in SSE wide register splitting input in 64 byte buckets and then combine results. Isn't it the same idea behind XXH3?

Cyan4973 commented 4 years ago

https://github.com/Cyan4973/xxHash/issues/257#issuecomment-667649862

VasiliPupkin256 commented 4 years ago

In context of XXH32 it is ok. But with introduction of XXH3 which is inherently targeted on SSE2 it looks like a cherry picking of slower scalar competitors. Hence the question I rise in this issue.

Don't you see, XXH3 is targeting the memory bandwidth. It is... I don't know. A comparison of an aircraft carrier to a tank.