Cyan4973 / xxHash

Extremely fast non-cryptographic hash algorithm
http://www.xxhash.com/
Other
9.21k stars 778 forks source link
c dispersion hash hash-checksum hash-functions smhasher xxhash

xxHash - Extremely fast hash algorithm

xxHash is an Extremely fast Hash algorithm, processing at RAM speed limits. Code is highly portable, and produces hashes identical across all platforms (little / big endian). The library includes the following algorithms :

All variants successfully complete the SMHasher test suite which evaluates the quality of hash functions (collision, dispersion and randomness). Additional tests, which evaluate more thoroughly speed and collision properties of 64-bit hashes, are also provided.

Branch Status
release Build Status
dev Build Status

Benchmarks

The benchmarked reference system uses an Intel i7-9700K cpu, and runs Ubuntu x64 20.04. The open source benchmark program is compiled with clang v10.0 using -O3 flag.

Hash Name Width Bandwidth (GB/s) Small Data Velocity Quality Comment
XXH3 (SSE2) 64 31.5 GB/s 133.1 10
XXH128 (SSE2) 128 29.6 GB/s 118.1 10
RAM sequential read N/A 28.0 GB/s N/A N/A for reference
City64 64 22.0 GB/s 76.6 10
T1ha2 64 22.0 GB/s 99.0 9 Slightly worse collisions
City128 128 21.7 GB/s 57.7 10
XXH64 64 19.4 GB/s 71.0 10
SpookyHash 64 19.3 GB/s 53.2 10
Mum 64 18.0 GB/s 67.0 9 Slightly worse collisions
XXH32 32 9.7 GB/s 71.9 10
City32 32 9.1 GB/s 66.0 10
Murmur3 32 3.9 GB/s 56.1 10
SipHash 64 3.0 GB/s 43.2 10
FNV64 64 1.2 GB/s 62.7 5 Poor avalanche properties
Blake2 256 1.1 GB/s 5.1 10 Cryptographic
SHA1 160 0.8 GB/s 5.6 10 Cryptographic but broken
MD5 128 0.6 GB/s 7.8 10 Cryptographic but broken

note 1: Small data velocity is a rough evaluation of algorithm's efficiency on small data. For more detailed analysis, please refer to next paragraph.

note 2: some algorithms feature faster than RAM speed. In which case, they can only reach their full speed potential when input is already in CPU cache (L3 or better). Otherwise, they max out on RAM speed limit.

Small data

Performance on large data is only one part of the picture. Hashing is also very useful in constructions like hash tables and bloom filters. In these use cases, it's frequent to hash a lot of small data (starting at a few bytes). Algorithm's performance can be very different for such scenarios, since parts of the algorithm, such as initialization or finalization, become fixed cost. The impact of branch mis-prediction also becomes much more present.

XXH3 has been designed for excellent performance on both long and small inputs, which can be observed in the following graph:

XXH3, latency, random size

For a more detailed analysis, please visit the wiki : https://github.com/Cyan4973/xxHash/wiki/Performance-comparison#benchmarks-concentrating-on-small-data-

Quality

Speed is not the only property that matters. Produced hash values must respect excellent dispersion and randomness properties, so that any sub-section of it can be used to maximally spread out a table or index, as well as reduce the amount of collisions to the minimal theoretical level, following the birthday paradox.

xxHash has been tested with Austin Appleby's excellent SMHasher test suite, and passes all tests, ensuring reasonable quality levels. It also passes extended tests from newer forks of SMHasher, featuring additional scenarios and conditions.

Finally, xxHash provides its own massive collision tester, able to generate and compare billions of hashes to test the limits of 64-bit hash algorithms. On this front too, xxHash features good results, in line with the birthday paradox. A more detailed analysis is documented in the wiki.

Build modifiers

The following macros can be set at compilation time to modify libxxhash's behavior. They are generally disabled by default.

Binary size control

Build modifiers specific for XXH3

Makefile variables

When compiling the Command Line Interface xxhsum using make, the following environment variables can also be set :

Building xxHash - Using vcpkg

You can download and install xxHash using the vcpkg dependency manager:

git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg integrate install
./vcpkg install xxhash

The xxHash port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.

Example

The simplest example calls xxhash 64-bit variant as a one-shot function generating a hash value from a single buffer, and invoked from a C/C++ program:

#include "xxhash.h"

    (...)
    XXH64_hash_t hash = XXH64(buffer, size, seed);
}

Streaming variant is more involved, but makes it possible to provide data incrementally:

#include "stdlib.h"   /* abort() */
#include "xxhash.h"

XXH64_hash_t calcul_hash_streaming(FileHandler fh)
{
    /* create a hash state */
    XXH64_state_t* const state = XXH64_createState();
    if (state==NULL) abort();

    size_t const bufferSize = SOME_SIZE;
    void* const buffer = malloc(bufferSize);
    if (buffer==NULL) abort();

    /* Initialize state with selected seed */
    XXH64_hash_t const seed = 0;   /* or any other value */
    if (XXH64_reset(state, seed) == XXH_ERROR) abort();

    /* Feed the state with input data, any size, any number of times */
    (...)
    while ( /* some data left */ ) {
        size_t const length = get_more_data(buffer, bufferSize, fh);
        if (XXH64_update(state, buffer, length) == XXH_ERROR) abort();
        (...)
    }
    (...)

    /* Produce the final hash value */
    XXH64_hash_t const hash = XXH64_digest(state);

    /* State could be re-used; but in this example, it is simply freed  */
    free(buffer);
    XXH64_freeState(state);

    return hash;
}

License

The library files xxhash.c and xxhash.h are BSD licensed. The utility xxhsum is GPL licensed.

Other programming languages

Beyond the C reference version, xxHash is also available from many different programming languages, thanks to great contributors. They are listed here.

Packaging status

Many distributions bundle a package manager which allows easy xxhash installation as both a libxxhash library and xxhsum command line interface.

Packaging status

Special Thanks