Tessil / robin-map

C++ implementation of a fast hash map and hash set using robin hood hashing
MIT License
1.28k stars 118 forks source link

Identity hash function by default on GCC #73

Open stgatilov opened 9 months ago

stgatilov commented 9 months ago

I ran the following code built with GCC in Linux VM:

#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <tsl/robin_set.h>

int main() {
    for (uint64_t size = 1 << 20; size <= 1 << 25; size <<= 1) {
        for (int mode = 0; mode < 2; mode++) {
            tsl::robin_set<uint64_t> x;

            uint64_t startTime = clock();
            for (uint64_t i = 0; i < size; i++)
                x.insert(i * (mode == 0 ? 0xDEADBEEF : 1 << 20));
            uint64_t endTime = clock();

            double deltaTime = double(endTime - startTime) / CLOCKS_PER_SEC;
            uint64_t sum = 0;
            for (uint64_t val : x)
                sum += val;

            printf("N = %llu %c:    time = %0.3lf    chk = %llu\n", size, (mode ? 'B' : 'M'), deltaTime, sum);
        }
    }
    return 0;
}

and I got:

N = 1048576 M:    time = 0.043    chk = 6257894696195457024
N = 1048576 B:    time = 8.280    chk = 576460202547609600
N = 2097152 M:    time = 0.115    chk = 6588752116096958464
N = 2097152 B:    time = 19.110    chk = 2305841909702066176
N = 4194304 M:    time = 0.169    chk = 7916099200727646208
N = 4194304 B:    time = 35.500    chk = 9223369837831520256
N = 8388608 M:    time = 0.354    chk = 13233322349299761152
Killed

So I guess inserting integers divisible by 2^20 takes quadratic time. Moreover, trying to insert 16M values results in a crash. Most likely because std::hash<uint64_t>(x) = x on GCC.

Note that I used default settings and got no warnings! Awful hash function by default is rather critical issue for people who don't know much about hashing (and would probably do worse trying to implement their own hash function or hash table). And given that TSL interface is very STL-like, I think that's the audience it is targeted at.


A proper hash function usually contains three parts:

  1. Combining: getting one integer out of many values/tuples/sequences/etc.
  2. Finalizing: doing some transformation for good statistical properties after step 3.
  3. Reduction: reducing the domain from something like whole range of uint64_t to an index in hash table.

As usual, C++ standard is not precise enough, and STL is not cross-platform. On MSVC, std::hash performs steps 1 and 2, while std::unordered_set only does step 3. On GCC, std::hash only performs step 1, while std::unordered_set does steps 2 and 3. It means that if you use std::hash directly, then you should run your own hash finalizer. TSL hash table only does step 3, but uses std::hash, meaning that the crucial step 2 is missed.

stgatilov commented 9 months ago

I think a good solution would be to define custom tsl::hash<T>. Depending on __GLIBCXX__ (perhaps also check what's going on in libc++), it should either redirect to std::hash<T>, or call std::hash<T> and run its output through some hash finalizer. Then replace std::hash with tsl::hash as default template argument in all class declarations.

GCC users will get proper hash function by default. No changes for non-GCC users. And hash function will remain fully tweakable.

Tessil commented 7 months ago

Yes, at the time of creating the map, I used the default std::hash as I didn't wanted to maintain my own hash function or have the repository depends on an external dependency, and adapted the hash in my projects depending on the usage.

Your solution would be a good compromise, I'll check to work on it when I have more time but don't hesitate to create a PR in the meantime if you have the time.

stgatilov commented 6 months ago

I started working on PR. Thus far I have added a unit test checking that default hash function on uint64_t depends on upper half of the integer.

I decided to check GCC / libstd-c++ to see which hash finalizer is used there. It turns out that no hash finalizer is used at all! They use prime size for hash table and simply hope that "modulo size" reduction is good enough.

For example, this code takes 8 seconds for 50K elements on my machine:

    std::unordered_set<uint64_t> set;
    set.reserve(NumElements);
    size_t p = set.bucket_count();
    for (uint64_t i = 0; i < NumElements; i++)
        set.insert(i * p);

The same happens on Clang / libc++.

Clearly, this issue is very unlikely to happen unintentionally if prime size is used.

stgatilov commented 6 months ago

In the PR, I decided to use hash finalizers from MurmurHash2. They are rather well-known and do very few operations. Although 64-bit case looks weird: it seems that output bits in [17..47) range don't depend on the higher bits in the same range.

The newer MurmurHash3 uses one more shift & multiply round in finalizer. Maybe it should be used in 64-bit case? More specifically, the fmix64 function.