Tessil / sparse-map

C++ implementation of a memory efficient hash map and hash set
MIT License
334 stars 36 forks source link

Make prime growth policy arrays class members #8

Closed cmakshin closed 4 years ago

cmakshin commented 5 years ago

Namespace-scope const-qualified variables have internal linkage, wasting both storage and CPU cache space due to each translation unit having its own copies of these variables.

For example, with unmodified code "ShowGlobals" utility from https://www.chromium.org/developers/windows-binary-sizes shows:

#Dups   DupSize   Size  Section Symbol name                PDB name
199     63680        0  0       tsl::sh::detail::PRIMES    xyz.pdb
199     63680        0  0       tsl::sh::detail::MOD_PRIME xyz.pdb

and the size of the final binary is 23'767'552 bytes.

With this change "ShowGlobals" doesn't show these arrays (prime growth policy is not used in that project so I don't know why compiler and/or linker didn't remove them) and the binary is reduced to 23'619'072 bytes.

cmakshin commented 5 years ago

GCC 8.3 and MSVC 2017 didn't have any complaints when I checked this change. I'll need some time to fix it for, at least, Clang and MSVC 2015.

Tessil commented 5 years ago

Hi,

Thank you for the report. I effectively didn't pay attention to that, I have to do some tests with -flto.

Otherwise another way to do it which works with GCC 4.8 is the following:

class prime_growth_policy {
public:
    explicit prime_growth_policy(std::size_t& min_bucket_count_in_out) {
        auto it_prime = std::lower_bound(primes().begin(), 
                                         primes().end(), min_bucket_count_in_out);
        if(it_prime == primes().end()) {
            throw std::length_error("The hash table exceeds its maxmimum size.");
        }

        m_iprime = static_cast<unsigned int>(std::distance(primes().begin(), it_prime));
        if(min_bucket_count_in_out > 0) {
            min_bucket_count_in_out = *it_prime;
        }
        else {
            min_bucket_count_in_out = 0;
        }
    }

    std::size_t bucket_for_hash(std::size_t hash) const noexcept {
        return mod_prime()[m_iprime](hash);
    }

    std::size_t next_bucket_count() const {
        if(m_iprime + 1 >= primes().size()) {
            throw std::length_error("The hash table exceeds its maxmimum size.");
        }

        return primes()[m_iprime + 1];
    }   

    std::size_t max_bucket_count() const {
        return primes().back();
    }

    void clear() noexcept {
        m_iprime = 0;
    }

private:
    static const std::array<std::size_t, 40>& primes() {
        static const std::array<std::size_t, 40> PRIMES = {{
            1ul, 5ul, 17ul, 29ul, 37ul, 53ul, 67ul, 79ul, 97ul, 131ul, 193ul, 257ul, 389ul, 521ul, 769ul, 1031ul, 
            1543ul, 2053ul, 3079ul, 6151ul, 12289ul, 24593ul, 49157ul, 98317ul, 196613ul, 393241ul, 786433ul, 
            1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul, 50331653ul, 100663319ul, 201326611ul, 
            402653189ul, 805306457ul, 1610612741ul, 3221225473ul, 4294967291ul
        }};

        static_assert(std::numeric_limits<decltype(m_iprime)>::max() >= PRIMES.size(), 
                      "The type of m_iprime is not big enough.");

        return PRIMES;
    }

    static const std::array<std::size_t(*)(std::size_t), 40>& mod_prime() {
        // MOD_PRIME[iprime](hash) returns hash % PRIMES[iprime]. This table allows for faster modulo as the
        // compiler can optimize the modulo code better with a constant known at the compilation.
        static const std::array<std::size_t(*)(std::size_t), 40> MOD_PRIME = {{ 
            &mod<0>, &mod<1>, &mod<2>, &mod<3>, &mod<4>, &mod<5>, &mod<6>, &mod<7>, &mod<8>, &mod<9>, &mod<10>, 
            &mod<11>, &mod<12>, &mod<13>, &mod<14>, &mod<15>, &mod<16>, &mod<17>, &mod<18>, &mod<19>, &mod<20>, 
            &mod<21>, &mod<22>, &mod<23>, &mod<24>, &mod<25>, &mod<26>, &mod<27>, &mod<28>, &mod<29>, &mod<30>, 
            &mod<31>, &mod<32>, &mod<33>, &mod<34>, &mod<35>, &mod<36>, &mod<37> , &mod<38>, &mod<39>
        }};

        return MOD_PRIME;
    }

    template<unsigned int IPrime>
    static std::size_t mod(std::size_t hash) { 
        return hash % primes()[IPrime]; 
    }

private:
    unsigned int m_iprime;
}; 
cmakshin commented 5 years ago

Your solution is nice but I'm afraid returning references to local statics may hurt compile-time optimizations.

Tessil commented 5 years ago

From the tests I did with Clang and GCC, there was no difference. I still have to check with MSVC.

cmakshin commented 5 years ago

As a side note: after discovering this problem I got an idea that it would be nice to use the same growth policy file for all relevant containers. That would simplify both maintenance (made some improvements in one project — just copy the file to other repositories) and compiler's life in projects which use several containers (for example, sparse_map as a space-efficient long-term storage and robin_map for tasks where speed is more important).

Tessil commented 5 years ago

Yes, I pondered previously some way to share part of the code of the different tsl libraries as they have some small portions of similar code. But in the end I want to keep each library in its own separate self-sufficient repository instead of putting everything in a common tsl repository. I could have a small common repository as sub-module of each lib but I prefer to keep it as it is for now. But yeah the current solution mean that the prime array is duplicated when using two libs and prime_growth_policy.