JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.73k stars 5.48k forks source link

Hash Tuple #37073

Closed guberger closed 3 years ago

guberger commented 4 years ago

I noticed that the function for hashing Tuple's is not tail-recursive:

hash(::Tuple{}, h::UInt) = h + tuplehash_seed
hash(t::Tuple, h::UInt) = hash(t[1], hash(tail(t), h))

Here is one alternative (not using recursive functions) producing the same hash:

Base.hash(x::Tuple{}, h::UInt) = h + Base.tuplehash_seed
@generated function Base.hash(x::NTuple{N}, h::UInt) where N
    quote
        h += Base.tuplehash_seed
        @nexprs $N i -> h = hash(x[$N-i+1], h)
    end
end

Tested on Tuple{Int,Int,Int}, it seems ~2x faster.

stevengj commented 4 years ago

Tested on Tuple{Int,Int,Int}, it seems ~2x faster.

That's surprising to me — for such a small tuple, I would think the hash function would be completely inlined and hence equivalent to your @generated version. Examining, @code_llvm hash((1,2,3), UInt64(0)) it indeed seems to be inlined.

guberger commented 4 years ago

I claim this from this tests (of course there might be other reasons why this seems faster with MyTuple but I don't see them...):

using Base.Cartesian

struct MyTuple{N,S<:NTuple{N}}
    T::S
end

Base.hash(x::MyTuple{0,S}, h::UInt) where S = h + Base.tuplehash_seed
@generated function Base.hash(x::MyTuple{N}, h::UInt) where N
    quote
        h += Base.tuplehash_seed
        @nexprs $N i -> h = hash(x.T[$N-i+1], h)
    end
end

function TestHashTuple(n)
    D1 = [(1,2,i) for i = 1:n]
    D2 = [MyTuple((1,2,i)) for i = 1:n]
    for (d1, d2) in zip(D1, D2)
        @assert hash(d1) == hash(d2)
    end
    @time for d in D1
        hash(d)
    end
    @time for d in D2
        hash(d)
    end
end
blegat commented 4 years ago

I also get a 2x speedup on my maching with the example

_hash(x::Tuple{}, h::UInt) = h + Base.tuplehash_seed
@generated function _hash(x::NTuple{N}, h::UInt) where N
    quote
        h += Base.tuplehash_seed
        @nexprs $N i -> h = hash(x[$N-i+1], h)
    end
end

function TestHashTuple(n)
    D = [(1, 2, i) for i = 1:n]
    for d in D
        @assert hash(d, UInt(0)) == _hash(d, UInt(0))
    end
    @time for d in D
        hash(d, UInt(0))
    end
    @time for d in D
        _hash(d, UInt(0))
    end
end

I get the result:

julia> versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

julia> TestHashTuple(1_000_000)
  0.030122 seconds
  0.012990 seconds
guberger commented 3 years ago

Seems to be fixed in Julia 1.6.0. :-) With the code of @blegat above, I get

julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

julia> TestHashTuple(1_000_000)
  0.005432 seconds
  0.007493 seconds

Big speed-up compared to Julia 1.5.4 btw :-)

julia> versioninfo()
Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

julia> TestHashTuple(1_000_000)
  0.022528 seconds
  0.012084 seconds