
`hash(::Vector{<abstract type>}, ::UInt)` allocates quite a bit due to type instability: why can't hash(::Any) infer `UInt` return type? #44244

Open NHDaly opened 2 years ago

NHDaly commented 2 years ago

Hi!

Here's a very small MRE that reproduces the issue I'll discuss below:

julia> @btime hash($(Any[i for i in 1:1000]), UInt(0))
  20.558 μs (1001 allocations: 15.64 KiB)
0x0e97df552a8d635e

And you can see that all those allocations are UInts:

julia> import Profile

julia> const v = Any[i for i in 1:1000];

julia> Profile.Allocs.clear(); Profile.Allocs.@profile sample_rate=1 hash(v, UInt(0))
0x0e97df552a8d635e

julia> results = Profile.Allocs.fetch();

julia> length(results.allocs)
1001

julia> unique([alloc.type for alloc in Profile.Allocs.fetch().allocs])
1-element Vector{DataType}:
 UInt64

We've been using the new allocations profiler on our codebase, and we've found that on some benchmarks roughly 30% of allocations come from this method instance: `hash(::Vector{RelationalAITypes.DBType}, ::UInt64)`:

[screenshot: allocation profile showing hash(::Vector{RelationalAITypes.DBType}, ::UInt64) at ~30% of allocations]

You can see that, of these allocations, roughly 50% come from allocating the UInt64 results of hashing the elements while iterating the vector.

Here's the reported allocation locations:

  Total:       14323      25157 (flat, cum) 30.66%
   2428            .          .               n = 0 
   2429            .          .               while true 
   2430            .          .                   n += 1 
   2431            .          .                   # Hash the current key-index and its element 
   2432            .          .                   elt = A[keyidx] 
   2433        10798      12908                   h = hash(keyidx=>elt, h) 
   2434            .          .            
   2435            .          .                   # Skip backwards a Fibonacci number of indices -- this is a linear index operation 
   2436            .          .                   linidx = key_to_linear[keyidx] 
   2437            .          .                   linidx < fibskip + first_linear && break 
   2438            .          .                   linidx -= fibskip 
   2439            .          .                   keyidx = linear_to_key[linidx] 
   2440            .          .            
   2441            .          .                   # Only increase the Fibonacci skip once every N iterations. This was chosen 
   2442            .          .                   # to be big enough that all elements of small arrays get hashed while 
   2443            .          .                   # obscenely large arrays are still tractable. With a choice of N=4096, an 
   2444            .          .                   # entirely-distinct 8000-element array will have ~75% of its elements hashed, 
   2445            .          .                   # with every other element hashed in the first half of the array. At the same 
   2446            .          .                   # time, hashing a `typemax(Int64)`-length Float64 range takes about a second. 
   2447            .          .                   if rem(n, 4096) == 0 
   2448            .          .                       fibskip, prevfibskip = fibskip + prevfibskip, fibskip 
   2449            .          .                   end 
   2450            .          .            
   2451            .          .                   # Find a key index with a value distinct from `elt` -- might be `keyidx` itself 
   2452         3525      12249                   keyidx = findprev(!isequal(elt), A, keyidx) 
   2453            .          .                   keyidx === nothing && break 
   2454            .          .               end 
   2455            .          .            
   2456            .          .               return h 
   2457            .          .           end 

You can see that most of the direct allocations come from the call to hash(keyidx=>elt, h) on line 2433.

I know this is because there is a type instability there, since the Vector has an abstract element type. But my question is: even if you don't know the type of the element, why can't Julia deduce the return type to be a UInt in this case? And if it could, would that be enough to avoid allocating the return value?

I see that currently roughly 1/3 of the hash() methods in Base cannot infer their return type:

julia> length([i for (i,v) in enumerate(Base.return_types(hash)) if v==Any]), length(methods(hash))
(22, 68)
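
Here's a sketch of one way to list the offending methods (assuming methods(hash) and Base.return_types(hash) iterate in the same method-table order):

for (m, rt) in zip(methods(hash), Base.return_types(hash))
    rt === Any && println(m)   # print the methods whose return type doesn't infer
end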

Here's a small sample of the ones that it reports with return type Any:

[4] hash(F::LinearAlgebra.Eigen, h::UInt64) in LinearAlgebra at /Users/nathandaly/src/julia/usr/share/julia/stdlib/v1.8/LinearAlgebra/src/eigen.jl:627
[6] hash(z::Complex, h::UInt64) in Base at complex.jl:255
[8] hash(s::AbstractSet, h::UInt64) in Base at set.jl:433
[10] hash(r::Distributed.AbstractRemoteRef, h::UInt64) in Distributed at /Users/nathandaly/src/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:139
[11] hash(a::AbstractDict, h::UInt64) in Base at abstractdict.jl:522
[12] hash(g::Base.Unicode.GraphemeIterator, h::UInt64) in Base.Unicode at strings/unicode.jl:718
[13] hash(x::NamedTuple, h::UInt64) in Base at namedtuple.jl:196
[14] hash(F::LinearAlgebra.QRCompactWY, h::UInt64) in LinearAlgebra at /Users/nathandaly/src/julia/usr/share/julia/stdlib/v1.8/LinearAlgebra/src/qr.jl:152
[17] hash(p::Pair, h::UInt64) in Base at pair.jl:39

I'm wondering: would it be acceptable to put a return type annotation and/or return-site type assertion on all these methods asserting that they return a UInt64?

One plausible objection is that we don't formally require hash(x, h) to return a UInt, but I don't think that holds in practice, because if a hash method returns anything other than a UInt, it causes a MethodError as soon as the returned hash value is passed along as the seed to the next call to hash()!

Consider this example:

julia> struct MyType
           x::Int
       end

julia> Base.hash(m::MyType, h::UInt) = UInt128(hash(m.x, h))

julia> hash([MyType(2), MyType(3)])
ERROR: MethodError: no method matching hash(::MyType, ::UInt128)
Closest candidates are:
  hash(::MyType, ::UInt64) at REPL[6]:1
  hash(::Any) at ~/src/julia/usr/share/julia/base/hashing.jl:20
  hash(::Any, ::UInt64) at ~/src/julia/usr/share/julia/base/hashing.jl:25
Stacktrace:
 [1] hash(A::Vector{MyType}, h::UInt64)
   @ Base ./abstractarray.jl:3018
 [2] hash(x::Vector{MyType})
   @ Base ./hashing.jl:20
 [3] top-level scope
   @ REPL[7]:1

So it seems to me that there is an (implicit) requirement in the current API that Base.hash methods must return a UInt64. My question, then, is: can we take advantage of that requirement to give the compiler a hint that eliminates the allocations in the presence of this type instability?
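
Concretely, the change I have in mind is just a return-site assertion on each such method. For example, for the Pair method listed above (a sketch; the actual Base definition may differ):

Base.hash(p::Pair, h::UInt) = hash(p.second, hash(p.first, h))::UInt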

KristofferC commented 2 years ago

I'm wondering: would it be acceptable to put a return type annotation and/or return-site type assertion on all these methods asserting that they return a UInt64?

I am not sure that helps, for example:

struct S1 end
struct S2 end
struct S3 end
struct S4 end
struct S5 end

for s in [:S1, :S2, :S3, :S4, :S5]
    @eval g(x::$s) = 0
end

f(x) = g(x[])

code_warntype(f, Tuple{Ref{Any}})

shows a return value of Any even though all methods are concretely inferred (code_warntype(g, Tuple{Any})). So I think the problem is that there are too many methods for hash and type asserting them does not help.
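
(The relevant limit is inference's max_methods, which defaults to 3: with three or fewer matching methods, inference union-splits over them and joins the return types. A sketch of the same pattern under the limit, which does infer concretely:)

struct T1 end
struct T2 end
struct T3 end
for s in [:T1, :T2, :T3]
    @eval g2(x::$s) = 0
end
f2(x) = g2(x[])
code_warntype(f2, Tuple{Ref{Any}})  # Body::Int64 -- only 3 methods to split over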

NHDaly commented 2 years ago

gotttttttttcha. Okay thanks, that makes sense! 👍

It's too bad, because we could even still add a ::Int at the call site, since we know for suresies that we're gonna get 0 back, but I don't think that would help either: it still does the allocation and then just does a typeassert.

🤔 it would be great if there was like a "dispatch-lite" where if you could know the type at the callsite, you could maybe still do dynamic dispatch, but call the julia_g_ version of the function instead of the jlfptr_g_ version that allocates? 🤔 Like jl_apply_generic_with_asserted_type(f, args, asserted_type) and it would check that the method it matches will return the type you expect before calling it? is that crazy?

KristofferC commented 2 years ago

Are the subtypes of the abstract element type known upfront? In that case, you could call out to a manually union split function which should be type stable since it doesn't include any dynamic calls. Something like https://github.com/jlapeyre/ManualDispatch.jl.
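
A hand-rolled union split could look something like this (a sketch assuming the common element types are, say, Int, Float64, and String):

function hash_elements(v::Vector, h::UInt)
    for x in v
        if x isa Int
            h = hash(x, h)          # static dispatch: no boxed return
        elseif x isa Float64
            h = hash(x, h)
        elseif x isa String
            h = hash(x, h)
        else
            h = hash(x, h)::UInt    # fallback: dynamic dispatch still allocates
        end
    end
    return h
end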

Keno commented 2 years ago

We don't really have an optimization right now for "Do a dynamic dispatch, but all possible specializations are known to return the same type". If we did, you could insert a specialization barrier like:

diff --git a/base/abstractarray.jl b/base/abstractarray.jl
index 1b21201ffa..0e5ffce624 100644
--- a/base/abstractarray.jl
+++ b/base/abstractarray.jl
@@ -3002,6 +3002,7 @@ pushfirst!(A, a, b, c...) = pushfirst!(pushfirst!(A, c...), a, b)
 ## hashing AbstractArray ##

 const hash_abstractarray_seed = UInt === UInt64 ? 0x7e2d6fb6448beb77 : 0xd4514ce5
+hash_known_uint(x, h::UInt) = hash(x, h)::UInt
 function hash(A::AbstractArray, h::UInt)
     h += hash_abstractarray_seed
     # Axes are themselves AbstractArrays, so hashing them directly would stack overflow
@@ -3012,7 +3013,7 @@ function hash(A::AbstractArray, h::UInt)
     # For short arrays, it's not worth doing anything complicated
     if length(A) < 8192
         for x in A
-            h = hash(x, h)
+            h = hash_known_uint(x, h)
         end
         return h
     end
@@ -3048,7 +3049,7 @@ function hash(A::AbstractArray, h::UInt)
         n += 1
         # Hash the element
         elt = A[keyidx]
-        h = hash(keyidx=>elt, h)
+        h = hash_known_uint(keyidx=>elt, h)

         # Skip backwards a Fibonacci number of indices -- this is a linear index operation
         linidx = key_to_linear[keyidx]

And get a specialized dynamic dispatch for hash_known_uint that wouldn't need to allocate its return value. As it stands though, whenever you have a dynamic dispatch, you need to allocate the return value in order to fit through jl_apply_generic, which has a jl_value_t * return.
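
(A minimal sketch of that boxed return, echoing NHDaly's observation above that a call-site assert doesn't remove the allocation:)

using BenchmarkTools
hash_any(r) = hash(r[], UInt(0))::UInt   # r[]::Any forces a dynamic dispatch to hash
@btime hash_any($(Ref{Any}(1)))          # the boxed UInt still shows up as an allocation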

Keno commented 2 years ago

That said, it could definitely be implemented, but it's a bit of work.

jakobnissen commented 2 years ago

Would something like #40880 help with this? That is, would it be possible for the compiler to do that kind of inference here?

I may be talking out of my ass here.

JeffBezanson commented 2 years ago

Yes, this is a case where reverse dataflow could be helpful. It's a bit hard to use the information, though: we can't literally insert a typeassert, because if f doesn't return a UInt we need to raise a MethodError for g instead of a TypeError.
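
Reusing the MyType example from above (whose hash method wrongly returns a UInt128), the distinction is:

hash([MyType(2), MyType(3)])      # must throw a MethodError: no hash(::MyType, ::UInt128)
hash(MyType(2), UInt(0))::UInt    # a literal typeassert would throw a TypeError instead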

Manually adding type declarations in these cases can still be useful though since it will at least speed up subsequent operations. I think it's worth doing for containers likely to be used with abstract element types, e.g. Vector and Dict.

vtjnash commented 2 years ago

We do all that already (with inserted typeassert and MethodError), we just don't have the complicated version where it also manages to avoid the allocation by shifting that error around in somewhat complicated ways.

NHDaly commented 2 years ago

An update here:

The hash allocs saga continues. (We still continue to see UInt allocations from hash showing up at RAI a lot.)

I had a clever (I think?) idea yesterday to avoid the allocation on every element: switch the per-element dynamic dispatch to a function that has no return value, and let that function make a static dispatch to hash.

function my_hash(vec::Vector, h::UInt)
    h += Base.hash_abstractarray_seed
    h = hash(map(first, axes(vec)), h)
    h = hash(map(last, axes(vec)), h)
    # Use a Ref to avoid allocations from the return value of hash():
    ref = Ref{UInt}(h)
    for e in vec
        hash_into!(ref, e)::Nothing
    end
    h = ref[]::UInt
    return h::UInt
end
function hash_into!(ref::Ref{UInt}, v::T) where T  # (force specialization)
    ref[] = my_hash(v, ref[])
    return nothing
end
my_hash(a, h::UInt) = hash(a, h)

It seems this helps a lot! Even in the case with very small data / a single element:

julia> v = Any[i for i in 1:1_000];

julia> @btime hash($v, $(UInt(0)))
  16.552 μs (1001 allocations: 15.64 KiB)
0x0e97df552a8d635e

julia> @btime my_hash($v, $(UInt(0)))
  9.950 μs (1 allocation: 16 bytes)
0x0e97df552a8d635e

julia> v = Any[i for i in 1:1_000_000];

julia> @btime hash($v, $(UInt(0)))
  26.136 ms (330659 allocations: 5.68 MiB)
0xb5d91dc36a23d5be

julia> @btime my_hash($v, $(UInt(0)))
  10.870 ms (1 allocation: 16 bytes)
0x3eddf02e950af97c

julia> @btime hash($(Any[1]), $(UInt(0)))
  28.156 ns (2 allocations: 32 bytes)
0xc7253bdde85cb9f5

julia> @btime my_hash($(Any[1]), $(UInt(0)))
  20.572 ns (1 allocation: 16 bytes)
0xc7253bdde85cb9f5

(For 1,000,000 elements it's faster even without the Fibonacci-skipping optimization that Base.hash applies to very large arrays, which is why the return value differs in that case.)

So:

  1. Do you think we should just switch over the implementation of hash() for container types to use this approach? (Maybe only when their eltype is an abstract type.)
  2. I'm feeling greedy: Is there any way we can eliminate the allocation of the Ref? I know for a fact that it won't escape, so it could be stack allocated. But it seems that escape analysis isn't able to determine that through the dynamic dispatch? 😢
    • @aviatesk is there any way to hint/force this through your fancy new EA merge?
    • Any other tricks anyone can think of to eliminate the Ref alloc?

Thanks! :)

NHDaly commented 2 years ago

And assuming that we cannot get rid of the Ref alloc, there is still a bit more improvement that can be made here for nested containers by adding hash_into!() methods for container types that pass through the Ref rather than allocating another one at each level:

julia> v = Any[ Any[i] for i in 1:1_000 ];

julia> @btime hash($v, $(UInt(0)))
  37.920 μs (2001 allocations: 31.27 KiB)
0x5bb22512b3f4458e

julia> @btime my_hash($v, $(UInt(0)))
  30.817 μs (1001 allocations: 15.64 KiB)
0x5bb22512b3f4458e
julia> # Add a custom method to pass through the ref instead of allocating a new one
       function hash_into!(ref::Ref{UInt}, vec::Vector{T}) where T  # (force specialization)
           ref[] += Base.hash_abstractarray_seed
           ref[] = hash(map(first, axes(vec)), ref[])
           ref[] = hash(map(last, axes(vec)), ref[])
           for e in vec
               hash_into!(ref, e)::Nothing
           end
           return nothing
       end
hash_into!

julia> @btime my_hash($v, $(UInt(0)))
  23.888 μs (1 allocation: 16 bytes)
0x5bb22512b3f4458e

(And we can then of course define my_hash() in terms of hash_into!(), so we don't have to repeat ourselves):

function my_hash(vec::Vector, h::UInt)
    # Use a Ref to avoid allocations from the return value of hash():
    ref = Ref{UInt}(h)
    hash_into!(ref, vec)
    return ref[]::UInt
end

This approach would basically mean one of these hash_into!() implementations for all containers of abstract types. But we'd probably want that anyway, even in the previous post, where we are able to eliminate the Ref alloc.
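
For example, a hypothetical pass-through method for Pair, mirroring the structure of Base's hash(p::Pair, h::UInt) = hash(p.second, hash(p.first, h)):

function hash_into!(ref::Ref{UInt}, p::Pair)
    hash_into!(ref, p.first)
    hash_into!(ref, p.second)
    return nothing
end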

NHDaly commented 1 year ago

See discussion here, from JuliaCon hackathon: https://github.com/JuliaLang/julia/pull/50136#issuecomment-1656811849

We also talked about the general problem in https://github.com/JuliaLang/julia/issues/44244.

  • Using something like hash_into!(::Ref, v) seems like a decent workaround.
  • But it'd be nicer to fix this in the language, via something like the above proposal.
  • We'd really also need to fix the return type allocations... For that we have a few possibilities:
    1. function-level return type declarations (function hash::UInt end)
    2. preallocating a return value slot on the stack, via call-site return type assertions
    3. ... there was a variant on 2 that I don't quite remember.