JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Use `Core.Typeof` in `Pair` constructor #39873

Open mjram0s opened 3 years ago

mjram0s commented 3 years ago

It would be nice to support dispatch on Pairs defined in the form a=>b, for example:

julia> struct MyType end

julia> MyType isa Type{MyType}
true

julia> foo((a,b)::Pair{Type{MyType}, String}) = "success"
foo (generic function with 1 method)

julia> foo(MyType=>"test")  # won't work because typeof(MyType=>"test") is Pair{DataType,String}
ERROR: MethodError: no method matching foo(::Pair{DataType,String})
Closest candidates are:
  foo(::Pair{Type{MyType},String}) at REPL[3]:1
Stacktrace:
 [1] top-level scope at REPL[6]:1

This could potentially be solved by using Core.Typeof in the Pair constructor.

julia> Core.Typeof(MyType)
Type{MyType}
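
As a possible workaround in the meantime, the two-parameter `Pair{A,B}(a, b)` constructor can be called explicitly so that the first parameter is `Type{MyType}` rather than `DataType`. The sketch below is only an illustration: `typed_pair` is a hypothetical helper name, not something Base provides.

```julia
struct MyType end

# Hypothetical helper (not part of Base): build a Pair whose type parameters
# come from Core.Typeof, so a type `T` is stored as `Type{T}` instead of `DataType`.
typed_pair(a, b) = Pair{Core.Typeof(a), Core.Typeof(b)}(a, b)

# Same method as in the example above.
foo((a, b)::Pair{Type{MyType}, String}) = "success"

p = typed_pair(MyType, "test")

typeof(p)   # Pair{Type{MyType}, String}
foo(p)      # dispatches to the method above and returns "success"
```

The plain `MyType => "test"` form still produces a `Pair{DataType, String}`, so this only helps where the call site can use the helper or spell out the type parameters.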

timholy commented 3 years ago

Welcome, and great first issue!

This is a really interesting edge case: since MyType has, in a sense, two types (DataType and Type{MyType}), which should we pick? On balance I lean away from greater specificity here. In many cases where you are, e.g., building a dictionary from types, you actually want to veer sharply in the direction of type-broadening: it's typically far better to use something like an IdDict{Any,String}, whose methods are @nospecialize'd on all of their arguments. This is because Type is essentially unbounded in how many specializations it would require, and generating them all is a huge source of compilation latency.

As an example:

tim@diva:/tmp$ juliam --startup-file=no -q
julia> d = Dict{Any,String}()
Dict{Any, String}()

julia> @time for T in subtypes(Any)
           d[Core.Typeof(T)] = string(T)
       end
  8.550475 seconds (31.43 M allocations: 1.693 GiB, 3.33% gc time, 98.73% compilation time)

vs with plain typeof:

tim@diva:/tmp$ juliam --startup-file=no -q
julia> d = Dict{Any,String}()
Dict{Any, String}()

julia> @time for T in subtypes(Any)
           d[typeof(T)] = string(T)
       end
  0.275490 seconds (457.37 k allocations: 24.886 MiB, 17.62% gc time, 90.59% compilation time)

It's even better with an IdDict, even when you use Core.Typeof:

tim@diva:/tmp$ juliam --startup-file=no -q
julia> d = IdDict{Any,String}()
IdDict{Any, String}()

julia> @time for T in subtypes(Any)
           d[Core.Typeof(T)] = string(T)
       end
  0.134169 seconds (209.96 k allocations: 10.859 MiB, 80.31% compilation time)

In all cases, it's the number of method specializations that drives the difference. Since DataType is just one type, the typeof(T) solution gets you most of the benefit of the IdDict, but you'd really want the IdDict if you were using T itself as the key:

tim@diva:/tmp$ juliam --startup-file=no -q
julia> d = Dict{Any,String}()
Dict{Any, String}()

julia> @time for T in subtypes(Any)
           d[T] = string(T)
       end
  8.072373 seconds (31.42 M allocations: 1.692 GiB, 3.62% gc time, 99.22% compilation time)

tim@diva:/tmp$ juliam --startup-file=no -q
julia> d = IdDict{Any,String}()
IdDict{Any, String}()

julia> @time for T in subtypes(Any)
           d[T] = string(T)
       end
  0.134071 seconds (209.84 k allocations: 10.852 MiB, 81.47% compilation time)
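
To make the "number of specializations" point concrete, here is a small sketch that counts how many distinct key types each approach produces for a handful of types. The list `ts` and the type `MyType` are arbitrary choices for illustration, not taken from the timings above.

```julia
struct MyType end

# A small, arbitrary sample of types; `Vector` is a UnionAll, the rest are DataTypes.
ts = Any[Int, String, Float64, Vector, MyType]

# `typeof` collapses every entry to DataType or UnionAll ...
plain = unique(map(typeof, ts))

# ... while `Core.Typeof` yields a separate `Type{T}` for every entry.
precise = unique(map(Core.Typeof, ts))

length(plain)    # 2
length(precise)  # 5
```

Each distinct key type is a potential new specialization of the dictionary methods that receive it, which is consistent with the `Core.Typeof` timing growing with the number of types while the `typeof` timing does not.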

mjram0s commented 3 years ago

Thank you for the insight. I understand why the current implementation is optimal, and there are workarounds for my issue. This would just be a nice-to-have if possible 😄

nsajko commented 2 months ago

xref #29368