Closed jmichel7 closed 6 years ago
I'm bisecting
commit b5e2c2ff2f563140fa707674ec0789e03951b170
Author: Klaus Crusius <KlausC@users.noreply.github.com>
Date: Wed Sep 12 23:35:11 2018 +0200
Allow `Iterators.flatten` for empty tuple (#29112)
(cherry picked from commit b4c370dd05ac33d0c8c02b440b861e8582d1fc74)
:040000 040000 b0f6ee2c6abb7b729a78c14b39dc9f823289976c 817f36cd24bdc1696ce76067b707506378748205 M base
:040000 040000 07f50c8dce7d7a9b2fb21608bd1a0c94ac991c20 adf4a997f804fbcdab58c7df9356eb74ddde2a6d M test
cc @KlausC #29112
@code_typed
is identical for the function on the commit with the regression and the one before...
Oops! I closed the issue by a misclick. I wanted to say: I do not know if the issue has been identified, but if this may help here is a simplified snippet which still shows exactly the same problem:
function foo(elts::Vararg{Vector{Int}})
ee=collect(elts)
for i in 1:4
if all(w->w[i]>w[i+1],ee)
error("should not happen")
end
end
end
const a=[2,1,3,4,5]
const b=[1,2,4,3,5]
test()=foo(a,b)
Here is a way to generate arbitrary slowdown:
N = 10000
a = collect(1:N)
b = collect(2:N+1)
function foo(elts::Vararg{Vector{Int}})
ee=collect(elts)
for i in 1:length(ee[1])-1
f = w -> w[i] > w[i+1]
all(f,ee)
end
end
using BenchmarkTools
@btime foo($a,$b)
0.7:
julia> @btime foo($a,$b)
12.316 μs (2 allocations: 128 bytes)
1.0.1
julia> @btime foo($a,$b)
3.672 ms (38979 allocations: 765.39 KiB)
Profiling shows that we spend a lot in the runtime.
julia> Profile.print(noisefloor=5, C=true)
3255 /Users/kristoffer/julia/src/task.c:271; start_task
3255 /Users/kristoffer/julia/src/./julia.h:1559; jl_apply
3255 ./task.jl:259; (::getfield(REPL, Symbol("##26#27")){REPL.REPLBackend})()
3255 .../julia/stdlib/v1.1/REPL/src/REPL.jl:117; macro expansion
3255 .../julia/stdlib/v1.1/REPL/src/REPL.jl:85; eval_user_input(::Any, ::REPL.REPLBackend)
3255 ./boot.jl:319; eval(::Module, ::Any)
3255 ...rs/kristoffer/julia/src/builtins.c:622; jl_toplevel_eval_in
3255 ...rs/kristoffer/julia/src/toplevel.c:812; jl_toplevel_eval_flex
3255 /Users/kristoffer/julia/src/gf.c:1843; jl_fptr_trampoline
3255 ...tdlib/v1.1/Profile/src/Profile.jl:25; top-level scope
3255 ./REPL[9]:1; macro expansion
291 ./REPL[4]:3; foo(::Array{Int64,1}, ::Vararg{Array{Int64,1},N} ...
204 .../kristoffer/julia/src/builtins.c:731; jl_f_getfield
79 .../kristoffer/julia/src/datatype.c:543; jl_new_bits
74 .../kristoffer/julia/src/datatype.c:681; jl_box_int64
63 ...er/julia/src/./julia_internal.h:263; jl_gc_alloc_
1388 ./REPL[4]:4; foo(::Array{Int64,1}, ::Vararg{Array{Int64,1},N} ...
778 ...s/kristoffer/julia/src/jltypes.c:879; jl_apply_type
755 ...s/kristoffer/julia/src/jltypes.c:925; jl_instantiate_unionall
535 .../kristoffer/julia/src/jltypes.c:1499; inst_type_w_
535 .../kristoffer/julia/src/jltypes.c:1307; inst_datatype
493 ...kristoffer/julia/src/jltypes.c:1301; inst_datatype_env
437 ...kristoffer/julia/src/jltypes.c:1297; inst_datatype_env
293 ...ristoffer/julia/src/jltypes.c:1092; inst_datatype_inner
221 ...ristoffer/julia/src/jltypes.c:716; lookup_type
118 ...ristoffer/julia/src/jltypes.c:678; lookup_type_idx
1553 ./REPL[4]:5; foo(::Array{Int64,1}, ::Vararg{Array{Int64,1},N} ...
828 /Users/kristoffer/julia/src/gf.c:2194; jl_apply_generic
643 /Users/kristoffer/julia/src/gf.c:2148; jl_lookup_generic_
628 ...er/julia/src/./julia_internal.h:876; jl_typemap_assoc_exact
431 .../kristoffer/julia/src/typemap.c:798; jl_typemap_entry_assoc_exact
385 ...kristoffer/julia/src/subtype.c:1229; jl_tuple_isa
261 ...kristoffer/julia/src/subtype.c:1283; jl_isa
261 ...ristoffer/julia/src/subtype.c:1198; jl_subtype
261 ...ristoffer/julia/src/subtype.c:1180; jl_subtype_env
115 ...ristoffer/julia/src/subtype.c:1120; forall_exists_subtype
111 ...istoffer/julia/src/subtype.c:1092; exists_subtype
280 ...er/julia/usr/lib/julia/sys.dylib:?; jfptr_iterate_7082
156 /Users/kristoffer/julia/src/gc.c:965; jl_gc_pool_alloc
328 ./reducedim.jl:658; all(::Function, ::Array{Array{Int64,1},1})
285 ./reducedim.jl:658; #all#550(::Function, ::Function, ::Function, ::A...
94 /Users/kristoffer/julia/src/gf.c:2194; jl_apply_generic
Ok, it is a code specialization heuristic. Changing the function declaration from
foo(elts::Vararg{Vector{Int}})
to
foo(elts::Vararg{Vector{Int}, N}) where {N}
fixes it:
julia> @btime foo($a,$b)
11.868 μs (1 allocation: 96 bytes)
Is this more or less what is described here: https://docs.julialang.org/en/v1/devdocs/functions/#Compiler-efficiency-issues-1?
I guess, don't know how it is related to #29112 though. Or if I just worked around the problem with the extra specialization but the original issue is not fixed.
The collect
call there cannot be inferred in the non-specialized case. (Check code_warntype(foo, Tuple{Vararg{Vector{Int}}})
). That's a consequence of Base.IteratorEltype
not be inferable in that case:
julia> code_warntype(Base.IteratorEltype, Tuple{Tuple{Vararg{Vector{Int}}}})
Body::Union{EltypeUnknown, HasEltype}
123 1 ─ %1 = (Base.typeof)(x)::Type{#s55} where #s55<:Tuple{Vararg{Array{Int64,1},N} where N} │
│ %2 = (isa)(%1, Type{Union{}})::Bool │
└── goto #3 if not %2 │
2 ─ goto #4 │
3 ─ %5 = (Base.IteratorEltype)(%1)::Union{EltypeUnknown, HasEltype} │
└── goto #4 │
4 ┄ %7 = φ (#2 => $(QuoteNode(Base.EltypeUnknown())), #3 => %5)::Union{EltypeUnknown, HasEltype} │
└── return %7
Removing this method, added in #29112: https://github.com/JuliaLang/julia/blob/1968b239fa6f5ab266e62e2240d828f37622ee83/base/generator.jl#L129 we get:
julia> code_warntype(Base.IteratorEltype, Tuple{Tuple{Vararg{Vector{Int}}}})
Body::Base.HasEltype
123 1 ─ return $(QuoteNode(Base.HasEltype()))
and performance of the original example is back at 0.7 levels. Not sure what to do here.
@JeffBezanson, any ideas how to do better here?
One of my programs which took 1mn on 1.0.0 takes 5mn on 1.0.1 I tracked at least part of the problem to the following code snippet:
On 1.0.0
On 1.0.1:
Does anyone know what’s happening here?