JuliaSIMD / Polyester.jl


Type-inference failure when using `--check-bounds=no` with Julia 1.10 #132

Open Keluaa opened 9 months ago

Keluaa commented 9 months ago

I am on an NVIDIA Grace CPU, using Polyester.jl v0.7.9 and Julia 1.10.0. When starting Julia with `julia -t 16 --check-bounds=no`:

julia> using Polyester

julia> f(n) = @batch for _ in 1:n end
f (generic function with 1 method)

julia> f(50)

julia> @time f(50)
  0.000696 seconds (146 allocations: 4.562 KiB)

julia> @time f(500)
  0.001785 seconds (596 allocations: 18.625 KiB)

julia> @code_typed f(50)
CodeInfo(
1 ─── %1   = Base.Threads.cglobal(:jl_n_threads_per_pool, Ptr{Int32})::Ptr{Ptr{Int32}}
│     %2   = Base.pointerref(%1, 1, 1)::Ptr{Int32}
...
122 ┄ %437 = φ (#120 => %322, #88 => 0x0000000000000000)::UInt64
│     %438 = Base.add_int(%437, 0x0000000000000001)::UInt64
│     %439 = Base.bitcast(Int64, %438)::Int64
│     %440 = Base.bitcast(Int64, %51)::Int64
│     %441 = invoke Main.:*(%439::Int64, static(1)::Static.StaticInt{1})::Any
│     %442 = (%441 + static(1))::Any
│     %443 = (%442 - static(1))::Any
│     %444 = invoke Main.:*(%440::Int64, static(1)::Static.StaticInt{1})::Any
...

Allocations happen on every loop iteration, apparently because `*(::Int64, ::StaticInt{1})` was not inlined and its result is inferred as `Any`. AllocCheck.jl reports many dynamic calls wherever Static values are used, and the rest of the `@code_typed` output shows similar occurrences. This does not happen on x86.
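For anyone trying to reproduce this, a minimal sketch of how the inference result could be checked directly, without reading the full `@code_typed` output. `is_inferred` is a hypothetical helper written for this comment, not part of Polyester.jl or Static.jl; it uses only `Base.return_types` and `Base.JLOptions`.

```julia
# Hypothetical helper (not from any package): true when every return type
# that inference reports for `f` on `argtypes` is concrete.
is_inferred(f, argtypes) = all(isconcretetype, Base.return_types(f, argtypes))

# Bounds-checking mode of the current session:
# 0 = default, 1 = --check-bounds=yes, 2 = --check-bounds=no
println("check_bounds = ", Base.JLOptions().check_bounds)

# Sanity checks with Base-only functions
println(is_inferred(+, (Int, Int)))                   # concrete Int64 result
println(is_inferred(x -> x > 0 ? 1 : 1.0, (Int,)))    # Union result, not concrete
```

Assuming Static.jl is loaded (`using Static`), the call from the output above could then be checked with `is_inferred(*, (Int64, typeof(static(1))))`, comparing a session started with `--check-bounds=no` against a default one.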

chriselrod commented 9 months ago

> This does not happen on x86.

Even with `--check-bounds=no`? There have been similar reports elsewhere of `--check-bounds=no` causing inference problems, which I could reproduce on x86, e.g. https://github.com/JuliaSIMD/StrideArrays.jl/issues/78

Keluaa commented 9 months ago

Upon trying again, it does also happen on x86 with Julia 1.10. I had mistakenly used Julia 1.9.4, on which the bug does not occur, so it may well be the same bug as the one you mentioned.