YingboMa / FastBroadcast.jl

MIT License

Allocation when `thread=true` #30

Status: Closed (danielwe closed this issue 2 years ago)

danielwe commented 2 years ago

Using `@.. thread=true` produces an allocation. It looks like a variable ends up being boxed, or something along those lines. Writing out the equivalent loop with `@batch` from Polyester avoids the allocation. (In fairness, the overhead is usually not devastating: in the timings below, the allocating version actually came out ahead, due to laptop CPU throttling.)
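For readers unfamiliar with what "boxed" means here, the following is a minimal sketch in plain Julia (Base only). It is not taken from FastBroadcast or Polyester and does not claim to reproduce the cause of this issue. The names `sum_boxed`, `sum_loop`, and `measure` are hypothetical and exist only for illustration. It shows the general pattern: a local variable that is reassigned inside a closure gets stored in a heap-allocated `Core.Box`, whereas the equivalent hand-written loop captures nothing.

```julia
# Hypothetical illustration, unrelated to FastBroadcast internals.
# `total` is reassigned inside the closure, so Julia stores it in a Core.Box.
function sum_boxed(xs)
    total = 0.0
    foreach(x -> (total += x), xs)
    return total
end

# Same computation as a plain loop: no closure, so nothing is captured or boxed.
function sum_loop(xs)
    total = 0.0
    for x in xs
        total += x
    end
    return total
end

# Measure inside a function so that global-scope effects are not counted.
measure(f, xs) = @allocated f(xs)

xs = collect(1.0:32.0)
measure(sum_boxed, xs); measure(sum_loop, xs)   # warm up so compilation is not measured
allocs_boxed = measure(sum_boxed, xs)
allocs_loop = measure(sum_loop, xs)
println("boxed closure: ", allocs_boxed, " bytes; plain loop: ", allocs_loop, " bytes")
```

Running `@code_warntype sum_boxed(xs)` should show `total` typed as `Core.Box`, which is one way to check whether a given function is affected by this pattern.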

MWE:

$ julia --threads=4
julia> using BenchmarkTools, FastBroadcast, Polyester

julia> tanh_fastbroadcast!(x, y) = (@.. thread=true x = tanh(y))
tanh_fastbroadcast! (generic function with 1 method)

julia> function tanh_batch!(x, y)
           @batch for i in eachindex(x, y)
               x[i] = tanh(y[i])
           end
       end
tanh_batch! (generic function with 1 method)

julia> N = 32; x = zeros(N);

julia> @btime tanh_fastbroadcast!($x, y) setup=(y = randn(N));
  334.110 ns (1 allocation: 48 bytes)

julia> @btime tanh_batch!($x, y) setup=(y = randn(N));
  347.230 ns (0 allocations: 0 bytes)

chriselrod commented 2 years ago

https://github.com/JuliaSIMD/StrideArraysCore.jl/commit/d9c13e936ce14d46e2c3f5620691b14c989c572b

julia> using BenchmarkTools, FastBroadcast, Polyester

julia> tanh_fastbroadcast!(x, y) = (@.. thread=true x = tanh(y))
tanh_fastbroadcast! (generic function with 1 method)

julia> function tanh_batch!(x, y)
           @batch for i in eachindex(x, y)
               x[i] = tanh(y[i])
           end
       end
tanh_batch! (generic function with 1 method)

julia> N = 32; x = zeros(N);

julia> @btime tanh_fastbroadcast!($x, y) setup=(y = randn(N));
  415.635 ns (0 allocations: 0 bytes)

julia> @btime tanh_batch!($x, y) setup=(y = randn(N));
  431.638 ns (0 allocations: 0 bytes)