YingboMa / FastBroadcast.jl

MIT License
76 stars 6 forks source link

Slow performance on custom array types #67

Closed apkille closed 3 months ago

apkille commented 3 months ago

Fast broadcasting on custom array types gives slower performance than pure Julia arrays. This issue is also seen when solving different equations on custom types.

MWE posted in slack by @oscardssmith (using the test example https://github.com/SciML/OrdinaryDiffEq.jl/blob/4a110ca8761845b3aa667a109583775c4c718132/test/interface/noindex_tests.jl):

julia> a = NoIndexArray(ones(10, 10));

julia> b = ones(10,10);

julia> using FastBroadcast

julia> @btime @.. a += 1;
  94.690 ns (1 allocation: 32 bytes)

julia> @btime @.. b += 1;
  31.152 ns (1 allocation: 32 bytes)
chriselrod commented 3 months ago

That is because it defines NoIndexStyle.

julia> @btime @.. $a += 1;
  50.953 ns (0 allocations: 0 bytes)

julia> @btime @.. $b += 1;
  11.011 ns (0 allocations: 0 bytes)

julia> @btime @. $a += 1;
  50.962 ns (0 allocations: 0 bytes)

julia> @btime @. $b += 1;
  20.937 ns (0 allocations: 0 bytes)

julia> copyto!(a, Base.Broadcast.Broadcasted(+, (a, 1)));

julia> @btime copyto!($a, Base.Broadcast.Broadcasted(+, ($a, 1)));
  54.229 ns (0 allocations: 0 bytes)

If you don't want a slow broadcast, don't define a custom slow broadcast method + style. https://github.com/SciML/OrdinaryDiffEq.jl/blob/4a110ca8761845b3aa667a109583775c4c718132/test/interface/noindex_tests.jl#L15-L31

chriselrod commented 3 months ago

To be clear, FastBroadcast only applies to DefaultArrayStyle.

DefaultArrayStyle is in fact the default style for arrays. If someone wants to opt out of using FastBroadcast, they can define their own style, e.g. NoIndexStyle -- which NoIndexArray did.

Normally, that is done for a good reason. For example, for StaticArrays, GPUArrays, or SparseArrays.

The only promise FastBroadcast makes to those explicitly opting out is that they get what they implemented when opting out. In this case, that is a slow copyto! method.

apkille commented 3 months ago

Thanks, this was very helpful. This may be a naive question, but could you please explain why the copyto! method for NoIndexArray is slow? This is also how it is done in Julia base for AbstractArray: https://github.com/JuliaLang/julia/blob/116d42be2f258048cb84756c14fc6e3adbb6b66d/base/broadcast.jl#L960-L976.

chriselrod commented 3 months ago
julia> @btime @.. $a += 1;
  50.953 ns (0 allocations: 0 bytes)

julia> Base.@propagate_inbounds Base.Broadcast._broadcast_getindex(x::NoIndexArray, i) = x.x[i]

julia> @btime @.. $a += 1;
  21.879 ns (0 allocations: 0 bytes)
apkille commented 3 months ago

Wow, that is an easy fix 😅. Thanks again!