JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Complicated lazy broadcasting slower than equivalent single broadcast #56629

Open mcabbott opened 5 days ago

mcabbott commented 5 days ago

This example from Discourse shows a slowdown when broadcasting a moderately complicated expression, instead of broadcasting a function containing the same expression:

using BenchmarkTools  # provides @btime

arrayfun!(C, A, B) = @. C = A^2 + B^2 + A * B + A / B - A * B - A / B + A * B + A / B - A * B - A / B
scalarfun(A::Real, B::Real) = A^2 + B^2 + A * B + A / B - A * B - A / B + A * B + A / B - A * B - A / B

let N = 151
    A, B, C1, C2 = (rand(N,N,N).+1 for _ in 1:4)
    @btime arrayfun!($C1, $A, $B)
    @btime $C2 .= scalarfun.($A, $B)
    C1 ≈ C2
end
#  17.306 ms (11 allocations: 352 bytes)
#   5.900 ms (0 allocations: 0 bytes)

The effect seems fairly robust: it's not particular to 3D arrays, nor to A^2. Replacing @. with explicit .+ etc. removes the allocations but not the slowdown (according to #29120, writing the dots out avoids n-ary +, here n<=4):

arrayfun!(C, A, B) = C .= A.^2 .+ B.^2 .+ A .* B .+ A ./ B .- A .* B .- A ./ B .+ A .* B .+ A ./ B .- A .* B .- A ./ B
#  17.345 ms (0 allocations: 0 bytes)
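
For reference, the parsing difference behind the n-ary remark can be checked directly on the expressions. This is only a small sketch of how the parser treats the two spellings (the variable names are placeholders); it says nothing about where the time goes:

```julia
# Undotted + chains parse as one n-ary call: +(A, B, C, D)
ex_nary = :(A + B + C + D)

# Dotted .+ chains parse as nested binary calls: ((A .+ B) .+ C) .+ D
ex_dots = :(A .+ B .+ C .+ D)

# args[1] is the operator, the rest are its arguments
println(length(ex_nary.args) - 1, " arguments to +")
println(length(ex_dots.args) - 1, " arguments to the outer .+")
println("first argument of outer .+ is itself a call: ", ex_dots.args[2] isa Expr)
```

So @. applied to the undotted form produces a broadcast of + over up to four arguments at once, while the hand-dotted form only ever has binary .+ nodes.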

Simpler expressions also have the slowdown but no allocation:

arrayfun!(C, A, B) = @. C = A^2 + B^2 + A * B + A / B
scalarfun(A::Real, B::Real) = A^2 + B^2 + A * B + A / B
#  3.148 ms (0 allocations: 0 bytes)
#  971.000 μs (0 allocations: 0 bytes)

Even simpler expressions like arrayfun!(C, A, B) = @. C = A^2 + B^2 show no slowdown at all.

roflmaostc commented 4 days ago

This is quite a severe penalty, isn't it?

A lot of my code uses broadcast expressions with more than 5 dots. Replacing all of them with function calls is very impractical.

mcabbott commented 4 days ago

A macro could in principle produce the scalarfun form for you. Building this into @. would be a bit scary (seems likely to expose all kinds of special assumptions in code). But the question before contemplating that is: Why can't the complicated form compile down to the same code?
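
To make the macro idea concrete, here is a rough sketch of what such a rewrite might look like. The names @scalarized and collect_args! are made up for illustration and are not part of Base or any package. It assumes the right-hand side consists only of plain function calls and literals, and that every symbol appearing in an argument position should become a broadcast argument, which is exactly the kind of special assumption that would be risky inside @. itself:

```julia
# Hypothetical helper: gather the symbols that appear as call arguments,
# skipping the function name in args[1] of each call.
function collect_args!(syms, ex)
    if ex isa Symbol
        ex in syms || push!(syms, ex)
    elseif ex isa Expr && ex.head === :call
        foreach(a -> collect_args!(syms, a), ex.args[2:end])
    elseif ex isa Expr
        error("unsupported expression: $ex")
    end
    return syms  # literals such as 2 are ignored
end

# Hypothetical macro: rewrite `dest = expr` into
# `dest .= ((vars...) -> expr).(vars...)`, i.e. a single broadcast
# of one scalar function, like the scalarfun form above.
macro scalarized(ex)
    ex isa Expr && ex.head === :(=) || error("expected `dest = expr`")
    dest, rhs = ex.args
    vars = collect_args!(Symbol[], rhs)
    f = Expr(:->, Expr(:tuple, vars...), rhs)       # (A, B) -> rhs
    call = Expr(:., f, Expr(:tuple, vars...))       # f.(A, B)
    return esc(Expr(:.=, dest, call))               # dest .= f.(A, B)
end

A, B = rand(4, 4) .+ 1, rand(4, 4) .+ 1
C = similar(A)
@scalarized C = A^2 + B^2 + A * B + A / B
expected = @. A^2 + B^2 + A * B + A / B
println(C ≈ expected)
```

Whether this recovers the scalarfun timings would need to be benchmarked; the sketch only shows that the rewrite is mechanical for simple expressions.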