I understand that this package doesn't speed up dynamic broadcasts, but sometimes we code before we think and I'm guessing you still want the fallback to work, albeit slowly, rather than throw confusing errors? If so, check out the following MWE. Note that the error only happens on the first call to mul_diagonal!. From the second call it works as expected---slower than a simple @., but correct.
$ julia --threads=2
julia> using FastBroadcast
julia> function mul_diagonal!(C, A, d)
dt = transpose(d) # could also use permutedims, same problem
@.. thread=true C = A * dt
end
julia> N = 4; A = randn(N, N); C = similar(A); d = rand(N);
julia> mul_diagonal!(C, A, d);
ERROR: MethodError: no method matching broadcastgetindex(::SubArray{Float64, 2, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}}, true}, ::UInt64, ::UInt64)
Closest candidates are:
broadcastgetindex(::Any, ::Int64...) where N at ~/.julia/packages/FastBroadcast/lHiEr/src/FastBroadcast.jl:85
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/FastBroadcast/lHiEr/src/FastBroadcast.jl:344 [inlined]
[2] macro expansion
@ ./cartesian.jl:64 [inlined]
[3] macro expansion
@ ~/.julia/packages/FastBroadcast/lHiEr/src/FastBroadcast.jl:343 [inlined]
[4] fast_materialize!
@ ~/.julia/packages/FastBroadcast/lHiEr/src/FastBroadcast.jl:304 [inlined]
[5] fast_materialize!
@ ~/.julia/packages/FastBroadcast/lHiEr/src/FastBroadcast.jl:29 [inlined]
[6] (::FastBroadcast.var"#3#4")(::Tuple{StrideArraysCore.PtrArray{Tuple{Int64, Int64}, (true, true), Float64, 2, 1, 0, (1, 2), Tuple{Static.StaticInt{8}, Int64}, Tuple{Static.StaticInt{1}, Static.StaticInt{1}}}, Base.OneTo{Int64}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(*), Tuple{Matrix{Float64}, LinearAlgebra.Transpose{Float64, Vector{Float64}}}}, Val{2}, Static.True}, start::Int64, stop::Int64)
@ FastBroadcast ~/.julia/packages/FastBroadcast/lHiEr/src/FastBroadcast.jl:59
[7] macro expansion
@ ~/.julia/packages/Polyester/4Q0NM/src/batch.jl:110 [inlined]
[8] #_batch_no_reserve#5
@ ~/.julia/packages/Polyester/4Q0NM/src/batch.jl:71 [inlined]
[9] #batch#7
@ ~/.julia/packages/Polyester/4Q0NM/src/batch.jl:251 [inlined]
[10] batch
@ ~/.julia/packages/Polyester/4Q0NM/src/batch.jl:235 [inlined]
[11] fast_materialize_threaded!(dst::Matrix{Float64}, #unused#::Static.True, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(*), Tuple{Matrix{Float64}, LinearAlgebra.Transpose{Float64, Vector{Float64}}}}, dstaxes::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}})
@ FastBroadcast ~/.julia/packages/FastBroadcast/lHiEr/src/FastBroadcast.jl:55
[12] fast_materialize!
@ ~/.julia/packages/FastBroadcast/lHiEr/src/FastBroadcast.jl:48 [inlined]
[13] mul_diagonal!(C::Matrix{Float64}, A::Matrix{Float64}, d::Vector{Float64})
@ Main ./REPL[5]:3
[14] top-level scope
@ REPL[10]:1
julia> using LinearAlgebra
julia> C ≈ A * Diagonal(d)
false
julia> mul_diagonal!(C, A, d);
julia> C ≈ A * Diagonal(d)
true
I understand that this package doesn't speed up dynamic broadcasts, but sometimes we code before we think and I'm guessing you still want the fallback to work, albeit slowly, rather than throw confusing errors? If so, check out the following MWE. Note that the error only happens on the first call to
mul_diagonal!
. From the second call it works as expected---slower than a simple@.
, but correct.