Issue opened by jishnub.
I think we already have an issue to track this problem, but I can't find it.
Anyway, the first dimension (`dim1`) in this example is too small for vectorization, and our broadcast only tries to vectorize the innermost loop.
`map!` is faster here only because the inputs support linear indexing. If they were Cartesian-indexed, I would guess `map!` is slower, since IIRC it is `zip`-based.
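To make the loop-shape point concrete, here is a hand-written sketch of the two loop structures for `C .= A .+ B` on plain `Matrix` inputs. The function names are hypothetical, and this is a simplified model, not Base's broadcast code. With a 2×n matrix, the Cartesian version re-enters a 2-iteration inner loop n times, leaving almost nothing for `@simd` to vectorize, while the linear version is one flat loop over all 2n elements.

```julia
# Cartesian-style loop nest: only the inner loop (over dimension 1) is a
# vectorization candidate, and for a wide 2×n matrix it has just 2 iterations.
function add_cartesian!(C, A, B)
    for j in axes(C, 2)
        @simd for i in axes(C, 1)
            @inbounds C[i, j] = A[i, j] + B[i, j]
        end
    end
    return C
end

# Linear-style loop: a single flat loop over every element. `eachindex` with
# several arrays also checks that their index ranges agree.
function add_linear!(C, A, B)
    @simd for k in eachindex(C, A, B)
        @inbounds C[k] = A[k] + B[k]
    end
    return C
end
```

Both give the same result; only the shape of the loop the compiler sees differs.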
Such wide matrices are frequently encountered in BandedMatrices. It would be good if broadcasting had a fast path that used linear indexing whenever all arguments are compatible with it.
There is a related attempt in Base (#30973) (Edit: though it targets a different purpose), but I can't say whether it will land in the near future.
For package development, I think FastBroadcast.jl might be the best solution for now (although it might not be that lightweight...).
> I think we already have an issue to track this problem, but I can't find it.

Do you mean #28126?
No, the number of broadcast arguments in the MWE above is too small, so it won't hit https://github.com/JuliaLang/julia/issues/28126. The solution here is to switch to linear indexing when we can prove that:
1. all arguments have the same `axes` (or are 0-dimensional), and
2. all arguments are `IndexLinear`.
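A minimal sketch of that fast path, under stated assumptions: `linear_ok`, `linear_broadcast!`, `getarg` and `iszerodim` are hypothetical names (not Base API), only numbers and `AbstractArray`s are accepted as arguments, and anything that fails the two checks falls back to ordinary `.=` broadcasting. It is not Base's implementation, just the two conditions above turned into a runtime branch.

```julia
# Hypothetical helpers, not part of Base.

# Zero-dimensional arguments (numbers, 0-d arrays) supply one value everywhere.
iszerodim(a) = ndims(a) == 0

# Condition 1: every argument has the destination's axes or is zero-dimensional.
# Condition 2: the destination and every indexed argument are IndexLinear.
function linear_ok(dest::AbstractArray, args::Union{Number,AbstractArray}...)
    IndexStyle(dest) isa IndexLinear || return false
    for a in args
        iszerodim(a) && continue
        (axes(a) == axes(dest) && IndexStyle(a) isa IndexLinear) || return false
    end
    return true
end

# Element accessor: zero-dimensional arguments ignore the linear index.
Base.@propagate_inbounds getarg(a, i) = iszerodim(a) ? a[] : a[i]

function linear_broadcast!(f, dest::AbstractArray, args::Union{Number,AbstractArray}...)
    if linear_ok(dest, args...)
        # One flat loop over all elements, so vectorization is not limited
        # to the (possibly tiny) first dimension.
        @inbounds @simd for i in eachindex(IndexLinear(), dest)
            dest[i] = f(map(a -> getarg(a, i), args)...)
        end
    else
        # Fall back to ordinary (Cartesian) broadcasting.
        dest .= f.(args...)
    end
    return dest
end
```

For example, `linear_broadcast!(+, C, A, B)` on two 2×n `Matrix` inputs takes the flat loop, while `linear_broadcast!(+, C, v, A)` with a length-2 vector `v` fails condition 1 and uses the normal broadcast path.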
See https://discourse.julialang.org/t/why-is-a-multi-argument-inplace-map-much-faster-in-this-case-than-a-broadcast/91525/6; the following seems broadly reproducible across a range of platforms:
This difference goes away for nearly square matrices and is minimal for tall matrices. On some platforms, broadcasting performs better for the tall and square cases. However, `map!` consistently seems to do better for the wide case.
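The original benchmark is not reproduced here. As a hypothetical, Base-only way to compare the two on wide, nearly square and tall shapes with roughly the same number of elements, something like the following could be used; `time_min` and `compare` are illustrative names, and BenchmarkTools.jl's `@btime` would give more reliable numbers than a minimum of `@elapsed` runs.

```julia
# Hypothetical timing harness using only Base.
function time_min(f, reps)
    f()                                   # warm-up run to exclude compilation
    return minimum(@elapsed(f()) for _ in 1:reps)
end

function compare(m, n; reps = 100)
    A, B = rand(m, n), rand(m, n)
    C = similar(A)
    t_map = time_min(() -> map!(+, C, A, B), reps)
    t_bc  = time_min(() -> (C .= A .+ B), reps)
    return (; shape = (m, n), map = t_map, broadcast = t_bc)
end

# Wide, nearly square, and tall matrices with ~100_000 elements each.
for (m, n) in ((2, 50_000), (316, 316), (50_000, 2))
    println(compare(m, n))
end
```

Absolute timings depend on the machine and Julia version, so only the relative `map` vs `broadcast` numbers per shape are meaningful.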