chethega opened this issue 5 years ago
Broadcast supports non-AbstractArrays, so we can't ask all arguments for their `IndexStyle`. We also definitely cannot do this if any dimensions (other than 0) get "extruded". But that said, I think we could conservatively and statically identify more linear-capable broadcasts that are limited to known good cases. It's "as easy" as adding more `eachindex` methods.

At the same time, we should also add more array-like `eachindex` methods so folks can explicitly request `IndexLinear` or `IndexCartesian` if they need.
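For concreteness, explicitly requesting an index style already looks like this for plain arrays in current Julia; the commented-out method for `Broadcasted` is only a sketch of the suggested extension, not existing API.

```julia
A = rand(3, 4)
eachindex(IndexLinear(), A)      # Base.OneTo(12): linear indices on request
eachindex(IndexCartesian(), A)   # CartesianIndices((3, 4)): cartesian indices on request

# Hypothetical extension, only defined for "known good" broadcasts:
# Base.eachindex(::IndexLinear, bc::Broadcast.Broadcasted) = Base.OneTo(length(bc))
```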
We could specialize on `<:AbstractArray`, `<:Ref`, `<:Number`, etc. A 20x speedup for `<:AbstractArray` without nonzero extruded dimensions is imho good enough to justify some code duplication.
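A rough sketch of what that per-type specialization could look like; `linear_ok` is an invented helper name, and a real implementation would also need the run-time extrusion check discussed further down.

```julia
# Invented helper, not part of Base.Broadcast: per-argument-type check for the
# "known good" cases mentioned above.
linear_ok(x::AbstractArray) = IndexStyle(x) isa IndexLinear
linear_ok(x::Ref)           = true    # zero-dimensional, never extruded
linear_ok(x::Number)        = true
linear_ok(x)                = false   # conservative default for unknown argument types
linear_ok(bc::Broadcast.Broadcasted) = all(linear_ok, bc.args)  # recurse into nested broadcasts
```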
I am of half a mind to propose `IndexStyle(::Any) = IndexSequential()` (or maybe `IndexUnknown()`), indicating that random access is bad or unsupported and that the iteration protocol should be used. Then we could always query the index style of an object and optimize a lot of unrelated code: just like iterators indicate whether they have a known length and eltype, they would signal whether they support fast random access or need to carry more state during iteration.
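A minimal sketch of that proposal, using a lowercase `indexstyle` stand-in so as not to pirate `Base.IndexStyle`; `IndexSequential` does not exist in Base.

```julia
struct IndexSequential end   # "random access is slow or unsupported; please iterate"

indexstyle(::Any) = IndexSequential()              # conservative default
indexstyle(A::AbstractArray) = IndexStyle(A)       # reuse IndexLinear/IndexCartesian
indexstyle(::Base.Generator) = IndexSequential()   # generators want the iteration protocol

# Consumers could then pick an access pattern from the trait:
mysum(x) = mysum(indexstyle(x), x)
mysum(::IndexLinear, x) = sum(x[i] for i in eachindex(x))  # cheap random access
mysum(::Any, x) = sum(x)                                   # fall back to iteration
```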
It's "as easy" as adding more eachindex methods.
Ok, maybe I'm just blind, but how?
Oh, never mind, I'm wrong. This can't be done type-stably because the decision to extrude is made not in the type domain but on runtime values (sizes). Thus we cannot have a type-stable `IndexStyle` for broadcasts that returns `IndexLinear` for "safe" n-dimensional broadcasts. It'd have to be a runtime switch in a particular `copyto!` implementation... and we already generate a number of `for` loops there for different optimizations. In fact, the addition of an extra `for` loop was precisely what stalled JuliaLang/julia#30973 — it significantly regressed compile times.
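A tiny example of that run-time dependence: both broadcasts below see identical argument types, yet only one of them needs extrusion.

```julia
A  = rand(4, 4)
B1 = rand(4, 4)   # same type as B2: Matrix{Float64}
B2 = rand(4, 1)

A .+ B1   # sizes match, element-for-element; linear indexing would be fine
A .+ B2   # B2's singleton column is "extruded", i.e. reused for every column of A

# The distinction is only visible from run-time sizes, so IndexStyle(typeof(bc))
# can't return IndexLinear just for the safe case without being type-unstable.
```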
Just to be clear (because obviously I forgot what restrictions we had here myself): computing `bc[I]` for `A .+ B` where `A` and `B` are both `Matrix`es isn't safe to do linearly because it might need to swap out one or both of the indices in one or both of the arrays — and that computation requires cartesian indices.

There is still a case where this would be type-stable: it's where there is only one non-zero-dimensional argument, that argument is `IndexLinear`, and all other arguments are zero-dimensional. In other words, it would apply to none of your original examples (since the RHS could be extruded into the LHS).
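That special case can be written type-stably by hand; a minimal sketch (the name `addscalar!` is just illustrative):

```julia
# One IndexLinear array plus zero-dimensional arguments only: nothing can be
# extruded, so destination and source share the same linear index.
function addscalar!(dest::AbstractVector, A::AbstractVector, c::Number)
    @inbounds @simd for i in eachindex(dest, A)   # common linear index, SIMD-friendly
        dest[i] = A[i] + c
    end
    return dest
end

# addscalar!(similar(v), v, 2.5) computes the same result as v .+ 2.5
```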
> Extrusion is determined by run-time values. Broadcasting `A .+ B` where `A` and `B` are both `Matrix`es isn't safe to do linearly because it might need to swap out one or both of the indices in one or both of the arrays — and that computation requires cartesian indices. In other words, it would apply to none of your original examples (since the RHS could be extruded into the LHS).
>
> In fact, the addition of an extra `for` loop was precisely what stalled JuliaLang/julia#30973 — it significantly regressed compile times.
Merde, this is bad. After reading up on that PR of yours, I don't know what to do about this. Thanks for the explanation!
In 2.0, we should consider making broadcast extrusion explicit (i.e. missing dims get silently extruded such that `array .+ scalar` continues to work, but `A .+ B` of matrices would fail when sizes don't coincide).
I mean, it's a tradeoff. It would suck to lose the ability to do things like `A .- mean(A, dims=1)` or `A ./ sum(A, dims=1)`.
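Concretely, the pattern that implicit extrusion makes possible:

```julia
using Statistics

A = rand(3, 4)
A .- mean(A, dims=1)   # mean(A, dims=1) is 1×4; its singleton row is reused for every row of A
A ./ sum(A, dims=1)    # same pattern with the column sums
```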
It is something I've considered, but without the ability to encode singleton dimensions in the type system I think it's a non-starter. Doing this generally was something we talked about in JuliaLang/LinearAlgebra.jl#42 but dismissed as being far too complicated. The introduction of an orthogonal syntax like `f.(A, ^B)` or `f.(A, ⟂B)` could ameliorate some of the pain, but I still don't think it'd be worth the massive breakage.
Too bad, I had hoped for something similar for GPU arrays since the run-time index calculations are pretty costly there too.
You're right, some of the use cases for the current extrusion behavior are compelling.
So I guess we could prepare a PoC branch that emits both linear and cartesian indexing, with a single hoisted runtime check. That would blow up code size and compiler latency, but improve runtime performance in most cases. Having such a branch ready would allow us to make more informed decisions on the tradeoffs (code complexity, readability of compilation output, compiler latency, runtime performance), and it may be useful for people who prioritize runtime over compile-time perf. Also, the tradeoff may change in the future (compiler latency could get better).
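To make the proposal a bit more concrete, the hoisted branch could look roughly like this, specialized to a two-array `+` broadcast so the sketch stays self-contained (the real version would live in `Base.Broadcast`'s `copyto!` and handle arbitrary `Broadcasted` trees):

```julia
function add_poc!(dest::AbstractMatrix, A::AbstractMatrix, B::AbstractMatrix)
    if axes(dest) == axes(A) == axes(B) && IndexStyle(dest, A, B) isa IndexLinear
        # hoisted run-time check passed: plain linear loop, SIMD-friendly
        @inbounds @simd for i in eachindex(dest, A, B)
            dest[i] = A[i] + B[i]
        end
    else
        # fall back to the existing general (cartesian) machinery, extrusion included
        broadcast!(+, dest, A, B)
    end
    return dest
end
```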
It may be useful to also offer a keyword arg to `broadcast` / `broadcast!` that makes extrusions explicit? Something like `extrusions = Val(((false,false), (true, false), ()))` for `NxN .+ 1xN .+ scalar`. Keyword args to broadcasts could also allow other fast operations that we currently cannot support, like `broadcast(args...; mask = mask::Base.LogicalIndex)` (such ops currently need to construct views that incur a large temporary).
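Purely as a sketch of what a mask keyword could buy (no such keyword exists today, and `masked_broadcast!` is an invented name):

```julia
function masked_broadcast!(f, dest::AbstractArray, mask::AbstractArray{Bool}, args...)
    @inbounds for I in CartesianIndices(dest)
        mask[I] || continue                      # skip unselected elements, no temporary needed
        dest[I] = f(map(a -> a[I], args)...)     # args assumed to have dest's shape here
    end
    return dest
end
# versus dest[mask] .= f.(A[mask], ...), which materializes the masked elements first
```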
It would be nice if we could propagate `IndexStyle` for broadcasts. The issue is the following: currently, broadcasts always use cartesian indexing. This is slow and prevents a lot of SIMD. In the relatively common case that `dest` and all args support linear indexing, and the only cases of dropped dimensions are zero-dimensional (as above), we should use linear indexing for a significant speedup (up to 20x), especially if the first dimension is small (which is a very common occurrence).
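A quick way to see the kind of gap being described (numbers vary by machine and Julia version; `BenchmarkTools` is assumed to be installed):

```julia
using BenchmarkTools

bcast!(dest, A, B) = (dest .= A .+ B; dest)   # the broadcast path discussed here
function linear!(dest, A, B)                  # hand-written linear loop for comparison
    @inbounds @simd for i in eachindex(dest, A, B)
        dest[i] = A[i] + B[i]
    end
    return dest
end

A, B = rand(2, 100_000), rand(2, 100_000)     # deliberately small first dimension
dest = similar(A)

@btime bcast!($dest, $A, $B);
@btime linear!($dest, $A, $B);
```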