Open mbauman opened 5 years ago
Very nice write-up. Just thinking out loud
julia> getindex.(Ref(randn(4,4)), 1:4, (1:4)')
4×4 Array{Float64,2}:
0.698382 -0.450998 0.00277274 0.265284
1.39098 -0.274713 0.947628 0.853339
0.232767 0.800546 1.53554 1.21039
1.46495 -1.86763 -1.17628 -0.228153
getindex
is only defined for scalars but A[1:2, 1:3, 1:4]
would lower to something like getindex.(Ref(A), reshape(1:2, 2, 1, 1), reshape(1:3, 1, 3, 1), reshape(1:4, 1, 1, 4))
.It is a very interesting read, thank you.
Thus,
A[I, J]
becomesA.[I, ^J]
.
Does it mean ^
is an operator to "add a singleton dimension", i.e, A.[I, ^J, ^^K]
with I
, J
, K
are all Vector{Int}
is equivalent to A[I, J, K]
? Also, I wonder if it makes sense to allow inserting a singleton dimension at arbitrary position. For example, A[., ^, ...]
to do reshape(A, size(A, 1), 1, size(A)[2:end]...)
lazily. But A[^, ...]
is currently a valid syntax so it is (slightly?) breaking.
an expression like
A[clamp.(idx .+ (-N:N), 1, end)].^2
Nit picking, but did you mean A.[clamp.(idx .+ (-N:N), 1, end)].^2
(a dot after A
)?
the very next thing you'd want to have fuse:
A.[A .> 0]
I'm guessing it's equivalent to getindex.(Ref(A), A .> 0)
. But then wouldn't it cause an error unless A[true]
and A[false]
are defined (if A
is a vector)? Also, the result would be different from A[A .> 0]
, I suppose? For fusing filtering and reduction, I imagine @lazy
or some additional (succinct, I hope) syntax for non-materializing broadcasting as in #19198 would be simpler.
That would be a nightmare to figure out how to fuse.
I cannot help thinking that transducer (sorry, that's my library) is the right approach for fusing mapping, filtering and reduction. Currently it can fuse operations like B[f.(A)] .= g.(A[f.(A)])
and foldl(h, g.(A[f.(A)]))
for vectors (where f(::eltype(A)) :: Bool
). But I haven't figured out what's the best way to do it in multidimensional case yet.
Awesome writeup.
Detail - would ^
just be interpreted as a lazy permutedims
, essentially a '
operator that has been fixed to not be recursive and thus not have the problems of mystringvector'
? That might be generally useful outside the world of getindex as well.
Thanks for the comments.
Wouldn't multidimensional slicing just correspond to broadcasting along different directions
Yes, exactly. That's my thought behind the ^
syntax — make it easy to "lift" those arguments to the new dimensions.
Since we already have special syntax for indexing, it might be possible retain some of the current behavior while internally handling everything with broadcasting by changing lowering such that
getindex
is only defined for scalars butA[1:2, 1:3, 1:4]
would lower to something likegetindex.(Ref(A), reshape(1:2, 2, 1, 1), reshape(1:3, 1, 3, 1), reshape(1:4, 1, 1, 4))
.
This is different in two ways from my proposal:
CartesianIndex
with A[CartesianIndex.(axes(A)...)]
. Broadcasting allows the user to choose how they'd like their indices to combine.A[x]
expressions would lower to a 0-d broadcast — even for scalar indexing. Given that scalar indexing is such a core construct in this language, that concerns me. Even if we got the run-time performance to complete parity, it'd be quite the compile-time burden.Does it mean
^
is an operator to "add a singleton dimension", i.e,A.[I, ^J, ^^K]
withI
,J
,K
are allVector{Int}
is equivalent toA[I, J, K]
?
No, my thought is that this is a parser-level transform that automagically will add the correct number of singleton dimensions such that the flagged argument is in a higher dimension than all previous ones. So it's not just adding one singleton dimension — it'd be A.[I, ^J, ^K]
to be equivalent to the old A[I, J, K]
, no matter what I
, J
, and K
are. Remember that I
might be a matrix — which would add an extra dimension there, so J
would need to have two leading singleton dimensions and K
would need two + however many J
had.
There would be something nice about it simply meaning add one singleton dimension — it's simple and straightforward, and it'd allow you to do things like f.(^A, B)
to put A
along the columns and B
along the rows. It'd be a slightly more challenging upgrade path as — given a nonscalar indexing expression A[I, J]
, you don't know how many ^
to put in front of J
to make it orthogonal to I
without referencing I
and making a mess of the expression.
And yes, supporting ^
generally would completely obviate all my issues with the removal of the recursive transpose — that's the main reason I ever transposed arrays of strings.
Yeah, logical indexing is something that just "feels" like it'd be nice to fuse and avoid the allocation of the intermediate logical array. The point is somewhat moot because the same problem also applies to :
and Not
and any other index conversion. But it is a strange bird — indexing by arrays means indexing by the elements unless the element type of the array is Bool
, in which case it does something completely different. There's an additional challenge in that logical indices get transformed into an intermediate lazy object that only supports iteration (and not indexing). Too bad it's so darn useful.
On the last point: If logical indices always get transformed into intermediate lazy iterables, it seems like little would be lost when one converts the current syntax for logical indexing(A[B]
) to the new syntax with a findall
and a broadcasted getindex (A.[findall[B]]
). While the new syntax is more verbose, it allows the same functionality and looks more like what was meant in the first place. I like the more consistent rules for getindex
.
It would keep nonscalar indexing APL-like
That was indeed my purpose. I do like APL-indexing.
Broadcasting is everywhere in this language now. I think the combination of APL indexing and broadcasting is actually rather confusing. We just teach them as wholly unrelated concepts, but if you try to think about both simultaneously it's quite a challenge.
You might be right but I'm not sure it will be confusing. The two concepts will have separate syntax ([ ]
vs .
) and it might be possible to have a clear translation from APL-indexing to broadcasting which could make it possible to reason about APL-indexing in a broadcast context.
Regarding syntax, I kind of like the idea of the number of ^
prefixes indicating "lift levels", i.e. A.[I, ^J, ^^K]
would be equivalent of A[I, J, K]
but you could also do A.[^I, J, ^K]
and the J
indices would end up first in the output while the I
and K
indices would come next, broadcast with each other. In that case I'm not sure what A.[^I, ^J]
would mean—either the same as A.[I, J]
or an error? That seems both slightly more intuitive and slightly more general.
If ^
simply means add a leading singleton dimension, then A.[^I, ^J]
would mean ^A.[I, J]
— that is, both I
and J
are used simultaneously to populate the columns. Or in other words, A.[reshape(I, 1, :), reshape(J, 1, :)]
.
Another important consideration: how would view
/@view
/@views
change?
Would there be a large difference between broadcasted(getindex, Ref(A), map(Lifted, inds)...)
and view(A, inds...)
?
No, that's not what I'm proposing: what I'm proposing is that the number of prefix ^
defines orthogonal groups of indices. So consider for example A.[^I, J, ^^K, ^L]
. This has three different orthogonal groups of indices: no carets—J
; one caret—I
and L
; two carets—K
. The groups of indices appear in the output in the order of the number of carets so the J
indices first, then the I
/L
indices, then the K
indices. Since the I
and L
indices are in the same group, they would be broadcast together.
Interesting stuff. I'm noticing that almost all the pain comes from transitioning the meaning of []
. At least for the short term (and perhaps permanently), it seems that almost all the pain points go away if we introduce separate syntax (where the macro is achievable with @\cdot\tab
):
julia> @generated function indextuple(A::AbstractArray{T,N}) where {T,N}
idxs = [gensym(:i) for i = 1:N]
quote
$idxs
end
end
indextuple (generic function with 1 method)
julia> A = [1 2; 3 4]
2×2 Array{Int64,2}:
1 2
3 4
julia> macro ⋅(A)
f = gensym(A)
quote
let A = $(esc(A))
idxs = indextuple(A)
f($(idxs...)) = A[$(idxs...)]
end
end
end
@⋅ (macro with 1 method)
julia> i, j = 1:2, 1:2
(1:2, 1:2)
julia> (@⋅A).(i, j) .* i .* j'
2×2 Array{Int64,2}:
1 2
8 16
julia> j = [2,1]
2-element Array{Int64,1}:
2
1
julia> (@⋅A).(i, j) .* i .* j'
2×2 Array{Int64,2}:
4 2
12 6
@StefanKarpinski Ooooh, interesting! That's very clever and creative, but I'm not sure that folks index with matrices enough for the additional complexity there to be worthwhile. I think I'd prefer to try the simpler rule that each ^
adds one leading singleton dimension first.
@peterahrens On views I don't have a firm enough sense of what Lifted
means to answer your question, but yes, views are currently orthogonal, but if we make .[]
non-orthogonal then they'd no longer match for macros like @views
.
@timholy The A.[...]
syntax is available and we can make it mean whatever we want... and we needn't necessarily deprecate nonscalar A[...]
. Still at the early stages of pie-in-the-sky brainstorming at the moment.
Some thoughts from a Python/NumPy perspective (tl;dr there's a potential ambiguity, but I'd be fine with it being resolved either possible way):
A[...]
is a venerable tradition from APL through NumPy that's familiar to most potential switchers-to-Julia and has semantics that are fairly easy to internalize ("the shape of the output is the Cartesian product of the shape of the nonscalar indices").>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> x
array([[1, 2, 3],
[4, 5, 6]])
>>> x[[0, 1], [0, 1]]
array([1, 5])
>>> x[0:2, 0:2]
array([[1, 2],
[4, 5]])
A.[...]
together with the natural interpretation of zipped nonscalar indexing as a broadcasting operation suggests that the all-parallel case should be spelled with a dot.^
on top of the dotted spelling sounds pretty good. But there's a potential divergence from NumPy, because they put the indices into two global categories (parallel and orthogonal) while this proposal appears to make a local determination relative to indices to the left.In particular, Julia x.[1:2, ^1:2, 1:2]
could either mean "zip the first and third indices together" like NumPy x[[0, 1], 0:2, [0, 1]]
or "zip the second and third indices together" like NumPy x[0:2, [0, 1], [0, 1]]
. In the second case, which I think is closer to the intent of the proposal, it's not clear how to spell the other semantics (in the first case it would be x.[^1:2, 1:2, 1:2]
).
Note that this might be okay, e.g. I think I'm fine with saying that if you really need to zip together nonadjacent nonscalar indices you should permutedims
first, but also I'd be fine with the ^
behavior following NumPy even if it's a little harder to explain.
(I also don't think it's a good idea to deprecate nonscalar A[...]
, and I don't see fusion potential as a strong long-term basis for making syntax changes because I trust that a sufficiently capable compiler will eventually get that stuff figured out.)
because I trust that a sufficiently capable compiler will eventually get that stuff figured out.
Famous last words 😁
While we are "pie-in-the-sky brainstorming":
Much of the complication here comes from multidimensionality... in the purely 1D (Vector
) case I think maybe the distinction between scalar and non-scalar indexing is clear, and the four types of "indexed assignment" mention in the OP seem complete.
Thus, I wonder if we should think of marshalling multi-dimensional indexing into an appropriate "single dimensional" form. For example, the APL rules are stated to give "the Cartesian product of the input indices" so let's use precisely this. Instead of the current non-scalar matrix indexing
matrix[1:2, 3:4]
I'm thinking of something like
matrix.[1:2 ⊗ 3:4]
where ⊗
is similar to product
and creates things like Cartesian ranges (but not just ranges).
We could keep the surface-level syntax of scalar multidimensional indexing exactly as is, so that matrix[1, 2]
still works, but we could lower this to getindex(matrix, (1, 2))
or getindex(matrix, CartesianIndex(1, 2))
or something along those lines. Another place I feel this helps is with zero-dimensional arrays and references. I feel ref[]
lowering to getindex(ref, ())
along with first(keys(ref)) === ()
is better than Ref
s having indexing but no keys
at all.
While we are at it, we could even think about distinguishing the syntax x[a]
from x[a,]
(the former being getindex(x, a)
and the latter being getindex(x, (a,))
). To me this feels analogous to the syntax for tuples ()
, (a,)
, (a,b)
and the non-tuple (a)
. We could overload getindex(v::AbstractVector, i::Integer) = v[i,]
and remove the different behavior of keys
for 1D arrays compared to every other dimension. (That might be the only special thing we need for AbstractVector
distinct from AbstractArray
).
I've been thinking of this stuff because I also want the concept of multi-dimensional dictionaries, and this is the best I've come up with so far. I think it makes the concepts of broadcasting and multi-dimensional indexing more orthogonal, thus they can compose together more cleanly with each other as well as other concepts.
I forgot to mention another thought - I wonder if x.[a]
should maybe not support logical indexing. We can still kind-of support logical indexing since it means x.[findall(a)]
, but it would be nice to support something shorter like the current syntax x[x .> 0]
for collecting all positive elements of x
. I feel the appropriate approach is more "let's make an awesome filtering syntax" rather than "let's complicate indexing with yet another orthogonal concept".
This came up when decompressing a sparse matrix. https://github.com/JuliaDiffEq/SparseDiffTools.jl/pull/38#discussion_r308715915 .
rows_index, cols_index, val = findnz(J)
J[rows_index, cols_index] .+= (color[cols_index] .== color_i) .* dx[rows_index]
would be natural, but instead I'm left with:
@.. setindex!((J,),getindex((J,),rows_index, cols_index) + (getindex((color,),cols_index) == color_i) * getindex((dx,),rows_index),rows_index, cols_index)
which is quite nasty. This issue is starting to come up a lot with GPU parallelism asking for things to be vectorized, but sparse matrix calculations require logical or index-based indexing, so broadcasting this and things like x.[x .> 0]
is starting to be very natural.
Anyways, any meaning other than getindex.((A,),idxs)
confuses me. 🤷♂
Consensus seems to be that the getindex.((A,), idxs)
meanings is more intuitive and useful.
I feel the appropriate approach is more "let's make an awesome filtering syntax" rather than "let's complicate indexing with yet another orthogonal concept".
I agree this in general. But let me try the opposite (sorry!), to see how far I can get.
It may already be obvious for people here but I just want to note that logical indexing can be implemented in terms of lowering to getindex.(Ref(A), idx)
, just by using Broadcast
machinery that is already in place. Here is an implementation of a lazy fused version of x[x .> 0] .= 0
:
struct LogicalIndex{T}
select::Bool
index::T
end
struct LogicalIndexer{T} # wraps a broadcastable object (with Bool eltype)
bc::T
end
function If end
Broadcast.broadcasted(::typeof(If), x) = LogicalIndexer(x)
# Forward some interfaces to `.bc`:
Broadcast.instantiate(li::LogicalIndexer) =
LogicalIndexer(Broadcast.instantiate(li.bc))
Broadcast.BroadcastStyle(::Type{<:LogicalIndexer{T}}) where T =
Broadcast.BroadcastStyle(T)
Broadcast.combine_axes(li::LogicalIndexer) = Broadcast.combine_axes(li.bc)
Broadcast.broadcastable(li::LogicalIndexer) = li
Base.size(li::LogicalIndexer) = size(li.bc)
Base.getindex(li::LogicalIndexer, I...) = LogicalIndex(li.bc[I...], I)
Base.setindex!(arr::AbstractArray, value, idx::LogicalIndex) =
if idx.select
arr[idx.index...] = value
end
x = [-1, 0, 1]
setindex!.(Ref(x), 0, If.(x .> 0)) # x.[If.(x .> 0)] .= 0 (kind of)
@show x
One disadvantage is that, if we define this for AbstractDict
as well, we can't use LogicalIndex
as a key. Users can avoid this by using Some(actual_key)
as a key but this is a bit awkward solution. Another solution is to use dotgetindex
and dotsetindex!
inside broadcasting context (like dotview
).
We can also lower if f.(args...)
to If.(f.(args...))
(similar to the syntax I implemented in #31553) so that we can write
x.[if x .> 0] .= 0
Handling the case that x.[if bools]
appears in the right hand side is a bit more challenging. But I think this only makes sense in the reduction context (e.g., sum(for x.[if x .> 0])
) because the shape of x.[if bools]
cannot be predicted until evaluating everything (does anyone write y .= x[f.(z)]
a lot?). If we focus on fusing mapping, filtering, and reduction, it's not super difficult to do it. I think we need something like
abstract type MaybeSelected end
struct Selected{T} <: MaybeSelected
value::T
end
struct Masked <: MaybeSelected end
Base.getindex(arr, idx::LogicalIndex) =
idx.select ? Selected(arr[idx.index]) : Masked()
(this is untested) and automatically lift functions to f(::Vararg{MaybeSelected}) :: MaybeSelected
. Reductions like sum
then can automatically filter out Maksed
and unwrap Selected
.
Zygote.jl
differentiates for
loops much slower than broadcasting, so I'm constantly finding myself having to rewrite simple and clear expressions like
flipped_xs = Buffer(xs)
for t ∈ reverse(eachindex(xs))
flipped_xs[t] = f(xs[t])
end
to obscure broadcasted form:
rev_time = reverse(eachindex(xs))
getindex.(Ref(f.(getindex.(Ref(xs), rev_time))), rev_time)
for performance reasons. Being able to write those using something like xs.[rev_time]
would greatly improve this experience and the code would not look so ugly. For example, the broadcasted version above could look like
rev_time = reverse(eachindex(xs))
(f.(xs.[rev_time])).[rev_time]
I will paste below examples from a real project that uses Flux.jl
and Zygote.jl
, where I'm forced to use explicit getindex
, setindex!
and view
s with broadcasting
for performance reasons, so that it can inform the decision on this issue.
function flip(f, xs)
rev_time = reverse(eachindex(xs))
return getindex.(Ref(f.(getindex.(Ref(xs), rev_time))), rev_time)
# the same as
# flipped_xs = Buffer(xs)
# @inbounds for t ∈ rev_time
# flipped_xs[t] = f(xs[t])
# end
# return copy(flipped_xs)
# but implemented via broadcasting as Zygote differentiates loops much slower than broadcasting
end
function (m::BLSTM)(Xs::T₃)::T₃ where T₃ <: DenseArray{<:Real,3}
# preallocate output buffer
Ys = Buffer(Xs, 2m.dim_out, size(Xs,2), size(Xs,3))
axisYs₁ = axes(Ys, 1)
time = axes(Ys, 2)
rev_time = reverse(time)
@inbounds begin
# get forward and backward slice indices
slice_f = axisYs₁[1:m.dim_out]
slice_b = axisYs₁[(m.dim_out+1):end]
# bidirectional run step
setindex!.(Ref(Ys), m.forward.(view.(Ref(Xs), :, time, :)), Ref(slice_f), time, :)
setindex!.(Ref(Ys), m.backward.(view.(Ref(Xs), :, rev_time, :)), Ref(slice_b), rev_time, :)
# the same as
# @views for (t_f, t_b) ∈ zip(time, rev_time)
# Ys[slice_f, t_f, :] = m.forward(Xs[:, t_f, :])
# Ys[slice_b, t_b, :] = m.backward(Xs[:, t_b, :])
# end
# but implemented via broadcasting as Zygote differentiates loops much slower than broadcasting
end
return copy(Ys)
end
function (m::PBLSTM)(xs::VM)::VM where VM <: DenseVector
# reduce time duration by half by restacking consecutive pairs of input along the time dimension
xxs = vcat.(getindex.(Ref(xs), 1:2:lastindex(xs)), getindex.(Ref(xs), 2:2:lastindex(xs)))
# the same as
# @views xxs = @inbounds(vcat.(xs[1:2:end], xs[2:2:end]))
# but implemented via broadcasting for performance reasons
return vcat.(m.forward.(xxs), flip(m.backward, xxs))
end
Hs_buffer = Buffer(h, size(h,1), batch_size, length(hs))
setindex!.(Ref(Hs_buffer), hs, :, :, axes(Hs_buffer, 3))
# the same as
# for k ∈ eachindex(hs)
# Hs_buffer[:,:,k] = hs[k]
# end
# but implemented via broadcasting as Zygote differentiates loops much slower than broadcasting
Hs = copy(Hs_buffer)
I haven't spent more than 2 seconds thinking about this, but isn't fixing Zygote the best approach?
It's not just Zygote that depends upon this — GPU vectorization also heavily depends upon broadcasting.
Is it mostly a missing hoisting optimization? Why doesn't LICM do this for us?
I think it is just easier to overload and know what a broadcast expression will do than to try pattern match a loop. A loop is a pretty low level abstraction compared to broadcasting.
Fair enough. But it would be giving up a lot to resign to not having fast loops anymore.
I don't doubt it would be hard, but I wonder if having our own LICM pass that runs before inlining is something we'll eventually decide we need. LLVM's pattern matcher might be specialized for what it thinks of as a number, and Julia's notion of Number
is rather more expansive. Presumably much of the performance gap comes from repeated dual-number operations on hoistable quantitites.
I instantly fell in love with the suggestion of making it explicit by @andyferris (https://github.com/JuliaLang/julia/issues/30845#issuecomment-475865027). The only missing thing would be an ascii variant which is easy enough to type and remember that no one will complain about dropping APL style at some time. I mostly like that approach out of the same reason why I like broadcasting, CartesianIndex/Indices and the whole promotion/conversion mechanism. Make the behaviour more explicit while increasing elegance. Have code working by default due to clever mechanics. But I also support @timholy (https://github.com/JuliaLang/julia/issues/30845#issuecomment-458370787) and even go further by saying that we should think about restructuring the whole indexing machinery for further extensibility and clarity. (And thus allowing to test new indexing schemes)
I suggest to remove much of the index handling from getindex/setindex!
into some sort of IndexBuilder and then call arrays with a built index. Currently we already have the to_indices
functionality (which is definitively missing a reference in the Interfaces/Indexing section) which handles boolean arrays. So we might be able to generalize that approach if we turned indexing into another trait- or inheritance based construct. My aim would be to only have to define 1-arg getindex
functions with explicit scalar indexing types and to move index building and transforming to a new type.
That hopefully allows for orthogonalization of indexing type/style, indexing arguments and indexed array and also might be able to lift some of the transforms to compile time. It on the other hand can lead to more lazy evaluations and to easily unify/extract the behaviour of indexing on LHS and RHS.
Define #1
or #2
based on IndexStyle
getindex(A::AbstractArray, ::LinearIndex) #1 if IndexStyle==IndexLinear
getindex(A::AbstractArray, ::CartesianIndex) #2 if IndexStyle==IndexCartesian
getindex(A::AbstractArray, ii::IterableIndex) = collect(idx->A[idx], ii) #3 default
getindex(A::AbstractArray, specialized::IndexBuilder) = getindex(A, build(IndexStyle(A), A, specialized))) #4 default
getindex(A, args...) = getindex(A, DefaultBuilder(A)(args...)) #redirecting fallback or via lowering
DefaultBuilder(A::AbstractArray) = tuple #current default/fallback
You may define #3
if you want to shortcut some paths for performance. #4
usually shouldn't be redefined. If #3
was solved via broadcasting, it would even result in a fused loop after inlining!
Define #5
or #6
based on granularity and the ability to order collections for performance
build(A::AbstractArray, ib::IndexBuilder)::IndexType #5
build(s::IndexStyle, A::AbstractArray, ib::IndexBuilder)::IndexType(s) = convert(s, build(A, ib)) #6 default
build(A::AbstractArray, t::Tuple) = to_indices(A, t) #current default/fallback
Those, who know the fallback behaviour of getindex
, know, that we currently have almost exactly that behaviour.
Have I ⊗ J
return APL(I,J) to get current behaviour.
Have A[APL(1:10, 1:10)]
to mirror the current A[1:10, 1:10]
Have A[BC(1:10, 1:10)]
to mirror the suggested A.[1:10, 1:10]
(compare the dot after A)
All those rewrites of the []
could be managed by macros to allow for smoother writing.
FYI, there were some discussions around awkward-array https://github.com/scikit-hep/awkward-array in https://github.com/andyferris/Dictionaries.jl/issues/19#issuecomment-641524140 with me and @andyferris. It is a Python library for nested arrays and tables. If you are curious, I think this Strange Loop talk "Jagged, ragged, awkward arrays" by Jim Pivarski - YouTube is a great introduction to it. In particular, it also touches on the growing need and attention of such data structure in technical computing. I wonder if nonscalar indexing syntax A.[I]
can be used as a building block for this. I'd imagine something like x.[i].[j].[k]
can be used for dealing with nested structure.
@andyferris Re https://github.com/JuliaLang/julia/issues/36105#issuecomment-645039761
For nested things @tkf I wonder if we could using nested dot-getindices operations to detangle it on the LHS, as it seems (to me) you might need to do anyway on the RHS?
A.[2:4].[end] .= B.[1:3].[begin]
(I'm not sure if that particular example makes sense, but something like that)
I'd imagine we need to call a strictly lazy version of broadcastable
on the LHS (e.g., don't call collect
in it) and build the Broadcasted
. This way, in-place mutation on the leaves of the LHS would be reflected in A
. So, we'd have the trees of Broadcasted
for both LHS and RHS. Is this what you mean by detangling?
Is this what you mean
Yes, I think that sounds right.
So with broadcasting dots, we've managed to excise nearly all sometimes-scalar/sometimes-vectorized functions in Julia to great benefit. There are two notable stragglers:
getindex
andsetindex!
. Now, since they have their own syntaxes and support so many fancy things it hasn't always been (and really still isn't) completely obvious that it should be deprecated in favor of broadcasting. But they really are just "broadcasting" scalar indexing over the cartesian product of the indices.The advantages
There'd be a number of advantages to spelling nonscalar indexing as some form of broadcasting. First and foremost: fusion. Forget the whole view/copy debate — straight-up fusion could be faster than either. For example, we could define the new syntax
A.[I...]
to meanbroadcast((idxs...)->A[idxs...], I...)
. Then, to extract the first 10 elements from the first column ofA
, you writeA.[1:10, 1]
. This means that you could fuse in additional operations without any temporaries or views whatsoever with, e.g.,sqrt.(A.[1:10, 1])
.The other huge advantage is how this would then generalize these sorts of accesses to all data structures, including dictionaries and more. Cf. #24019.
The challenges and their potential solutions
Now, there are also some fairly major challenges in making it easy to express all that indexing does with a broadcasting semantic.
A[1:10, 1:10]
takes the cartesian product of the two indices, returning a 2-dimensional 10x10 array. I'd expect a broadcastedA.[1:10, 1:10]
to use broadcasting's shape-matching semantic and return the diagonal. In general, APL indexing can be thought of as a broadcast where each successive index is "lifted" to a higher dimension than the sum of dimensionalities of all arguments that preceded it. We could potentially have a simple wrapper that flags arguments to be "orthogonalized" before participating in broadcast — options for spelling this wrapper could include⟂
or^
. Thus,A[I, J]
becomesA.[I, ^J]
. This is a generally useful operation... and for my purposes would completely obviate the pain from the recursive transpose/adjoint fallback removal as you could lift arguments to orthogonal dimensions anywhere:f.(1:10, ^array_of_strings)
. That said, the change in default operation here from APL-like to broadcasting-like may prove to be very painful...Index conversions: The existing indexing API supports many types of indices and even an extensible interface for adding more.
:
is actually a function, but when used as an index it expands out to the entire axis. With broadcast, however, it acts like a function — as a scalar.Resolving this one means unfortunately giving up something fairly major: I don't believe it'll ever be possible to support all these features and also support broadcast fusion through to an index computation. So while it would be oh-so-cool to have an expression like
A.[clamp.(idx .+ (-N:N), 1, end)].^2
(to examine a window about someidx
without worrying about bounds) fuse the whole way down, it simply isn't extensible to the very next thing you'd want to have fuse:A.[A .> 0]
. That would be a nightmare to figure out how to fuse.:
(no checks),1:10
(just check endpoints), and logical masks (just check the shape). I think this may be possible to still do — again, if we don't fuse through to the index computations. But I've not completely seen to the end of this tunnel.similar
’s first major use was in defining the output type for nonscalar indexing. Lots of folks specialized it for that reason, and broadcast doesn’t use it.@.
macro. That currently leaves indexing expressions alone, but once we introduce theA.[]
operator, I'd expect it to dot those bracket operations. It seems like we have sufficient leeway in the macro to do some fancy detection, though, allowing us to insert appropriate handling code if any arguments are arrays and inserting deprecations appropriately.setindex!
, too. It actually becomes quite the advantage as the dots can tell the whole story about what's scalar and whats not right where you'd expect (adapted from the sister issue #24086):A[i] = x
: Always setsx
at the scalar indexi
.A[i] .= X
: Always broadcastsX
across the element at locationi
(always mutating the object inA[i]
).A.[I] = x
: Assignsx
to every index inI
.A.[I] .= X
: Always broadcastsX
across the indices selected byI
(always mutatingA
).What to do:
So, bringing this all together, I propose we aim to:
A.[I, J, K]
to mean:The separation into two statements there is meaningful — that's how it'll behave in fusion. Note, too, that this defaults to behaving broadcast-like (and not APL-like). I'm still not entirely sold on it, but I do think it'll unify the language and simplify things. I think this is one of those things we'll have to see how it feels in practice.
f.(a, ^b, c, ^d)
to mean:This may pose some constant-propagation and type-stability challenges, but I hope they're overcome-able.
References and prior discussions
This issue brings together lots of iterations of thought that have been spread about through many issues and discourse communications.
getindex
as compared toview
(which is always nonscalar).