What to do about nonscalar indexing?

mbauman commented 5 years ago

So with broadcasting dots, we've managed to excise nearly all sometimes-scalar/sometimes-vectorized functions in Julia to great benefit. There are two notable stragglers: getindex and setindex!. Now, since they have their own syntaxes and support so many fancy things it hasn't always been (and really still isn't) completely obvious that it should be deprecated in favor of broadcasting. But they really are just "broadcasting" scalar indexing over the cartesian product of the indices.

The advantages

There'd be a number of advantages to spelling nonscalar indexing as some form of broadcasting. First and foremost: fusion. Forget the whole view/copy debate — straight-up fusion could be faster than either. For example, we could define the new syntax A.[I...] to mean broadcast((idxs...)->A[idxs...], I...). Then, to extract the first 10 elements from the first column of A, you write A.[1:10, 1]. This means that you could fuse in additional operations without any temporaries or views whatsoever with, e.g., sqrt.(A.[1:10, 1]).

The other huge advantage is how this would then generalize these sorts of accesses to all data structures, including dictionaries and more. Cf. #24019.

The challenges and their potential solutions

Now, there are also some fairly major challenges in making it easy to express all that indexing does with a broadcasting semantic.

APL Indexing: The existing A[1:10, 1:10] takes the cartesian product of the two indices, returning a 2-dimensional 10x10 array. I'd expect a broadcasted A.[1:10, 1:10] to use broadcasting's shape-matching semantic and return the diagonal. In general, APL indexing can be thought of as a broadcast where each successive index is "lifted" to a higher dimension than the sum of dimensionalities of all arguments that preceded it. We could potentially have a simple wrapper that flags arguments to be "orthogonalized" before participating in broadcast — options for spelling this wrapper could include ⟂ or ^. Thus, A[I, J] becomes A.[I, ^J]. This is a generally useful operation... and for my purposes would completely obviate the pain from the recursive transpose/adjoint fallback removal as you could lift arguments to orthogonal dimensions anywhere: f.(1:10, ^array_of_strings). That said, the change in default operation here from APL-like to broadcasting-like may prove to be very painful...
Index conversions: The existing indexing API supports many types of indices and even an extensible interface for adding more.
- For example, : is actually a function, but when used as an index it expands out to the entire axis. With broadcast, however, it acts like a function — as a scalar.
- Logical indexing with boolean arrays is even worse: there we have an array of trues and falses, but when used as an index it expands out to a vector of the locations of the trues. With broadcast, it behaves just like an array would, yielding the same number of trues and falses as the arrays overall shape.
Resolving this one means unfortunately giving up something fairly major: I don't believe it'll ever be possible to support all these features and also support broadcast fusion through to an index computation. So while it would be oh-so-cool to have an expression like A.[clamp.(idx .+ (-N:N), 1, end)].^2 (to examine a window about some idx without worrying about bounds) fuse the whole way down, it simply isn't extensible to the very next thing you'd want to have fuse: A.[A .> 0]. That would be a nightmare to figure out how to fuse.
Bounds checking: In comparison to the other issues, this one feels much simpler, but it's still gonna be a bit of a pain. Non-scalar indexing is able to perform bounds checking at the level of the collections of indices and then perform the scalar indexing with bounds checks off. This is a huge win for things like : (no checks), 1:10 (just check endpoints), and logical masks (just check the shape). I think this may be possible to still do — again, if we don't fuse through to the index computations. But I've not completely seen to the end of this tunnel.
Returned array type: We’ll need to find a solution to #17533; similar’s first major use was in defining the output type for nonscalar indexing. Lots of folks specialized it for that reason, and broadcast doesn’t use it.
Backwards compatibility: For the most part, this is entirely a new syntax with a straight-forward deprecation. There's one place where I was worried about a clash, though: the @. macro. That currently leaves indexing expressions alone, but once we introduce the A.[] operator, I'd expect it to dot those bracket operations. It seems like we have sufficient leeway in the macro to do some fancy detection, though, allowing us to insert appropriate handling code if any arguments are arrays and inserting deprecations appropriately.
Indexed Assignment: So far I've really only talked about getindex, but all the same applies to setindex!, too. It actually becomes quite the advantage as the dots can tell the whole story about what's scalar and whats not right where you'd expect (adapted from the sister issue #24086):
- A[i] = x: Always sets x at the scalar index i.
- A[i] .= X: Always broadcasts X across the element at location i (always mutating the object in A[i]).
- A.[I] = x: Assigns x to every index in I.
- A.[I] .= X: Always broadcasts X across the indices selected by I (always mutating A).

What to do:

So, bringing this all together, I propose we aim to:

Introduce the syntax A.[I, J, K] to mean:
```
idxs = to_indices(A, (I, J, K))
broadcast((i, j, k)->A[i,j,k], idxs...)...)
```
The separation into two statements there is meaningful — that's how it'll behave in fusion. Note, too, that this defaults to behaving broadcast-like (and not APL-like). I'm still not entirely sold on it, but I do think it'll unify the language and simplify things. I think this is one of those things we'll have to see how it feels in practice.
Introduce the new "orthogonalization" syntax f.(a, ^b, c, ^d) to mean:
```
f.(a, Lifted(b, ndims(a)), c, Lifted(d, ndims(a) + ndims(Lifted(b, ndims(a))) + ndims(c))
```
This may pose some constant-propagation and type-stability challenges, but I hope they're overcome-able.
Apply these semantics to indexed assignments over in #24086.

References and prior discussions

This issue brings together lots of iterations of thought that have been spread about through many issues and discourse communications.

What to do about multivalue setindex!? (#24086)
Broadcast Array Indexing (#19169): discusses if the array itself should participate in the broadcasting (for indexing into many arrays inside an array). Discourse: Square Bracket Notation for Broadcasting getindex Across Array of Vectors? requests the same thing.
Make indexing expressions participate in dot syntax fusion (#22858): started sketching out lots of this, but defaulted to APL indexing semantics. My last comment is fairly negative because lifting arguments to orthogonalize them requires array wrappers — and array wrappers will allocate just like views will. But this omits the fact that many indices won't be heap-allocated (like ranges) and defaulting to broadcasting semantics will further limit the damage here. Upon further reflection I think it's a win.
Broadcasting pointwise indexing operator (#2591) is one of the first discussions on this topic.
Discourse: understanding view: A discussion about the strange-ness of the sometimes-scalar sometimes-not behaviors of getindex as compared to view (which is always nonscalar).

andreasnoack commented 5 years ago

Very nice write-up. Just thinking out loud

Wouldn't multidimensional slicing just correspond to broadcasting along different directions, i.e. something like

julia> getindex.(Ref(randn(4,4)), 1:4, (1:4)')
4×4 Array{Float64,2}:
0.698382  -0.450998   0.00277274   0.265284
1.39098   -0.274713   0.947628     0.853339
0.232767   0.800546   1.53554      1.21039
1.46495   -1.86763   -1.17628     -0.228153

Since we already have special syntax for indexing, it might be possible retain some of the current behavior while internally handling everything with broadcasting by changing lowering such that getindex is only defined for scalars but A[1:2, 1:3, 1:4] would lower to something like getindex.(Ref(A), reshape(1:2, 2, 1, 1), reshape(1:3, 1, 3, 1), reshape(1:4, 1, 1, 4)).

tkf commented 5 years ago

It is a very interesting read, thank you.

Thus, A[I, J] becomes A.[I, ^J].

Does it mean ^ is an operator to "add a singleton dimension", i.e, A.[I, ^J, ^^K] with I, J, K are all Vector{Int} is equivalent to A[I, J, K]? Also, I wonder if it makes sense to allow inserting a singleton dimension at arbitrary position. For example, A[., ^, ...] to do reshape(A, size(A, 1), 1, size(A)[2:end]...) lazily. But A[^, ...] is currently a valid syntax so it is (slightly?) breaking.

an expression like A[clamp.(idx .+ (-N:N), 1, end)].^2

Nit picking, but did you mean A.[clamp.(idx .+ (-N:N), 1, end)].^2 (a dot after A)?

the very next thing you'd want to have fuse: A.[A .> 0]

I'm guessing it's equivalent to getindex.(Ref(A), A .> 0). But then wouldn't it cause an error unless A[true] and A[false] are defined (if A is a vector)? Also, the result would be different from A[A .> 0], I suppose? For fusing filtering and reduction, I imagine @lazy or some additional (succinct, I hope) syntax for non-materializing broadcasting as in #19198 would be simpler.

That would be a nightmare to figure out how to fuse.

I cannot help thinking that transducer (sorry, that's my library) is the right approach for fusing mapping, filtering and reduction. Currently it can fuse operations like B[f.(A)] .= g.(A[f.(A)]) and foldl(h, g.(A[f.(A)])) for vectors (where f(::eltype(A)) :: Bool). But I haven't figured out what's the best way to do it in multidimensional case yet.

mkborregaard commented 5 years ago

Awesome writeup. Detail - would ^ just be interpreted as a lazy permutedims, essentially a ' operator that has been fixed to not be recursive and thus not have the problems of mystringvector'? That might be generally useful outside the world of getindex as well.

mbauman commented 5 years ago

Thanks for the comments.

On orthogonal slicing:

Wouldn't multidimensional slicing just correspond to broadcasting along different directions

Yes, exactly. That's my thought behind the ^ syntax — make it easy to "lift" those arguments to the new dimensions.

Since we already have special syntax for indexing, it might be possible retain some of the current behavior while internally handling everything with broadcasting by changing lowering such that getindex is only defined for scalars but A[1:2, 1:3, 1:4] would lower to something like getindex.(Ref(A), reshape(1:2, 2, 1, 1), reshape(1:3, 1, 3, 1), reshape(1:4, 1, 1, 4)).

This is different in two ways from my proposal:

It would keep nonscalar indexing APL-like (or orthogonal). I don't particularly like this for two reasons:
- The broadcasting behavior is more general — to get the diagonal out of an array (without using linear indexing), you currently need to construct an intermediate array of CartesianIndex with A[CartesianIndex.(axes(A)...)]. Broadcasting allows the user to choose how they'd like their indices to combine.
- Broadcasting is everywhere in this language now. I think the combination of APL indexing and broadcasting is actually rather confusing. We just teach them as wholly unrelated concepts, but if you try to think about both simultaneously it's quite a challenge. I like the unification this proposal would bring.
It would mean that all A[x] expressions would lower to a 0-d broadcast — even for scalar indexing. Given that scalar indexing is such a core construct in this language, that concerns me. Even if we got the run-time performance to complete parity, it'd be quite the compile-time burden.

Does it mean ^ is an operator to "add a singleton dimension", i.e, A.[I, ^J, ^^K] with I, J, K are all Vector{Int} is equivalent to A[I, J, K]?

No, my thought is that this is a parser-level transform that automagically will add the correct number of singleton dimensions such that the flagged argument is in a higher dimension than all previous ones. So it's not just adding one singleton dimension — it'd be A.[I, ^J, ^K] to be equivalent to the old A[I, J, K], no matter what I, J, and K are. Remember that I might be a matrix — which would add an extra dimension there, so J would need to have two leading singleton dimensions and K would need two + however many J had.

There would be something nice about it simply meaning add one singleton dimension — it's simple and straightforward, and it'd allow you to do things like f.(^A, B) to put A along the columns and B along the rows. It'd be a slightly more challenging upgrade path as — given a nonscalar indexing expression A[I, J], you don't know how many ^ to put in front of J to make it orthogonal to I without referencing I and making a mess of the expression.

And yes, supporting ^ generally would completely obviate all my issues with the removal of the recursive transpose — that's the main reason I ever transposed arrays of strings.

On fusing into indices and logical indexing

Yeah, logical indexing is something that just "feels" like it'd be nice to fuse and avoid the allocation of the intermediate logical array. The point is somewhat moot because the same problem also applies to : and Not and any other index conversion. But it is a strange bird — indexing by arrays means indexing by the elements unless the element type of the array is Bool, in which case it does something completely different. There's an additional challenge in that logical indices get transformed into an intermediate lazy object that only supports iteration (and not indexing). Too bad it's so darn useful.

willow-ahrens commented 5 years ago

On the last point: If logical indices always get transformed into intermediate lazy iterables, it seems like little would be lost when one converts the current syntax for logical indexing(A[B]) to the new syntax with a findall and a broadcasted getindex (A.[findall[B]]). While the new syntax is more verbose, it allows the same functionality and looks more like what was meant in the first place. I like the more consistent rules for getindex.

andreasnoack commented 5 years ago

It would keep nonscalar indexing APL-like

That was indeed my purpose. I do like APL-indexing.

Broadcasting is everywhere in this language now. I think the combination of APL indexing and broadcasting is actually rather confusing. We just teach them as wholly unrelated concepts, but if you try to think about both simultaneously it's quite a challenge.

You might be right but I'm not sure it will be confusing. The two concepts will have separate syntax ([ ] vs .) and it might be possible to have a clear translation from APL-indexing to broadcasting which could make it possible to reason about APL-indexing in a broadcast context.

StefanKarpinski commented 5 years ago

Regarding syntax, I kind of like the idea of the number of ^ prefixes indicating "lift levels", i.e. A.[I, ^J, ^^K] would be equivalent of A[I, J, K] but you could also do A.[^I, J, ^K] and the J indices would end up first in the output while the I and K indices would come next, broadcast with each other. In that case I'm not sure what A.[^I, ^J] would mean—either the same as A.[I, J]or an error? That seems both slightly more intuitive and slightly more general.

mbauman commented 5 years ago

If ^ simply means add a leading singleton dimension, then A.[^I, ^J] would mean ^A.[I, J] — that is, both I and J are used simultaneously to populate the columns. Or in other words, A.[reshape(I, 1, :), reshape(J, 1, :)].

Another important consideration: how would view/@view/@views change?

willow-ahrens commented 5 years ago

Would there be a large difference between broadcasted(getindex, Ref(A), map(Lifted, inds)...)and view(A, inds...)?

StefanKarpinski commented 5 years ago

No, that's not what I'm proposing: what I'm proposing is that the number of prefix ^ defines orthogonal groups of indices. So consider for example A.[^I, J, ^^K, ^L]. This has three different orthogonal groups of indices: no carets—J; one caret—I and L; two carets—K. The groups of indices appear in the output in the order of the number of carets so the J indices first, then the I/L indices, then the K indices. Since the I and L indices are in the same group, they would be broadcast together.

timholy commented 5 years ago

Interesting stuff. I'm noticing that almost all the pain comes from transitioning the meaning of []. At least for the short term (and perhaps permanently), it seems that almost all the pain points go away if we introduce separate syntax (where the macro is achievable with @\cdot\tab):

julia> @generated function indextuple(A::AbstractArray{T,N}) where {T,N}
           idxs = [gensym(:i) for i = 1:N]
           quote
               $idxs
           end
       end
indextuple (generic function with 1 method)

julia> A = [1 2; 3 4]
2×2 Array{Int64,2}:
 1  2
 3  4

julia> macro ⋅(A)
           f = gensym(A)
           quote
               let A = $(esc(A))
                   idxs = indextuple(A)
                   f($(idxs...)) = A[$(idxs...)]
               end
           end
       end
@⋅ (macro with 1 method)

julia> i, j = 1:2, 1:2
(1:2, 1:2)

julia> (@⋅A).(i, j) .* i .* j'
2×2 Array{Int64,2}:
 1   2
 8  16

julia> j = [2,1]
2-element Array{Int64,1}:
 2
 1

julia> (@⋅A).(i, j) .* i .* j'
2×2 Array{Int64,2}:
  4  2
 12  6

mbauman commented 5 years ago

@StefanKarpinski Ooooh, interesting! That's very clever and creative, but I'm not sure that folks index with matrices enough for the additional complexity there to be worthwhile. I think I'd prefer to try the simpler rule that each ^ adds one leading singleton dimension first.

@peterahrens On views I don't have a firm enough sense of what Lifted means to answer your question, but yes, views are currently orthogonal, but if we make .[] non-orthogonal then they'd no longer match for macros like @views.

@timholy The A.[...] syntax is available and we can make it mean whatever we want... and we needn't necessarily deprecate nonscalar A[...]. Still at the early stages of pie-in-the-sky brainstorming at the moment.

jekbradbury commented 5 years ago

Some thoughts from a Python/NumPy perspective (tl;dr there's a potential ambiguity, but I'd be fine with it being resolved either possible way):

Nonscalar A[...] is a venerable tradition from APL through NumPy that's familiar to most potential switchers-to-Julia and has semantics that are fairly easy to internalize ("the shape of the output is the Cartesian product of the shape of the nonscalar indices").
One place it falls short, though (specifically in terms of representational power) is that it can't represent the case where we want two or more nonscalar indices to "zip" together ("parallel" broadcast, resulting in a single dimension in the output) rather than forming separate dimensions with an "orthogonal" broadcast.
NumPy handles this case by distinguishing between slice indices (which behave like nonscalar indices in Julia) and array indices (which always zip together with each other):
```
>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> x
array([[1, 2, 3],
   [4, 5, 6]])
>>> x[[0, 1], [0, 1]]
array([1, 5])
>>> x[0:2, 0:2]
array([[1, 2],
   [4, 5]])
```
This isn't available in Julia because a) it's terrible and b) ranges and arrays with the same values compare equal and should be as indistinguishable as possible.
The availability of A.[...] together with the natural interpretation of zipped nonscalar indexing as a broadcasting operation suggests that the all-parallel case should be spelled with a dot.
The question is now how to spell intermediate cases. Unary ^ on top of the dotted spelling sounds pretty good. But there's a potential divergence from NumPy, because they put the indices into two global categories (parallel and orthogonal) while this proposal appears to make a local determination relative to indices to the left.

In particular, Julia x.[1:2, ^1:2, 1:2] could either mean "zip the first and third indices together" like NumPy x[[0, 1], 0:2, [0, 1]] or "zip the second and third indices together" like NumPy x[0:2, [0, 1], [0, 1]]. In the second case, which I think is closer to the intent of the proposal, it's not clear how to spell the other semantics (in the first case it would be x.[^1:2, 1:2, 1:2]).

Note that this might be okay, e.g. I think I'm fine with saying that if you really need to zip together nonadjacent nonscalar indices you should permutedims first, but also I'd be fine with the ^ behavior following NumPy even if it's a little harder to explain.

(I also don't think it's a good idea to deprecate nonscalar A[...], and I don't see fusion potential as a strong long-term basis for making syntax changes because I trust that a sufficiently capable compiler will eventually get that stuff figured out.)

StefanKarpinski commented 5 years ago

because I trust that a sufficiently capable compiler will eventually get that stuff figured out.

Famous last words 😁

andyferris commented 5 years ago

While we are "pie-in-the-sky brainstorming":

Much of the complication here comes from multidimensionality... in the purely 1D (Vector) case I think maybe the distinction between scalar and non-scalar indexing is clear, and the four types of "indexed assignment" mention in the OP seem complete.

Thus, I wonder if we should think of marshalling multi-dimensional indexing into an appropriate "single dimensional" form. For example, the APL rules are stated to give "the Cartesian product of the input indices" so let's use precisely this. Instead of the current non-scalar matrix indexing

matrix[1:2, 3:4]

I'm thinking of something like

matrix.[1:2 ⊗ 3:4]

where ⊗ is similar to product and creates things like Cartesian ranges (but not just ranges).

We could keep the surface-level syntax of scalar multidimensional indexing exactly as is, so that matrix[1, 2] still works, but we could lower this to getindex(matrix, (1, 2)) or getindex(matrix, CartesianIndex(1, 2)) or something along those lines. Another place I feel this helps is with zero-dimensional arrays and references. I feel ref[] lowering to getindex(ref, ()) along with first(keys(ref)) === () is better than Refs having indexing but no keys at all.

While we are at it, we could even think about distinguishing the syntax x[a] from x[a,] (the former being getindex(x, a) and the latter being getindex(x, (a,))). To me this feels analogous to the syntax for tuples (), (a,), (a,b) and the non-tuple (a). We could overload getindex(v::AbstractVector, i::Integer) = v[i,] and remove the different behavior of keys for 1D arrays compared to every other dimension. (That might be the only special thing we need for AbstractVector distinct from AbstractArray).

I've been thinking of this stuff because I also want the concept of multi-dimensional dictionaries, and this is the best I've come up with so far. I think it makes the concepts of broadcasting and multi-dimensional indexing more orthogonal, thus they can compose together more cleanly with each other as well as other concepts.

andyferris commented 5 years ago

I forgot to mention another thought - I wonder if x.[a] should maybe not support logical indexing. We can still kind-of support logical indexing since it means x.[findall(a)], but it would be nice to support something shorter like the current syntax x[x .> 0] for collecting all positive elements of x. I feel the appropriate approach is more "let's make an awesome filtering syntax" rather than "let's complicate indexing with yet another orthogonal concept".

ChrisRackauckas commented 5 years ago

This came up when decompressing a sparse matrix. https://github.com/JuliaDiffEq/SparseDiffTools.jl/pull/38#discussion_r308715915 .

rows_index, cols_index, val = findnz(J)
J[rows_index, cols_index] .+= (color[cols_index] .== color_i) .* dx[rows_index]

would be natural, but instead I'm left with:

@.. setindex!((J,),getindex((J,),rows_index, cols_index) + (getindex((color,),cols_index) == color_i) * getindex((dx,),rows_index),rows_index, cols_index)

which is quite nasty. This issue is starting to come up a lot with GPU parallelism asking for things to be vectorized, but sparse matrix calculations require logical or index-based indexing, so broadcasting this and things like x.[x .> 0] is starting to be very natural.

Anyways, any meaning other than getindex.((A,),idxs) confuses me. 🤷‍♂

StefanKarpinski commented 5 years ago

Consensus seems to be that the getindex.((A,), idxs) meanings is more intuitive and useful.

tkf commented 5 years ago

I feel the appropriate approach is more "let's make an awesome filtering syntax" rather than "let's complicate indexing with yet another orthogonal concept".

I agree this in general. But let me try the opposite (sorry!), to see how far I can get.

It may already be obvious for people here but I just want to note that logical indexing can be implemented in terms of lowering to getindex.(Ref(A), idx), just by using Broadcast machinery that is already in place. Here is an implementation of a lazy fused version of x[x .> 0] .= 0:

struct LogicalIndex{T}
    select::Bool
    index::T
end

struct LogicalIndexer{T}  # wraps a broadcastable object (with Bool eltype)
    bc::T
end

function If end

Broadcast.broadcasted(::typeof(If), x) = LogicalIndexer(x)

# Forward some interfaces to `.bc`:
Broadcast.instantiate(li::LogicalIndexer) =
    LogicalIndexer(Broadcast.instantiate(li.bc))
Broadcast.BroadcastStyle(::Type{<:LogicalIndexer{T}}) where T =
    Broadcast.BroadcastStyle(T)
Broadcast.combine_axes(li::LogicalIndexer) = Broadcast.combine_axes(li.bc)
Broadcast.broadcastable(li::LogicalIndexer) = li
Base.size(li::LogicalIndexer) = size(li.bc)

Base.getindex(li::LogicalIndexer, I...) = LogicalIndex(li.bc[I...], I)

Base.setindex!(arr::AbstractArray, value, idx::LogicalIndex) =
    if idx.select
        arr[idx.index...] = value
    end

x = [-1, 0, 1]
setindex!.(Ref(x), 0, If.(x .> 0))  # x.[If.(x .> 0)] .= 0 (kind of)
@show x

One disadvantage is that, if we define this for AbstractDict as well, we can't use LogicalIndex as a key. Users can avoid this by using Some(actual_key) as a key but this is a bit awkward solution. Another solution is to use dotgetindex and dotsetindex! inside broadcasting context (like dotview).

We can also lower if f.(args...) to If.(f.(args...)) (similar to the syntax I implemented in #31553) so that we can write

x.[if x .> 0] .= 0

Handling the case that x.[if bools] appears in the right hand side is a bit more challenging. But I think this only makes sense in the reduction context (e.g., sum(for x.[if x .> 0])) because the shape of x.[if bools] cannot be predicted until evaluating everything (does anyone write y .= x[f.(z)] a lot?). If we focus on fusing mapping, filtering, and reduction, it's not super difficult to do it. I think we need something like

abstract type MaybeSelected end

struct Selected{T} <: MaybeSelected
    value::T
end

struct Masked <: MaybeSelected end

Base.getindex(arr, idx::LogicalIndex) =
    idx.select ? Selected(arr[idx.index]) : Masked()

(this is untested) and automatically lift functions to f(::Vararg{MaybeSelected}) :: MaybeSelected. Reductions like sum then can automatically filter out Maksed and unwrap Selected.

AzamatB commented 4 years ago

Zygote.jl differentiates for loops much slower than broadcasting, so I'm constantly finding myself having to rewrite simple and clear expressions like

flipped_xs = Buffer(xs)
for t ∈ reverse(eachindex(xs))
   flipped_xs[t] = f(xs[t])
end

to obscure broadcasted form:

rev_time = reverse(eachindex(xs))
getindex.(Ref(f.(getindex.(Ref(xs), rev_time))), rev_time)

for performance reasons. Being able to write those using something like xs.[rev_time] would greatly improve this experience and the code would not look so ugly. For example, the broadcasted version above could look like

rev_time = reverse(eachindex(xs))
(f.(xs.[rev_time])).[rev_time]

I will paste below examples from a real project that uses Flux.jl and Zygote.jl, where I'm forced to use explicit getindex, setindex! and views with broadcasting for performance reasons, so that it can inform the decision on this issue.

function flip(f, xs)
   rev_time = reverse(eachindex(xs))
   return getindex.(Ref(f.(getindex.(Ref(xs), rev_time))), rev_time)
   # the same as
   # flipped_xs = Buffer(xs)
   # @inbounds for t ∈ rev_time
   #    flipped_xs[t] = f(xs[t])
   # end
   # return copy(flipped_xs)
   # but implemented via broadcasting as Zygote differentiates loops much slower than broadcasting
end

function (m::BLSTM)(Xs::T₃)::T₃ where T₃ <: DenseArray{<:Real,3}
   # preallocate output buffer
   Ys = Buffer(Xs, 2m.dim_out, size(Xs,2), size(Xs,3))
   axisYs₁ = axes(Ys, 1)
   time    = axes(Ys, 2)
   rev_time = reverse(time)
   @inbounds begin
      # get forward and backward slice indices
      slice_f = axisYs₁[1:m.dim_out]
      slice_b = axisYs₁[(m.dim_out+1):end]
      # bidirectional run step
      setindex!.(Ref(Ys),  m.forward.(view.(Ref(Xs), :, time, :)), Ref(slice_f), time, :)
      setindex!.(Ref(Ys), m.backward.(view.(Ref(Xs), :, rev_time, :)), Ref(slice_b), rev_time, :)
      # the same as
      # @views for (t_f, t_b) ∈ zip(time, rev_time)
      #    Ys[slice_f, t_f, :] =  m.forward(Xs[:, t_f, :])
      #    Ys[slice_b, t_b, :] = m.backward(Xs[:, t_b, :])
      # end
      # but implemented via broadcasting as Zygote differentiates loops much slower than broadcasting
   end
   return copy(Ys)
end

function (m::PBLSTM)(xs::VM)::VM where VM <: DenseVector
   # reduce time duration by half by restacking consecutive pairs of input along the time dimension
   xxs = vcat.(getindex.(Ref(xs), 1:2:lastindex(xs)), getindex.(Ref(xs), 2:2:lastindex(xs)))
   # the same as
   # @views xxs = @inbounds(vcat.(xs[1:2:end], xs[2:2:end]))
   # but implemented via broadcasting for performance reasons
   return vcat.(m.forward.(xxs), flip(m.backward, xxs))
end

Hs_buffer = Buffer(h, size(h,1), batch_size, length(hs))
setindex!.(Ref(Hs_buffer), hs, :, :, axes(Hs_buffer, 3))
# the same as
# for k ∈ eachindex(hs)
#    Hs_buffer[:,:,k] = hs[k]
# end
# but implemented via broadcasting as Zygote differentiates loops much slower than broadcasting
Hs = copy(Hs_buffer)

timholy commented 4 years ago

I haven't spent more than 2 seconds thinking about this, but isn't fixing Zygote the best approach?

mbauman commented 4 years ago

It's not just Zygote that depends upon this — GPU vectorization also heavily depends upon broadcasting.

timholy commented 4 years ago

Is it mostly a missing hoisting optimization? Why doesn't LICM do this for us?

KristofferC commented 4 years ago

I think it is just easier to overload and know what a broadcast expression will do than to try pattern match a loop. A loop is a pretty low level abstraction compared to broadcasting.

timholy commented 4 years ago

Fair enough. But it would be giving up a lot to resign to not having fast loops anymore.

I don't doubt it would be hard, but I wonder if having our own LICM pass that runs before inlining is something we'll eventually decide we need. LLVM's pattern matcher might be specialized for what it thinks of as a number, and Julia's notion of Number is rather more expansive. Presumably much of the performance gap comes from repeated dual-number operations on hoistable quantitites.

rapus95 commented 4 years ago

I instantly fell in love with the suggestion of making it explicit by @andyferris (https://github.com/JuliaLang/julia/issues/30845#issuecomment-475865027). The only missing thing would be an ascii variant which is easy enough to type and remember that no one will complain about dropping APL style at some time. I mostly like that approach out of the same reason why I like broadcasting, CartesianIndex/Indices and the whole promotion/conversion mechanism. Make the behaviour more explicit while increasing elegance. Have code working by default due to clever mechanics. But I also support @timholy (https://github.com/JuliaLang/julia/issues/30845#issuecomment-458370787) and even go further by saying that we should think about restructuring the whole indexing machinery for further extensibility and clarity. (And thus allowing to test new indexing schemes)

Proposal

I suggest to remove much of the index handling from getindex/setindex! into some sort of IndexBuilder and then call arrays with a built index. Currently we already have the to_indices functionality (which is definitively missing a reference in the Interfaces/Indexing section) which handles boolean arrays. So we might be able to generalize that approach if we turned indexing into another trait- or inheritance based construct. My aim would be to only have to define 1-arg getindex functions with explicit scalar indexing types and to move index building and transforming to a new type. That hopefully allows for orthogonalization of indexing type/style, indexing arguments and indexed array and also might be able to lift some of the transforms to compile time. It on the other hand can lead to more lazy evaluations and to easily unify/extract the behaviour of indexing on LHS and RHS.

new array type

Define #1 or #2 based on IndexStyle

getindex(A::AbstractArray, ::LinearIndex) #1 if IndexStyle==IndexLinear
getindex(A::AbstractArray, ::CartesianIndex) #2 if IndexStyle==IndexCartesian

getindex(A::AbstractArray, ii::IterableIndex) = collect(idx->A[idx], ii) #3 default
getindex(A::AbstractArray, specialized::IndexBuilder) = getindex(A, build(IndexStyle(A), A, specialized))) #4 default
getindex(A, args...) = getindex(A, DefaultBuilder(A)(args...)) #redirecting fallback or via lowering
DefaultBuilder(A::AbstractArray) = tuple #current default/fallback

You may define #3 if you want to shortcut some paths for performance. #4 usually shouldn't be redefined. If #3 was solved via broadcasting, it would even result in a fused loop after inlining!

new builder type (like APL, broadcasting, compiletime, whatsoever)

Define #5 or #6 based on granularity and the ability to order collections for performance

build(A::AbstractArray, ib::IndexBuilder)::IndexType #5
build(s::IndexStyle, A::AbstractArray, ib::IndexBuilder)::IndexType(s) = convert(s, build(A, ib)) #6 default

build(A::AbstractArray, t::Tuple) = to_indices(A, t) #current default/fallback

comparions to current situation

Those, who know the fallback behaviour of getindex, know, that we currently have almost exactly that behaviour.

comparison to suggested behaviours

Have I ⊗ J return APL(I,J) to get current behaviour. Have A[APL(1:10, 1:10)] to mirror the current A[1:10, 1:10] Have A[BC(1:10, 1:10)] to mirror the suggested A.[1:10, 1:10] (compare the dot after A)

All those rewrites of the [] could be managed by macros to allow for smoother writing.

tkf commented 4 years ago

FYI, there were some discussions around awkward-array https://github.com/scikit-hep/awkward-array in https://github.com/andyferris/Dictionaries.jl/issues/19#issuecomment-641524140 with me and @andyferris. It is a Python library for nested arrays and tables. If you are curious, I think this Strange Loop talk "Jagged, ragged, awkward arrays" by Jim Pivarski - YouTube is a great introduction to it. In particular, it also touches on the growing need and attention of such data structure in technical computing. I wonder if nonscalar indexing syntax A.[I] can be used as a building block for this. I'd imagine something like x.[i].[j].[k] can be used for dealing with nested structure.

@andyferris Re https://github.com/JuliaLang/julia/issues/36105#issuecomment-645039761

For nested things @tkf I wonder if we could using nested dot-getindices operations to detangle it on the LHS, as it seems (to me) you might need to do anyway on the RHS?
A.[2:4].[end] .= B.[1:3].[begin]
(I'm not sure if that particular example makes sense, but something like that)

I'd imagine we need to call a strictly lazy version of broadcastable on the LHS (e.g., don't call collect in it) and build the Broadcasted. This way, in-place mutation on the leaves of the LHS would be reflected in A. So, we'd have the trees of Broadcasted for both LHS and RHS. Is this what you mean by detangling?

andyferris commented 4 years ago

Is this what you mean

Yes, I think that sounds right.

JuliaLang / julia