JuliaArrays / OffsetArrays.jl

Fortran-like arrays with arbitrary, zero or negative starting indices.
Other
195 stars 40 forks source link

getindex on an OffsetArray behaves unintuitively #322

Closed mkborregaard closed 1 year ago

mkborregaard commented 1 year ago

Consider

julia> a = OffsetArray(2:5, -3)
2:5 with indices -2:1

julia> eachindex(a)[2:end]
2:1

julia> collect(eachindex(a))[2:end]
3-element Vector{Int64}:
 -1
  0
  1

I would not expect collecting the object to change the output from indexing into the object. The latter behaviour is what I intuitively assumed for the non-collected object.

jishnub commented 1 year ago

This is expected. eachindex returns an axis with offset indices. collect converts this to a vector, stripping the axis offset.

julia> eachindex(a) |> axes
(OffsetArrays.IdOffsetRange(values=-2:1, indices=-2:1),)

julia> collect(eachindex(a)) |> axes
(Base.OneTo(4),)

Note that the docstring for collect states that

collect(element_type, collection)

  Return an Array with the given element type of all items in a collection or iterable.

so the returned result is necessarily an Array and has 1-based indices, irrespective of the indices of the argument. Had collect converted the argument to an offset vector, the results would have been identical. For example:

julia> OffsetArray(collect(eachindex(a)), axes(eachindex(a)))[2:end]
Int64[]
mkborregaard commented 1 year ago

Even though this might be expected by an author of the package, it is highly unintuitive, and as such I'd argue this is not resolved. Why should indexing into the result of eachindex(a) retain the same offset as indexing into a?

jishnub commented 1 year ago

The way vector indexing works in Julia in general is that a[b] has the same indices as b. If a and eachindex(a) have the same axes, then a[b] and eachindex(a)[b] will also have the same axes. This brings us to the bigger question: why do a and eachindex(a) have the same axes? This is because, for consistency across the ecosystem, a convention has been chosen that the axes of arrays are their own axes. If this sounds confusing, we may look at a few examples:

For a Vector:

julia> a = ones(4);

julia> ax = eachindex(a)
Base.OneTo(4)

julia> eachindex(ax)
Base.OneTo(4)

So the array and its axis have the same axes. In other words, the axis is its own axis.

For an OffsetArray, we obtain something similar:

julia> a = ones(4:6);

julia> ax = eachindex(a)
OffsetArrays.IdOffsetRange(values=4:6, indices=4:6)

julia> eachindex(ax)
OffsetArrays.IdOffsetRange(values=4:6, indices=4:6)

The way the axis of an OffsetArray is displayed makes this clear, that the values of the axis and the indices of the axis are identical. This means that, again, the axis is its own axis.

This leads to desirable properties in indexing, such as, firstly, ax[i] == i for all i in ax, and secondly,

julia> a = OffsetArray(1:4, 3);

julia> ax = eachindex(a);

julia> a[ax[begin]] == a[ax][begin]
true

which makes vector indexing consistent with a scalar one, and makes it much easier to reason about offset indexing. If you want to understand about offset axes in detail, I would recommend reading the documentation of the internals of the package.

TLDR: this convention exists for consistency and convenience. In general, for all 1D vectors in Julia, we expect the axes to have the same indices as the array, offset or otherwise.

mbauman commented 1 year ago

Even more concretely, Julia's indexing behaviors have the property that the axes of the indexes are the axes of the result. That is, axes(A[I...]) should be the splatted concatenation of map(axes, I). Making the axes of an offset array similarly offset means that A == A[axes(A)...] == A[LinearIndices(A)] == A[CartesianIndices(A)]. Cf. https://github.com/JuliaLang/julia/pull/27038.

mkborregaard commented 1 year ago

OK, thanks a lot @mbauman and @jishnub for explaining.

I have to admit I'm still not convinced that this is not a big footgun for end users, as I would have thought having eachindex be easily indexable using conventional 1-based indexing was a really easy way for e.g. package authors to internally abstract away that some users might hand you offsetarrays.

But, at least I am convinced that the behaviour is what is best for people working with internal structures of Julia, so it probably is the best design. It's just unfortunate that this sometimes conflicts with what is intuitive.

jishnub commented 1 year ago

I understand that it might seem unintuitive, but the property A[i] == A[ax[i]] == A[ax][i] is quite desirable, and this requires the axis to have the same indices as the array. A way to work around the axis offset is to convert the axis to a UnitRange:

julia> UnitRange(eachindex(a))[2:end]
-1:1

julia> collect(eachindex(a))[2:end]
3-element Vector{Int64}:
 -1
  0
  1

Alternately, use begin everywhere instead of 1, as

julia> eachindex(a)[begin + 1:end]
-1:1