Closed Luapulu closed 3 years ago
This seems like an appropriate usage to me. I've got some deadlines so may not get to you quickly, but I don't think you're off-base to be thinking this way.
I've started working on some of these ideas in TiledArrays.jl.
I may have misunderstood the issue here. After https://github.com/JuliaArrays/TiledIteration.jl/pull/23 (v0.3.0), TileIteration
has changed to array interface, e.g.,:
julia> A = rand(200, 50);
julia> TileIterator(axes(A), (128,8))
2×7 TileIterator{2,Tuple{TiledIteration.CoveredRange{StepRange{Int64,Int64},TiledIteration.LengthAtMost},TiledIteration.CoveredRange{StepRange{Int64,Int64},TiledIteration.LengthAtMost}}}:
(1:128, 1:8) (1:128, 9:16) (1:128, 17:24) (1:128, 25:32) (1:128, 33:40) (1:128, 41:48) (1:128, 49:50)
(129:200, 1:8) (129:200, 9:16) (129:200, 17:24) (129:200, 25:32) (129:200, 33:40) (129:200, 41:48) (129:200, 49:50)
The TiledIterator
is an array to help index into another. With a ‘TiledArray’ I mean an Array that contains the actual data, that can, or even must be, indexed into tile by tile — An easy example being chunked data on disk.
The problem is, adding a new kind of index currently requires rewriting a whole lot of code. Reduce operations, broadcasting, views, iteration, sorting, reshaping, etc. have to be rewritten algorithmically, but you also have to implement a whole system of conversions between different indices, i.e. cartesian to tile, tile to cartesian, tile of of one tiling style to tile of another tiling style and this has to work for any kind of regular tiling.
This is why I thought it might be possible to hijack Base.to_indices
. You’d still have to rewrite the algorithmic aspects for the new index type, but at least you wouldn’t also have to reimplement conversion between indices everywhere. Just define to_indices
once and you’d be done. This also allows indexing into already existing array types with a new index.
The idea then is to also make to_indices dispatch on IndexStyle, so as to allow different arrays to have different index styles, even if they are subtypes of each other, or otherwise related in the type tree.
More importantly, sort, reduce, broadcast, etc. also have to dispatch on IndexStyle, for all of this to work.
So if I understand it correctly, if At
is a TiledArray
, then At[1]
returns an array here? Does it bring any actual performance improvement compared to A[TileItertor(...)[1]]
where A
is the normal array?
Or you just want to store the tile metadata to normal array A
, and use it to create R = TileIterator(...)
so that indexing A[R[1]]
is efficient? If this is the case, then a very thin array wrapper can be made for this purpose and there's no need to create a new IndexStyle.
I'm a little skeptical to mix the normal array world and the fancy block array world.
No, At[1]
would return the first element of At
, not an array. At[TileIndex(1)]
would return the first tile (an array in most cases) and something like At[TileIndex(1), 1, 2, 3]
would return the element (1, 2, 3)
within the first tile. The user would rarely use TileIndex(1)
explicitly. Instead, operations like sum
or At .*= 3
would use TileIndex
implicitly.
At[TileIndex(1)] would return the first tile (an array in most cases) and something like At[TileIndex(1), 1, 2, 3] would return the element (1, 2, 3) within the first tile.
It's not very clear to me why TileIterator
doesn't fit the usage. Say R = TileIterator(...)
, then A[R[1]...]
return the first tile. It's not implemented yet, but it's doable to make TileIterator
with eltype CartesianIndices
so that A[R[1]]
works.
I still don't get why we need a new IndexStyle
for this usage.
One example where IndexStyle
come in useful: eachindex
dispatches on IndexStyle
to deliver an efficient iterator over all the indices in an array. For tiled arrays, that iterator should return indices tile by tile. So, something like:
eachindex(::IndexTiled, A) = ((ti, inds) for inds in eachindex(A[ti]) for ti in eachtileindex(A))
Just to clarify, I very much intend on using TileIterator
. The goal is not to replace, but to build on top of.
This is indeed different from how IndexStyle
has thus far been used — it really only says whether linear or cartesian indexing is fastest, and doesn't change the order of iteration. In fact, I wouldn't be surprised if there were generic code that assumes order of iteration of an eachindex
is columnar (or at least in sync with another array's eachindex
).
As a concrete counter point, we don't use a TransposedStyle
to change the iteration order for adjoints. Maaaaybe we could, but this gets into https://github.com/JuliaLang/julia/issues/15648 territory.
Just to clarify, I very much intend on using TileIterator. The goal is not to replace, but to build on top of.
This is exactly my worries. I'm worried that the changes to add a new IndexStyle
, which will be a big change, does not bring any actual improvement on both performance and API(IIUC).
~There's no API for eachindex(::IndexStyle, A...)
and~ there is no API for getindex(::IndexStyle, A, inds...)
, a bunch of such API changes is almost not worth doing if it only marginally improves the usage API. Not to mention that these changes would be very likely to break a lot of codes in downstream Julia packages.
It might not be public, but it's there https://github.com/JuliaLang/julia/blob/539f3ce943f59dec8aff3f2238b083f1b27f41e5/base/abstractarray.jl#L266
As far as performance, in the same way that remainder division makes it slow to go from linear index to cartesian index, it is slow to go from cartesian index to tile index. The reverse is faster, especially when you have a buffer that you're filling with values from a larger array. You want to fill the buffer once, do everything you need with it and then fill it with new values for the next chunk / tile / piece.
May be hijacking IndexStyle
is not the way to go, but there must (or ought to) be some way to specify how an array should be iterated over and indexed into?
I think you could still use an IndexTiled
style here... but the trick would be if you could get a cleverly fast iterator to return a TileIndex(tileid, i, j)
that iterated column major. Getting that iterator to be fast in column major order would be very tough though.
So, punt on the iteration order part and just tackle the multiplication being faster than remainder division part using IndexStyle?
If that's the case, it isn't much of a solution for something like DIskArrays.jl because iteration order is the key issue there. Unfortunately, that's also the main thing I want to work on :(.
It might help with stuff like virtual concatenation and more complex tiling though.
Just found BlockArrays.jl. That might solve my concrete issues at the moment.
I still wonder if an array interface that allows for different iteration order is possible. Not sure what role IndexStyle would have in such an interface.
ArrayInterface.jl seems to be dealing with some of these ideas. I’ll close this, since that’s where these sorts of discussions seem to be taking place.
That seems reasonable. There's also https://github.com/timholy/ArrayIteration.jl but the parts of it that haven't already moved to Base are mostly focused on the problem of sparsity.
I've been working on adding a
TiledIndex
with aIndexTiled <: IndexStyle
, for all sorts of arrays that are best indexed into via an index representing the tile and an index within that tile. See issue in TiledIteration.jl. Along the way I had a few thoughts:IndexStyle
, thus allowing users to implement new index types along with efficient algorithms for those index types.Base.to_index
andBase.to_indices
. Should these dispatch on theIndexStyle
, so that indices are converted to the correct type, based on theIndexStyle
. This would allow any user defined index to work with any array type as long as the relevantto_index
andto_indices
functions are implemented!May be I've misunderstood the point of
IndexStyle
. If that's the case, can anyone point me to some resources on implementing custom indexing? The docs only have a section for adding offset indexing, but not for more complicated indexing.