Efficient getindex implementations for subsetting AcceleratedArrays

kcajf commented 5 years ago

Hi @andyferris - this project looks great, and is something I am considering extending / building on. I already have some hacky implementations of something similar, but not as nicely integrate and as general.

What are you thoughts on adding a family of getindex(A::AcceleratedVector, idx::AbstractVector{Int}) methods? I.e. indexing into AcceleratedVectors with integer StepRanges, integer Vectors, etc, and returning AcceleratedVectors of the same type.

For SortIndex, this would be quite easy. When indexing with StepRanges, we would have to check the step is positive. For indexing with arrays, we would have to check the array is sorted first. Since we would be constructing SortIndexes from known-sorted arrays, it would be good to have a SortIndex constructor that doesn't do any checks. This might also be useful generally.

For HashIndex, the hash table would have to be modified, but there are likely lots of optimisations / shortcuts to minimise the work. For example, when indexing with UnitRange{Int}s, we could add to a global integer offset that is subtracted from the values in the Dict when they are accessed. Similar to the above, we might want a more direct HashIndex constructor that accepts a pre-built dictionary & offset, etc.

I'd be happy to do some initial work on this after hearing your thoughts.

andyferris commented 5 years ago

Yes, that sounds great - the other one I’be thought of is the way indexing a sorted array with a sorted array is also sorted (and we also know that ranges are naturally sorted...). So indexing can basically “preserve” SortIndex (as well as uniqueness).

When I think of hashes - I think it might be simplest to just drop them (except for maybe uniqueness?). I’m trying to think of compelling end-use cases.

andyferris commented 5 years ago

(To say it differently - indexing is definitely in the “roadmap” as it is in my head but it’s been a matter of finding time for me to contribute to this project - and help is always appreciated!)

kcajf commented 5 years ago

When I think of hashes - I think it might be simplest to just drop them (except for maybe uniqueness?). I’m trying to think of compelling end-use cases.

What do you mean by this?

andyferris commented 5 years ago

I just mean - it seems quite complicated when indexing to preserve the hash table for some (possibly repeated?) subset. I also think it could be slow - if you want to shrink the size of the hash table you need to recompute hashes etc (slow for large selections) and for small selections the overhead of the hash table is large while not providing any acceleration benefits. It seems it might be best to leave the decision in the hand of the user, who can always accelerate the result when it makes sense for their use case.

ivirshup commented 4 years ago

Would it be reasonable to encode propagation through getindex in it's type? Something like HashIndex{Propagating}?

My use-case is using an accelerated array for the dimensions of a DimensionalArray (from DimensionalData). I'd like to be able to subset a DimensionalArray, and have the result still have accelerated indices. I currently don't see a way to enforce this without some type-piracy.

andyferris / AcceleratedArrays.jl

Efficient getindex implementations for subsetting AcceleratedArrays #4