JuliaArrays / ArraysOfArrays.jl

Efficient storage and handling of nested arrays in Julia
Other
43 stars 9 forks source link

getindex(.) return type does not match eltype #2

Open colinxs opened 5 years ago

colinxs commented 5 years ago

First off, awesome idea/package! It's been quite useful in my own work.

I was really excited to use your package in conjunction with StructArrays but ran into an issue (that I attempted to fix with a PR to StructArrays) as explained here. The owner of that package pointed out, however, that the real issue is the following:

julia> x=nestedview(rand(2,3,10), 2);
julia> x[1] isa eltype(x)
false

What are your thoughts on modifying eltype to be consistent with the true return type of getindex?

-Colin

oschulz commented 5 years ago

Thanks!

Yes, the getindex/eltype inconsistency is annoying, esp. when using it with StructArray or TypedTable.

I definitely want to solve that, but it's a bit tricky - just using the underlying array type as eltype is not enough, because we get into trouble (e.g. with broadcast) if it's a view.

oschulz commented 5 years ago

There's also this problem: If we just make eltype returns the right thing, then there would be a mismatch between eltype and the element type deduced from T in foo(A::AbstractArray{T) on an ArraysOfArrays array. I think a lot of code is written based on the assumtion that T and eltype(A) match.

piever commented 5 years ago

Just to pitch in (as this was brought up in https://github.com/piever/StructArrays.jl/pull/84), I think that changing the eltype should be done not by adding a custom eltype method but by giving the correct type parameter to AbstractArray, i.e. ArrayOfArray<:AbstractArray{SubArray...}. Implementation-wise, this should be an extra type parameter of ArrayOfArray that gets computed in an internal constructor.

oschulz commented 5 years ago

I fully agree, @piever. That's what makes it a bit more tricky - but I have an idea (like you said, via constructor), I'll try it out.

oschulz commented 5 years ago

@piever, @colinxs, haven't forgotten about this - holiday season, but it's definitely on my short-term ToDo list.

oschulz commented 5 years ago

Update: Haven't forgotten about this, just had to be put on hold for some other urgent work. Will get back on this soon.

oschulz commented 4 years ago

Sorry this is taking so long - it's not forgotten and definitely still on my to-do list!

colinxs commented 4 years ago

@oschulz no worries, I actually figured this out and have been using it in my own codebases:

# For AbstractArray A and indices I, compute V = typeof(view(A, I...)) by:
# 1) Try to infer V from typeof(A) and typeof(I) by calling viewtype(typeof(A), typeof(I))
# 2) If !isconcretetype(V), fall back to typeof(view(A, I...))
# Custom subtypes of AbstractArray (e.g. UnsafeArray) could provide customized implementations
# of viewtype(A::AbstractArray, I::Tuple) or viewtype(::Type{<:AbstractArray}, ::Type{<:Tuple})
# if required.

function viewtype(A::AbstractArray, I::Tuple)
    T = viewtype(typeof(A), typeof(I))
    _viewtype(A, I, T, Val(isconcretetype(T)))
end
viewtype(A::AbstractArray, I...) = viewtype(A, I)

function viewtype(::Type{A}, ::Type{I}) where {A<:AbstractArray,I<:Tuple}
    Core.Compiler.return_type(view, _view_signature(A, I))
end
Base.@pure _view_signature(::Type{A}, ::Type{I}) where {A,I<:Tuple} = Tuple{A, I.parameters...}

_viewtype(A, I, T, ::Val{true}) = T
function _viewtype(A, I, T, ::Val{false})
    try
        return typeof(view(A, I...))
    catch e
        Istring = join(map(string, I), ", ")
        printstyled(stderr,
        """
        Unable to infer typeof(view($(typeof(A)), $(join(map(i -> string(typeof(i)), I), ", "))))
        Only other option is to try typeof(view(A, $Istring)) but that resulted in below error.
        Try passing in valid indices or implement:
            viewtype(::Type{$(typeof(A))}, ::Type{$(typeof(I))})
        """, color=:light_red)
        rethrow(e)
    end
end

I've got an in-works implementation something similar to ArrayOfSimilarArrays that I'll make a PR for.

Thanks again for keeping on top of this! :)

colinxs commented 4 years ago

Also, two other options for _viewtype(A, I, T, ::Val{false}) that I considered were:

_viewtype(A, I, T, ::Val{false}) = @inbounds view(A, I...)
@propagate_inbounds _viewtype(A, I, T, ::Val{false}) = view(A, I...)

The first has the benefit of working for empty arrays, but that seems unsafe. The second option I rejected because that would require e.g. @inbounds nestedview(A_flat, 2), which seemed weird to do when calling a constructor.

oschulz commented 4 years ago

Thanks, I really got to add this soon!

colinxs commented 4 years ago

As an alternate idea, you could use the SlicedArray that I made here.

I ended up needing the ability to slice along arbitrary dimensions instead just the inner M dimensions and ran across JuliennedArrays.jl. I took the best parts of that and ArraysOfArrays to make SlicedArray which I use internally for Lyceum. An ArrayOfArrays{T,M} could be implemented as:

function sliceinner(A::AbstractArray{<:Any,L}, ::Val{M}) where {L,M}
  slice(A, ntuple(_ -> Colon(), Val(M))..., ntuple(_ -> *, Val(L-M))...)
end

You could also use JuliennedArrays directly, SlicedArray is the same idea just with more overloads for Base functions, comprehensive tests, and better performance. I'd like to upstream those changes in the future but haven't had time yet. If this sounds useful I could register SpecialArrays (currently it's only registered in the Lyceum registry).

As an aside, you might find some of the test utilities I wrote useful, given that you have a lot of packages providing various custom arrays. That test suite is what originally prompted the changes to ElasticArrays I made in https://github.com/JuliaArrays/ElasticArrays.jl/pull/20. Perhaps factoring that out to something like ArrayTestTools.jl would be useful to JuliaArrays/others?

oschulz commented 4 years ago

I could register SpecialArrays

Hm, since ArraysOfArrays is already registered, and since there would be conflicts with some exported functions (like innersize), I guess I would prefer to backport your improvements to ArrayOfSimilarArrays. If the package has to evolve in the process (incl. some API changes), that would be fine. I've actually been planning to move it under the "JuliaArrays" organization to make it a bit more "official" - talked to Tim Holy about it a JuliaCon, just haven't gotten 'round to it yet.

Perhaps factoring that out to something like ArrayTestTools.jl would be useful to JuliaArrays/others?

Yes, that might indeed be very useful!

colinxs commented 4 years ago

Makes sense to backport the changes ArraysOfArrays.jl! Given that SlicedArray (and also FlattenedArray, which is a flattened view of a nested array) were inspired largely by JuliennedArrays.jl it'd be good to check with its dev (@bramtayl) first? The way forward could then be to upstream SlicedArray/FlattenedArray to JuliennedArrays.jl and then have ArraysOfArrays.jl depend on that? The alternative would be to just put everything in ArraysOfArrays.jl directly.

In the meantime, I'll work on factoring out ArrayTestTools.jl.

bramtayl commented 4 years ago

It makes me so happy when I find out people are reading my packages! Happy to accept PRs to JuliennedArrays

oschulz commented 4 years ago

@bramtayl, I just came across JuliennedArrays a while ago - from what I saw, it's quite similar to ArrayOfSimilarArrays, but trades increased flexibility (slicing in any dimension) for some performance?

Would you possibly be interested in moving it into ArraysOfArrays, or would you prefer to keep it as a lightweight separate package?

oschulz commented 4 years ago

@colinxs having a FlattenedArray in ArraysOfArrays would be great - I've been meaning to add something like that, to replace the current generic implementation of flatview

@inline flatview(A::AbstractArray{<:AbstractArray}) = Base.Iterators.flatten(A)

with something that actually returns an AbstractArray-subtype. One of the things that kept sliding to the bottom of my to-do list ... :-)

bramtayl commented 4 years ago

@oschultz I'd be thrilled to move it into ArraysOfArrays so I don't have to maintain the code anymore. I'd be curious to know more about performance differences

colinxs commented 4 years ago

hahah and I love when devs are super responsive and eager to accept PRs :).

@bramtayl, when you have a moment want to look over SlicedArray and FlattenedArray. The major changes are:

Slices

I also significantly re-wrote the internals for better inference/performance and added full test coverage.

colinxs commented 4 years ago

Concerning the performance difference: there isn't any. All of the fancy indexing that allows slicing along arbitrary dimensions effectively compiles out.

bramtayl commented 4 years ago

@colinxs those all sound like improvements to me. Earlier versions of JuliennedArrays used * and : to specify dimensions (I think that was a Tim Holy suggestion). I changed to True and False because I got excited about the wider potential of Typed Bools (hasn't panned out yet). I'm not sure I understand about PermutedDimsArray.

oschulz commented 4 years ago

I'd be thrilled to move it into ArraysOfArrays so I don't have to maintain the code anymore.

Haha, and here I thought we'd maintain it together. ;-)

I'd be curious to know more about performance differences

Good point, we should definitely do some benchmarking!

oschulz commented 4 years ago

Regarding PermutedDimsArray - when I wrote ArrayOfSimilarArrays I actually thought that applications that needed to slice along another axis could us an ArrayOfSimilarArrays wrapped around a PermutedDimsArray. Would be interesting to see if that could achieve similar performance to JuliennedArrays.Slices. If so, maybe we could reduce code there quite a bit and make things easier to maintain if we join it all in ArraysOfArrays.

oschulz commented 4 years ago

Adds adjoints for Zygote.jl

Oh, that would be nice to have here (ArraysOfArrays) too!

colinxs commented 4 years ago

@bramtayl PermutedDimsArray just remaps dimensions e.g.:

julia> A=rand(2,3)
2×3 Array{Float64,2}:
 0.307533  0.822543  0.530326
 0.591914  0.782564  0.948809

julia> B=PermutedDimsArray(A, (2,1))
3×2 PermutedDimsArray(::Array{Float64,2}, (2, 1)) with eltype Float64:
 0.307533  0.591914
 0.822543  0.782564
 0.530326  0.948809

julia> A[1,2] == B[2,1]
true

So if you had A = [rand(2,3) for i=1:4], Align(A, True(), True(), False()) and PermutedDimsArray(Align(A, False(), True(), True()), (2, 3, 1)) would be equal.

bramtayl commented 4 years ago

Oh I see. Yes that makes sense to me. I'm not sure if there would be a performance penalty for PermutedDimsArray?

colinxs commented 4 years ago

@oschulz

I thought about doing the same thing, but you miss out on some optimizations with the extra layer of indirection. If we had:

A = rand(2,3,4)
B = slice(A, :, *, *)

Then doing B[:, 1] as implemented is basically a no-op because we can just forward the Colon to the parent array and "re-slice". See this comment and the subsequent getindex definition for an example.

If you rely on PermutedDimsArray, then you'd be falling back to Cartesian indexing and collecting the output in a new array.

colinxs commented 4 years ago

@bramtayl

SlicedArray still uses True/False internally, but I figured using :/* was a friendlier user-facing API.

oschulz commented 4 years ago

I like : / *!

But in your example above - slice(A, :, *, *) could already be done with just an ArrayOfSimilarArrays, without PermutedDimsArray, right? Or do if mix : and * up, here`?

oschulz commented 4 years ago

In any case, if a more generic version of ArrayOfSimilarArrays is possible without loss of performance, we could add it to ArraysOfArrays and reduce ArrayOfSimilarArrays to a type alias (possibly with an optimized implementation for some functions, if necessary).

colinxs commented 4 years ago

@oschulz exactly, but ArrayOfSimilarArrays is just a special-case of a SlicedArray or @bramtayl's Slices.

Basically, if we incorporate SlicedArray/Slices into ArraysOfArrays.jl then it would entirely replace ArrayOfSimilarArrays.

@oschulz @bramtayl given the above improvements I made would you all be comfortable incorporating FlattenedArray/SlicedArray instead of Align/Slices?

oschulz commented 4 years ago

Haha, we typed that at the same time.

oschulz commented 4 years ago

And I do like the names FlattenedArray and SlicedArray - have you checked for conflicts with other packages already?

colinxs commented 4 years ago

yup! Also, might you be available for a phone call? It could make planning this out a bit easier haha.

I'm hoping to register all of github.com/Lyceum in General soon, which requires FlattenedArray/SlicedArray being part of a registered package.

colinxs commented 4 years ago

I haven't checked for name conflicts. I'll do a google search now. Is there a good way to grep all Julia packages? lol

Google doesn't appear to turn anything up: "SlicedArray" + "julia" "FlattenedArray" + "julia".

Just searching for "SlicedArray" and "FlattenedArray" turned up a couple Python/Java references.

oschulz commented 4 years ago

@c42f: @colinxs, @bramtayl and me are thinking about combining our several "array of arrays" implementations in ArraysOfArrays.jl. Could we move ArraysOfArrays into the JuliaArrays org?

oschulz commented 4 years ago

Is there a good way to grep all Julia packages?

I usually ripgrep through my packages dir - since I have some huge "Project.toml"s active, that tends to cover any major (and lot's of minor) packages. :-)

I think "SlicedArray" and "FlattenedArray" are indeed free - let's grab' em officially!

oschulz commented 4 years ago

yup! Also, might you be available for a phone call?

Yes, that would speed things up - I'll send you an email.

oschulz commented 4 years ago

I emailed you too, @bramtayl .

c42f commented 4 years ago

Could we move ArraysOfArrays into the JuliaArrays org?

Certainly! If you initiate the transfer I can accept it and I'll make you maintainer @oschultz. Then you should be able to add collaborators as you desire. (That seems simplest until/unless we fix the JuliaArrays team setup which seems strangely configured at the moment.)

oschulz commented 4 years ago

@c42f while we're at it, I'd transfer UnsafeArrays.jl as well.

oschulz commented 4 years ago

@c42f transfer done - looks like I had sufficient rights in JuliaArrays to do the transfer on my own. :-)

c42f commented 4 years ago

Ok, cool. Let me know if there's any setup which you actually need my help for!

oschulz commented 4 years ago

@c42f , all fine, looks like - but could you return admin rights to ArraysOfArrays and UnsafeArrays to me? Those got lost in the move to JuliaArrays.

colinxs commented 4 years ago

Thanks again for meeting! Here's a summary of what we spoke about:

  1. PR1 will include FlattenedArray + the test utils. This will be non-breaking and we can go ahead and tag + release after merging.
    • I'll take a look at how to handle flatview(::SlicedArray). Current options include:
    • Handle the permutation internally like @bramtayl does
    • Return the parent when possible*, otherwise a PermutedDimsArray
    • Return the parent when possible*, otherwise a FlattenedArray(::SlicedArray)
    • @oschulz I think once this gets merged it'd be good to rewrite the tests for the other array types in ArraysOfArrays.jl in terms of these tests (where possible).
    • I'll likely move the test utils to its own package after everything gets merged.
  2. PR2 will include SlicedArray, which will replace ArrayOfSimilarArrays, + some (possible) changes to your inner_* functions. This will be a breaking PR.
  3. @bramtayl if you could send that mapreduce benchmark you mentioned that'd be great!

* Should we avoid having variable return types? e.g. flatview(slice(rand(2,3,4), args...)) could either return an Array (the parent) or one of the above options. This also comes up in getindex(::SlicedArray, I...):

julia> S=slice(rand(2,3,4), :, *, *)
3×4 slice(::Array{Float64,3}, :, *, *) with 2-element eltype SubArray{Float64,1,Array{Float64,3},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Int64},true}:
 [0.242411, 0.723567]  [0.313344, 0.705964]  [0.45211, 0.308781]   [0.637732, 0.0295599]
 [0.806773, 0.477879]  [0.581994, 0.515911]  [0.98953, 0.462396]   [0.980186, 0.498727]
 [0.038467, 0.479799]  [0.284736, 0.340725]  [0.963542, 0.926671]  [0.233843, 0.141494]

julia> S[:,1]
3-element slice(view(::Array{Float64,3}, :, :, 1), :, *) with 2-element eltype SubArray{Float64,1,Array{Float64,3},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Int64},true}:
 [0.2424109148738105, 0.7235671322839587]
 [0.8067728866162733, 0.47787858099260583]
 [0.038466976626140514, 0.479799247981757]

julia> S[:,1,1]
3-element Array{SubArray{Float64,1,Array{Float64,3},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Int64},true},1}:
 [0.2424109148738105, 0.7235671322839587]
 [0.8067728866162733, 0.47787858099260583]
 [0.038466976626140514, 0.479799247981757]

In that example, the optimization for S[:,1] is fairly significant, so we probably want it:

julia> @btime $S[:,1];
  12.085 ns (2 allocations: 80 bytes)

julia> @btime $S[:,1,1];
  83.432 ns (5 allocations: 368 bytes)

The best example I can think of that suggests it's okay to have variable return types is reshape(A::AbstractArray{T,N}, ::Val{M}). If M==N you get A, otherwise Base.ReshapedArray.

bramtayl commented 4 years ago

For 3):

using JuliennedArrays
using BenchmarkTools

x = rand(1000, 1000)
@btime map(sum, Slices(x, False(), True()))
@btime sum(x, dims = 2)
oschulz commented 4 years ago

Thanks for writing the summary!

oschulz commented 4 years ago

Should we avoid having variable return types? e.g. flatview(slice(rand(2,3,4), args...))

I would strongly favor using the optimal return type for each case, as long as it can be done in a type-stable way. So if appropriate, flatview should just return the parent array.

c42f commented 4 years ago

could you return admin rights to ArraysOfArrays and UnsafeArrays to me

Of course, done. I think you should now be able to add collaborators and manage all the things.

oschulz commented 4 years ago

Thanks, @c42f ! @colinxs, you can write to ArraysOfArrays now - probably best to still make a PR for the new features though, will make it easier to discuss. Travis/Documenter for ArraysOfArrays is configured to generate doc previews for PRs, btw.

colinxs commented 4 years ago

Haven't gotten around to making the PRs yet as I'm cleaning things up in SpecialArrays.jl first, but I wanted to give an update on the replacement for JuliennedArrays.Align. The behaviour of Align can be achieved with FlattenedArray + PermutedDimsArray and benchmarking shows very little overhead (1-2%). Given the simplicity of that approach (and some of the tricks that PermutedDimsArray does) I'm going to move forward with this approach.