Open colinxs opened 5 years ago
Thanks!
Yes, the getindex
/eltype
inconsistency is annoying, esp. when using it with StructArray
or TypedTable
.
I definitely want to solve that, but it's a bit tricky - just using the underlying array type as eltype
is not enough, because we get into trouble (e.g. with broadcast
) if it's a view.
There's also this problem: If we just make eltype
returns the right thing, then there would be a mismatch between eltype
and the element type deduced from T
in foo(A::AbstractArray{T)
on an ArraysOfArrays
array. I think a lot of code is written based on the assumtion that T
and eltype(A)
match.
Just to pitch in (as this was brought up in https://github.com/piever/StructArrays.jl/pull/84), I think that changing the eltype should be done not by adding a custom eltype
method but by giving the correct type parameter to AbstractArray
, i.e. ArrayOfArray<:AbstractArray{SubArray...}
. Implementation-wise, this should be an extra type parameter of ArrayOfArray
that gets computed in an internal constructor.
I fully agree, @piever. That's what makes it a bit more tricky - but I have an idea (like you said, via constructor), I'll try it out.
@piever, @colinxs, haven't forgotten about this - holiday season, but it's definitely on my short-term ToDo list.
Update: Haven't forgotten about this, just had to be put on hold for some other urgent work. Will get back on this soon.
Sorry this is taking so long - it's not forgotten and definitely still on my to-do list!
@oschulz no worries, I actually figured this out and have been using it in my own codebases:
# For AbstractArray A and indices I, compute V = typeof(view(A, I...)) by:
# 1) Try to infer V from typeof(A) and typeof(I) by calling viewtype(typeof(A), typeof(I))
# 2) If !isconcretetype(V), fall back to typeof(view(A, I...))
# Custom subtypes of AbstractArray (e.g. UnsafeArray) could provide customized implementations
# of viewtype(A::AbstractArray, I::Tuple) or viewtype(::Type{<:AbstractArray}, ::Type{<:Tuple})
# if required.
function viewtype(A::AbstractArray, I::Tuple)
T = viewtype(typeof(A), typeof(I))
_viewtype(A, I, T, Val(isconcretetype(T)))
end
viewtype(A::AbstractArray, I...) = viewtype(A, I)
function viewtype(::Type{A}, ::Type{I}) where {A<:AbstractArray,I<:Tuple}
Core.Compiler.return_type(view, _view_signature(A, I))
end
Base.@pure _view_signature(::Type{A}, ::Type{I}) where {A,I<:Tuple} = Tuple{A, I.parameters...}
_viewtype(A, I, T, ::Val{true}) = T
function _viewtype(A, I, T, ::Val{false})
try
return typeof(view(A, I...))
catch e
Istring = join(map(string, I), ", ")
printstyled(stderr,
"""
Unable to infer typeof(view($(typeof(A)), $(join(map(i -> string(typeof(i)), I), ", "))))
Only other option is to try typeof(view(A, $Istring)) but that resulted in below error.
Try passing in valid indices or implement:
viewtype(::Type{$(typeof(A))}, ::Type{$(typeof(I))})
""", color=:light_red)
rethrow(e)
end
end
I've got an in-works implementation something similar to ArrayOfSimilarArrays
that I'll make a PR for.
Thanks again for keeping on top of this! :)
Also, two other options for _viewtype(A, I, T, ::Val{false})
that I considered were:
_viewtype(A, I, T, ::Val{false}) = @inbounds view(A, I...)
@propagate_inbounds _viewtype(A, I, T, ::Val{false}) = view(A, I...)
The first has the benefit of working for empty arrays, but that seems unsafe. The second option
I rejected because that would require e.g. @inbounds nestedview(A_flat, 2)
, which seemed weird to do when calling a constructor.
Thanks, I really got to add this soon!
As an alternate idea, you could use the SlicedArray
that I made here.
I ended up needing the ability to slice along arbitrary dimensions instead just the inner M
dimensions and ran across JuliennedArrays.jl. I took the best parts of that and ArraysOfArrays to make SlicedArray
which I use internally for Lyceum. An ArrayOfArrays{T,M}
could be implemented as:
function sliceinner(A::AbstractArray{<:Any,L}, ::Val{M}) where {L,M}
slice(A, ntuple(_ -> Colon(), Val(M))..., ntuple(_ -> *, Val(L-M))...)
end
You could also use JuliennedArrays directly, SlicedArray
is the same idea just with more overloads for Base
functions, comprehensive tests, and better performance. I'd like to upstream those changes in the future but haven't had time yet. If this sounds useful I could register SpecialArrays (currently it's only registered in the Lyceum registry).
As an aside, you might find some of the test utilities I wrote useful, given that you have a lot of packages providing various custom arrays. That test suite is what originally prompted the changes to ElasticArrays I made in https://github.com/JuliaArrays/ElasticArrays.jl/pull/20. Perhaps factoring that out to something like ArrayTestTools.jl
would be useful to JuliaArrays/others?
I could register SpecialArrays
Hm, since ArraysOfArrays is already registered, and since there would be conflicts with some exported functions (like innersize
), I guess I would prefer to backport your improvements to ArrayOfSimilarArrays
. If the package has to evolve in the process (incl. some API changes), that would be fine. I've actually been planning to move it under the "JuliaArrays" organization to make it a bit more "official" - talked to Tim Holy about it a JuliaCon, just haven't gotten 'round to it yet.
Perhaps factoring that out to something like ArrayTestTools.jl would be useful to JuliaArrays/others?
Yes, that might indeed be very useful!
Makes sense to backport the changes ArraysOfArrays.jl
! Given that SlicedArray
(and also FlattenedArray
, which is a flattened view of a nested array) were inspired largely by JuliennedArrays.jl it'd be good to check with its dev (@bramtayl) first? The way forward could then be to upstream SlicedArray
/FlattenedArray
to JuliennedArrays.jl and then have ArraysOfArrays.jl depend on that? The alternative would be to just put everything in ArraysOfArrays.jl directly.
In the meantime, I'll work on factoring out ArrayTestTools.jl.
It makes me so happy when I find out people are reading my packages! Happy to accept PRs to JuliennedArrays
@bramtayl, I just came across JuliennedArrays a while ago - from what I saw, it's quite similar to
ArrayOfSimilarArrays
, but trades increased flexibility (slicing in any dimension) for some performance?
Would you possibly be interested in moving it into ArraysOfArrays
, or would you prefer to keep it as a lightweight separate package?
@colinxs having a FlattenedArray
in ArraysOfArrays would be great - I've been meaning to add something like that, to replace the current generic implementation of flatview
@inline flatview(A::AbstractArray{<:AbstractArray}) = Base.Iterators.flatten(A)
with something that actually returns an AbstractArray
-subtype. One of the things that kept sliding to the bottom of my to-do list ... :-)
@oschultz I'd be thrilled to move it into ArraysOfArrays so I don't have to maintain the code anymore. I'd be curious to know more about performance differences
hahah and I love when devs are super responsive and eager to accept PRs :).
@bramtayl, when you have a moment want to look over SlicedArray and FlattenedArray. The major changes are:
Slices
SlicedArray
to match Base
's naming convention (SubArray
, PermutedDimsArray
)slice(A, dims::Tuple)
, slice(A, dims::Integer...)
, and slice(A, Union{Colon,typeof(*)})
as the user-facing API (similar to SubArray
and view
). slice(A, :, *, :)
would be equivalent to Slices(A, True(), False(), True())
, it's just syntactical sugar.Base
functions like resize!
, append!
etc.Base.mapslices
Align
FlattenedArray
flatview(A)
as the user-facing APIalongs
since you can get the same functionality by composing with PermutedDimsArray
I also significantly re-wrote the internals for better inference/performance and added full test coverage.
Concerning the performance difference: there isn't any. All of the fancy indexing that allows slicing along arbitrary dimensions effectively compiles out.
@colinxs those all sound like improvements to me. Earlier versions of JuliennedArrays used *
and :
to specify dimensions (I think that was a Tim Holy suggestion). I changed to True
and False
because I got excited about the wider potential of Typed Bools (hasn't panned out yet). I'm not sure I understand about PermutedDimsArray
.
I'd be thrilled to move it into ArraysOfArrays so I don't have to maintain the code anymore.
Haha, and here I thought we'd maintain it together. ;-)
I'd be curious to know more about performance differences
Good point, we should definitely do some benchmarking!
Regarding PermutedDimsArray
- when I wrote ArrayOfSimilarArrays
I actually thought that applications that needed to slice along another axis could us an ArrayOfSimilarArrays
wrapped around a PermutedDimsArray
. Would be interesting to see if that could achieve similar performance to JuliennedArrays.Slices
. If so, maybe we could reduce code there quite a bit and make things easier to maintain if we join it all in ArraysOfArrays
.
Adds adjoints for Zygote.jl
Oh, that would be nice to have here (ArraysOfArrays) too!
@bramtayl PermutedDimsArray
just remaps dimensions e.g.:
julia> A=rand(2,3)
2×3 Array{Float64,2}:
0.307533 0.822543 0.530326
0.591914 0.782564 0.948809
julia> B=PermutedDimsArray(A, (2,1))
3×2 PermutedDimsArray(::Array{Float64,2}, (2, 1)) with eltype Float64:
0.307533 0.591914
0.822543 0.782564
0.530326 0.948809
julia> A[1,2] == B[2,1]
true
So if you had A = [rand(2,3) for i=1:4]
, Align(A, True(), True(), False())
and PermutedDimsArray(Align(A, False(), True(), True()), (2, 3, 1))
would be equal.
Oh I see. Yes that makes sense to me. I'm not sure if there would be a performance penalty for PermutedDimsArray
?
@oschulz
I thought about doing the same thing, but you miss out on some optimizations with the extra layer of indirection. If we had:
A = rand(2,3,4)
B = slice(A, :, *, *)
Then doing B[:, 1]
as implemented is basically a no-op because we can just forward the Colon
to the parent array and "re-slice". See this comment and the subsequent getindex
definition for an example.
If you rely on PermutedDimsArray
, then you'd be falling back to Cartesian indexing and collecting the output in a new array.
@bramtayl
SlicedArray
still uses True
/False
internally, but I figured using :
/*
was a friendlier user-facing API.
I like : / *
!
But in your example above - slice(A, :, *, *)
could already be done with just an ArrayOfSimilarArrays
, without PermutedDimsArray
, right? Or do if mix :
and *
up, here`?
In any case, if a more generic version of ArrayOfSimilarArrays
is possible without loss of performance, we could add it to ArraysOfArrays and reduce ArrayOfSimilarArrays
to a type alias (possibly with an optimized implementation for some functions, if necessary).
@oschulz exactly, but ArrayOfSimilarArrays
is just a special-case of a SlicedArray
or @bramtayl's Slices
.
Basically, if we incorporate SlicedArray
/Slices
into ArraysOfArrays.jl then it would entirely replace ArrayOfSimilarArrays
.
@oschulz @bramtayl given the above improvements I made would you all be comfortable incorporating FlattenedArray
/SlicedArray
instead of Align
/Slices
?
Haha, we typed that at the same time.
And I do like the names FlattenedArray
and SlicedArray
- have you checked for conflicts with other packages already?
yup! Also, might you be available for a phone call? It could make planning this out a bit easier haha.
I'm hoping to register all of github.com/Lyceum in General
soon, which requires FlattenedArray
/SlicedArray
being part of a registered package.
I haven't checked for name conflicts. I'll do a google search now. Is there a good way to grep
all Julia packages? lol
Google doesn't appear to turn anything up: "SlicedArray" + "julia" "FlattenedArray" + "julia".
Just searching for "SlicedArray" and "FlattenedArray" turned up a couple Python/Java references.
@c42f: @colinxs, @bramtayl and me are thinking about combining our several "array of arrays" implementations in ArraysOfArrays.jl. Could we move ArraysOfArrays into the JuliaArrays org?
Is there a good way to grep all Julia packages?
I usually ripgrep through my packages dir - since I have some huge "Project.toml"s active, that tends to cover any major (and lot's of minor) packages. :-)
I think "SlicedArray" and "FlattenedArray" are indeed free - let's grab' em officially!
yup! Also, might you be available for a phone call?
Yes, that would speed things up - I'll send you an email.
I emailed you too, @bramtayl .
Could we move ArraysOfArrays into the JuliaArrays org?
Certainly! If you initiate the transfer I can accept it and I'll make you maintainer @oschultz. Then you should be able to add collaborators as you desire. (That seems simplest until/unless we fix the JuliaArrays team setup which seems strangely configured at the moment.)
@c42f while we're at it, I'd transfer UnsafeArrays.jl as well.
@c42f transfer done - looks like I had sufficient rights in JuliaArrays to do the transfer on my own. :-)
Ok, cool. Let me know if there's any setup which you actually need my help for!
@c42f , all fine, looks like - but could you return admin rights to ArraysOfArrays and UnsafeArrays to me? Those got lost in the move to JuliaArrays.
Thanks again for meeting! Here's a summary of what we spoke about:
FlattenedArray
+ the test utils. This will be non-breaking and we can go ahead and tag + release after merging.
flatview(::SlicedArray)
. Current options include:PermutedDimsArray
FlattenedArray(::SlicedArray)
SlicedArray
, which will replace ArrayOfSimilarArrays
, + some (possible) changes to your inner_*
functions. This will be a breaking PR.mapreduce
benchmark you mentioned that'd be great!* Should we avoid having variable return types? e.g. flatview(slice(rand(2,3,4), args...))
could either return an Array
(the parent) or one of the above options. This also comes up in getindex(::SlicedArray, I...)
:
julia> S=slice(rand(2,3,4), :, *, *)
3×4 slice(::Array{Float64,3}, :, *, *) with 2-element eltype SubArray{Float64,1,Array{Float64,3},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Int64},true}:
[0.242411, 0.723567] [0.313344, 0.705964] [0.45211, 0.308781] [0.637732, 0.0295599]
[0.806773, 0.477879] [0.581994, 0.515911] [0.98953, 0.462396] [0.980186, 0.498727]
[0.038467, 0.479799] [0.284736, 0.340725] [0.963542, 0.926671] [0.233843, 0.141494]
julia> S[:,1]
3-element slice(view(::Array{Float64,3}, :, :, 1), :, *) with 2-element eltype SubArray{Float64,1,Array{Float64,3},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Int64},true}:
[0.2424109148738105, 0.7235671322839587]
[0.8067728866162733, 0.47787858099260583]
[0.038466976626140514, 0.479799247981757]
julia> S[:,1,1]
3-element Array{SubArray{Float64,1,Array{Float64,3},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Int64},true},1}:
[0.2424109148738105, 0.7235671322839587]
[0.8067728866162733, 0.47787858099260583]
[0.038466976626140514, 0.479799247981757]
In that example, the optimization for S[:,1]
is fairly significant, so we probably want it:
julia> @btime $S[:,1];
12.085 ns (2 allocations: 80 bytes)
julia> @btime $S[:,1,1];
83.432 ns (5 allocations: 368 bytes)
The best example I can think of that suggests it's okay to have variable return types is reshape(A::AbstractArray{T,N}, ::Val{M})
. If M==N
you get A
, otherwise Base.ReshapedArray
.
For 3):
using JuliennedArrays
using BenchmarkTools
x = rand(1000, 1000)
@btime map(sum, Slices(x, False(), True()))
@btime sum(x, dims = 2)
Thanks for writing the summary!
Should we avoid having variable return types? e.g. flatview(slice(rand(2,3,4), args...))
I would strongly favor using the optimal return type for each case, as long as it can be done in a type-stable way. So if appropriate, flatview
should just return the parent array.
could you return admin rights to ArraysOfArrays and UnsafeArrays to me
Of course, done. I think you should now be able to add collaborators and manage all the things.
Thanks, @c42f ! @colinxs, you can write to ArraysOfArrays now - probably best to still make a PR for the new features though, will make it easier to discuss. Travis/Documenter for ArraysOfArrays is configured to generate doc previews for PRs, btw.
Haven't gotten around to making the PRs yet as I'm cleaning things up in SpecialArrays.jl first, but I wanted to give an update on the replacement for JuliennedArrays.Align
. The behaviour of Align
can be achieved with FlattenedArray
+ PermutedDimsArray
and benchmarking shows very little overhead (1-2%). Given the simplicity of that approach (and some of the tricks that PermutedDimsArray
does) I'm going to move forward with this approach.
First off, awesome idea/package! It's been quite useful in my own work.
I was really excited to use your package in conjunction with StructArrays but ran into an issue (that I attempted to fix with a PR to StructArrays) as explained here. The owner of that package pointed out, however, that the real issue is the following:
What are your thoughts on modifying eltype to be consistent with the true return type of getindex?
-Colin