JuliaArrays / AxisArrays.jl

Performant arrays where each dimension can have a named axis with values
http://JuliaArrays.github.io/AxisArrays.jl/latest/
Other
200 stars 41 forks source link

WIP: Access the values of an axis with getproperty #152

Open c42f opened 5 years ago

c42f commented 5 years ago

Here's one possible use for getproperty with an AxisArray. I'm not yet sure this is a good idea (being an extremely new user of AxisArrays), but it seems rather convenient. [Edit: After a few days of using this I'd say that it's extremely convenient.]

A small demo in a signal processing-like context:

julia> using Unitful, AxisArrays

julia> A = AxisArray(rand(10,2), Axis{:time}((0:9)u"ms"), Axis{:channel}(1:2))
10Γ—2 AxisArray{Float64,2,Array{Float64,2},Tuple{Axis{:time,StepRange{Quantity{Int64,𝐓,Unitful.FreeUnits{(ms,),𝐓,nothing}},Quantity{Int64,𝐓,Unitful.FreeUnits{(ms,),𝐓,nothing}}}},Axis{:channel,UnitRange{Int64}}}}:
 0.891098  0.904336
 0.291428  0.125264
 0.116863  0.639308
 0.217566  0.972434
 0.656042  0.517365
 0.758901  0.723122
 0.856255  0.77854 
 0.671771  0.695505
 0.793823  0.86887 
 0.302142  0.837619

julia> A.time
0 ms:1 ms:9 ms

julia> A.channel
1:2

julia> plot(ustrip.(A.time), A)
# ...
iamed2 commented 5 years ago

I do think it's very important to have a better way to access the values of an axis, and this seems as good as any.

c42f commented 5 years ago

Cool, I've added

I'm not sure whether propertynames should include :data and :axes? I left these out for now as I assume data and axes field names are not part of the public API. (But is there an approved way to unwrap things to get at data?)

c42f commented 5 years ago

CI failures seem to be an unrelated problem in a different test (only on julia master)

iamed2 commented 5 years ago

I would implement propertynames such that private=true shows the data and axes fields.

There is currently no public interface to get data.

Inferred tests are really good! I would test that data and axes are still inferred as well.

c42f commented 5 years ago

Oh, excellent point, I didn't know about that the private flag to propertynames. Fixed now + extra tests added.

c42f commented 5 years ago

Documenter CI failures should be fixed by #153

timholy commented 5 years ago

Test failure on nightly is addressed by https://github.com/JuliaArrays/OffsetArrays.jl/pull/62.

I really like this---I'd even favor it as a way of resolving the issue of renaming AxisArrays.axes. Should we deprecate that as part of this change?

But there's one niggling issue: do you return the range values or the corresponding Axis object? I'd actually favor the latter because of this possible construction:

for i in eachindex(A.time)
    timeslice = @view A[A.time(i)]
    ...
end
iamed2 commented 5 years ago

But there's one niggling issue: do you return the range values or the corresponding Axis object? I'd actually favor the latter because of this possible construction:

I think then we need a public interface for getting the values of an axis. But perhaps now that we have the possibility of deprecating field access with getproperty, it's okay to recommend ax.val?

I really like this---I'd even favor it as a way of resolving the issue of renaming AxisArrays.axes. Should we deprecate that as part of this change?

At the very least the multi-arg version should be deprecated IMO

c42f commented 5 years ago

But there's one niggling issue: do you return the range values or the corresponding Axis object

Agreed, good question. The reason I didn't do this was because the user already knows (syntactically) which axis name they're accessing, so it seemed redundant to return the wrapped value.

So far in practical use this has been the right decision in most cases, but occasionally I've needed to re-wrap the axis which is annoying. One solution would be to make the Axis itself an AbstractArray; then the need to access val directly would be unnecessary. This also feels semantically sound.

However as the package is currently implemented, Axis is also used to hold scalars and other things. Perhaps it could be made to work though.

c42f commented 5 years ago

At the very least the multi-arg version should be deprecated IMO

Perhaps we should do this. I'm a bit cautious about getproperty as a general interface because it's kind of strangely non-extensible β€” there can only be one getproperty per concrete type. Therefore it's great for writing code when you already know the concrete type, but can be rather bad for generic situations.

So for this reason I'm not sure getproperty is a solution to the axes conundrum. And perhaps it's also a sign that it should return the concrete val as well.

Would it make more sense just merge Base.axes and AxisArrays.axes together and make axes(::AxisArray, d) return an AxisUnitRange (which behaves like a one dimensional AxisArray, but is a subtype of AbstractUnitRange (the docs say axes must return AbstractUnitRanges))?

c42f commented 5 years ago

So thinking about the Base.axes conundrum, I tried making an AxisUnitRange which is returned from Base.axes(A,i) for some A::AxisArray. It wraps together a combination of A.axes[i] and Base.axes(A.data)[i] and this seemed to work.

But then I wondered whether Axis itself should be this type, in which case we can merge the functionality of Base.axes and AxisArrays.axes entirely.

This would be a breaking change of course, because the Axis would then act as an AbstractUnitRange where indexing this range provides normal indices into the parent AxisArray rather than the "values along the axis" like it currently does. However, this is kind of consistent and symmetrical with the behavior of AxisArray itself.

We could just have a function axisdata / axisvalues or some such which extracts what is currently the .val field from the axis. With the current definitions this would be simply

axisvalues(ax::Axis) = ax.val
axisvalues(a::AxisArray, i) = a.axes[i]

@andyferris I know you've been thinking hard about things very much like this, I'd appreciate your thoughts!

timholy commented 5 years ago

Merging the two is hard, see #81. Fundamentally the problem is that we're pretty committed to the notion that AbstractArrays get indexed by integers, and that axes returns a cartesian product of ranges with entries that can be used to index the array. In Interpolations.jl we've switched to A(x, y, z) for "indexing"-with-interpolation and are contemplating giving A[i, j, k] and A(x, y, z) completely different behaviors. Here that's less viable because people might want to index with a mixture of "counting" and "value" indexes.

andyferris commented 5 years ago

I have a swirling of thoughts in this space. Sorry, they are a bit disorganized.

and that axes returns a cartesian product of ranges with entries that can be used to index the array.

Technically, axes returns a list of unit ranges, keys returns a Cartesian product (or simple range). I'm thinking that the transformation from axes to keys, i.e. keys(A) == CartesianIndices(axes(A)...) works fine if axes(A) is a tuple or a named tuple. There could similarly be a AxisUnitRange <: AbstractUnitRange which is both a unit range and a 1D axis array at the same time. Does this make sense Tim, or am I crazy? (EDIT: I think I am a bit crazy).

timholy commented 5 years ago

I'm thinking that the transformation from axes to keys, i.e. keys(A) == CartesianIndices(axes(A)...) works fine if axes(A) is a tuple or a named tuple.

My only point was that if axes(v) returns something along the lines of (0.1:0.1:1.0,) then CartesianIndices(axes(v)) is going to fail since the values are not integers. And currently we store such ranges as the AxisArrays.axes(v). This is why I wonder if axes(v) and v.axes might return different things.

c42f commented 5 years ago

Thanks Tim, it looks like @mbauman has already done versions of

Having thought about it a bit more it does seem that we need the latter as a thing to return from Base.axes if it can be made to work. For example you want to refer to the Axis values independently of the indices of any array it might happen to index.

if axes(v) returns something along the lines of (0.1:0.1:1.0,) then CartesianIndices(axes(v)) is going to fail since the values are not integers

Yes, I was trying to suggest that the return of Base.axes(::AxisArray) would act like it currently does when indexed, but also wrap the Axis so you can get at that as well, if necessary. Very much like 218b5c51 if I'm reading it right.

andyferris commented 5 years ago

Right. #81 looks like great work.

I'm in favour of using keys and axes to control the behaviour of things like similar. For example, if Base always used axes instead of size I think StaticArrays would be significantly easier to implement (for making nested containers like Diagonal(SVector(1,2,3)) behave like a statically-sized array). Similarly it would be perfect if Diagonal(::AxisVector) would behave similarly to an AxisMatrix.

You still need a convenient way of accessing axis values, of course (hence this PR I suppose). So yes, axes(A) and A.axes might return different things.

timholy commented 5 years ago

Just be aware that #58 could kill this (concept: I collected images with non-uniform sampling time):

julia> import AxisArrays

julia> using AxisArrays: AxisArray, Axis

julia> img0 = rand(100, 100, 8);

julia> img = AxisArray(img0, Axis{:y}(1:100), Axis{:x}(1:100), Axis{:time}(sort(rand(8))));

julia> AxisArrays.axes(img)
(Axis{:y,UnitRange{Int64}}(1:100), Axis{:x,UnitRange{Int64}}(1:100), Axis{:time,Array{Float64,1}}([0.0157872, 0.252618, 0.284213, 0.452175, 0.806429, 0.809029, 0.850077, 0.960828])) 

(And yes, we're actually doing that in practice these days.)

Dang we need traits.

andyferris commented 5 years ago

I didn't follow - what does #58 kill?

(And yes, we're actually doing that in practice these days.)

I can really see why you need something like AxisArrays to keep track of the relevant data.

A couple of quick questions (since I'm thinking of how to implement multi-dimensional dictionaries) - do you find you often need to access the time axis by integer index? Do you often index it by Float64 time? Do you mostly iterate through this axis? Or is this structure mostly used as a basis for interpolation?

Dang we need traits.

:)

timholy commented 5 years ago

Sorry, what I meant was that if Axis <: AbstractUnitRange then I think you want to make sure that step(::Axis) is defined, but of course if it's storing a Vector then there may be no step.

timholy commented 5 years ago

A couple of quick questions (since I'm thinking of how to implement multi-dimensional dictionaries) - do you find you often need to access the time axis by integer index? Do you often index it by Float64 time? Do you mostly iterate through this axis? Or is this structure mostly used as a basis for interpolation?

Most often we iterate over the time axis. Rarely indexed by Float64 time to find a particular slice, but increasingly we use such information for interpolation. There are other ways to tackle many of these things, but one concern is that if we narrow what an AxisArray can store in exchange for other conveniences, we might need an AlmostAxisArrays package. Which could be vaguely annoying.

andyferris commented 5 years ago

Right! This is precisely why I’m interested in dictionaries.

A dictionary is a container where lookup from key to value is fast. What you wrote is basically a multidimensional sort-based dictionary; there is a logarithmic time algorithm to find a given time slice. When there is a uniform step, this cost is reduced to O(1). When it is a unit range, the cost is basically eliminated (you may need to subtract an offset).

Some axes might be categorical and be suited to a hash-based approach. Others might be small and brute-force lookup may be best (or else be β€œstatic” as in StaticArrays).

A multidimensional dictionary should seemlessly handle all the cases from a dense ND array to a hash map with full efficiently.

tlnagy commented 5 years ago

This has always seemed super clunky to me

for i in AxisArrays.axes(img, Axis{:x}())
    ...
end

I collected images with non-uniform sampling time

We do this a lot as well. For me, the flexibility to have non-uniform sampling is a large advantage of AxisArrays` over Base.