JuliaArrays / AxisArrays.jl

Performant arrays where each dimension can have a named axis with values
http://JuliaArrays.github.io/AxisArrays.jl/latest/
Other
200 stars 42 forks source link

Reductions do not preserve type #55

Closed tknopp closed 7 years ago

tknopp commented 7 years ago
julia> AxisArray(randn(3,3), (:x,:y),(0.1,0.1),(-0.1,-0.2))
2-dimensional AxisArray{Float64,2,...} with axes:
    :x, -0.1:0.1:0.1
    :y, -0.2:0.1:0.0
And data, a 3×3 Array{Float64,2}:
 0.763468   0.587158  0.520983 
 0.0610593  0.801456  0.721546 
 0.755276   0.834483  0.0795272

julia> maximum(B,2)
3×1 Array{Float64,2}:
 0.763468
 0.801456
 0.834483

julia> B[:,2]
1-dimensional AxisArray{Float64,1,...} with axes:
    :x, -0.1:0.1:0.1
And data, a 3-element Array{Float64,1}:
 0.587158
 0.801456
 0.834483

I would have expected that maximum also returns an AxisArray. This was the case with the old Image object.

ping @timholy

timholy commented 7 years ago

This raises an interesting issue: with similar: right now we have

julia> similar(B, Axis{:x}(1:5))
2-dimensional AxisArray{Float64,2,...} with axes:
    :x, 1:5
    :y, -0.2:0.1:0.0
And data, a 5×3 Array{Float64,2}:
 6.94654e-310  6.94654e-310  6.94654e-310
 6.94654e-310  6.94654e-310  6.94654e-310
 6.94654e-310  6.94654e-310  6.94654e-310
 6.94654e-310  6.94654e-310  6.94654e-310
 6.94654e-310  6.94654e-310  6.94654e-310

Should similar instead return an AxisArray with just the listed axes?

A sweet and type-stable interface would be maximum(B, Axis{:y}).

tknopp commented 7 years ago

Yes this is a nice interface, but still I would be good if the type is preserved when using the regular interface. getindex also preserves type.

timholy commented 7 years ago

Right, we should definitely preserve the AxisArray wrapper, it just won't be inferrable.

getindex can be type-stable because the arguments are positional; if you had getindex(A, whichdimension, index) you couldn't make that inferrable, either.

tknopp commented 7 years ago

Ah, I see. You have a very detailed eye on this, which is really great. I have not focused to much in type stability since I usually only optimized the "hot loop".

timholy commented 7 years ago

Focusing on just the hot loop, and ignoring all inconsequential optimizations, is exactly what you should be doing. The problem is that if you have a function like this:

function foo(A)
    B = maximum(A, 2)
    for i in eachindex(B)
        # hot loop
    end
end

then if the type of B is not inferrable, suddenly your hot loop cannot run quickly. You can solve that with

function foo(A)
    B = maximum(A, 2)
    _foo(B)
end

@noinline function _foo(B)
    for i in eachindex(B)
        # hot loop
    end
end

but one would rather not force that kind of code on users unless strictly necessary.

tknopp commented 7 years ago

:-) This is exactly how we usually solve this problem.

But anyway: I am currently focussing on a fast port to the new Image infrastructure and will likely go over everything in a second pass to make things more beautiful. The issue is that my (private) code base is kind of "in production" which means that I need to keep it stable.

mbauman commented 7 years ago

Given that reductions currently retain the full dimensionality of the original array, I think this can be done in a type stable manner.

timholy commented 7 years ago

Ah, right. I was already transitioning to the alternative (which I'm not sure would be a good idea).

tknopp commented 7 years ago

but isn't retaining the dimension inconsistent with getindex behavior?

tknopp commented 7 years ago

Use case is a 3D image viewer where I can toggle between 3-slice-view and 2-MIP-view.

mbauman commented 7 years ago

Yup. This is https://github.com/JuliaLang/julia/issues/16606. I don't think there's a right choice; both are useful at times.

A ./ sum(A, 2) # Only works if we preserve dimensions
A[any(isnan(A), 2), :] # Only works if we drop dimensions
GordStephen commented 7 years ago

Regarding preserving dimensions on reductions, in the specific context of AxisArrays my 2¢ would be that there's no meaningful value to assign to the output's reduction axis, and so preserving it doesn't make sense:

julia> x = AxisArray(randn(10,3), now():Dates.Day(1):now()+Dates.Day(9), [:A, :B, :C]);

julia> maximum(x, 1)
2-dimensional AxisArray{Float64,2,...} with axes:
    :row, 0000-12-31T00:00:00.001:1 day:0000-12-31T00:00:00.001
    :col, Symbol[:A,:B,:C]
And data, a 1×3 Array{Float64,2}:
 0.882012  1.30775  0.813192

In fact, depending on the reduction axis type, automatically assigning an arbitrary value isn't even always possible:

julia> maximum(x, 2)
ERROR: MethodError: Cannot `convert` an object of type Int64 to an object of type Symbol
This may have arisen from a call to the constructor Symbol(...),
since type constructors fall back to convert methods.

While there are broadcasting-related benefits to keeping the dimension, I think there's a strong case to be made that AxisArray broadcasting should be based on Axis types, not naive dimension ordering (e.g. #54).

timholy commented 7 years ago

Good points. Once we have broadcasting I'd be happy to revisit this.