JuliaArrays / AxisArrays.jl

Performant arrays where each dimension can have a named axis with values
http://JuliaArrays.github.io/AxisArrays.jl/latest/
Other
200 stars 41 forks source link

Fix interval indexing with offset axes #90

Closed mbauman closed 7 years ago

mbauman commented 7 years ago

This one is terrifying. We were only testing against axes of the form step:step:last. Requires https://github.com/ajkeller34/Unitful.jl/pull/90 to pass tests.

timholy commented 7 years ago

Whew, glad you caught this.

Maybe add some tests for potential roundoff error? 0.0:0.1:0.3 is often a good test case since

julia> 3*0.1
0.30000000000000004

Or we can just encourage users to use a bit of padding in the intervals they supply.

mbauman commented 7 years ago

Yeah, I'm ashamed I didn't notice this any sooner.

That's a very good point about floating point errors. I don't want to automatically add slop in the library, but we should be doing this calculation in double-precision when possible:

julia> A = AxisArray(-10:10, -1.0:0.1:1.0);

julia> A[-.3 .. .3]
1-dimensional AxisArray{Int64,1,...} with axes:
    :row, -0.3:0.1:0.3
And data, a 7-element UnitRange{Int64}:
 -3
 -2
 -1
  0
  1
  2
  3

julia> A[(-.3 .. .3) + [0]]
2-dimensional AxisArray{Int64,2,...} with axes:
    :row_sub, -0.2:0.1:0.2
    :row_rep, [0.0]
And data, a 5×1 Array{Int64,2}:
 -2
 -1
  0
  1
  2

In the first iteration of this fix, I had kept the use of unsafe_searchsorted and constructed a temporary range with step(r):step(r):step(r):

julia> AxisArrays.unsafe_searchsorted(.1:.1:.1, -.3 .. .3)
-3:3

That works, but it's not dimensionally correct. I suppose I could add zero(eltype(r)) to the endpoints, but at that point it seemed simpler to just do the division and multiplication directly. I'm not sure what the best approach here would be.

mbauman commented 7 years ago

Actually, on second thought -- I think a proxy like step(r):step(r):step(r) is exactly the right thing. It may solve dates, too. The only tricky thing is if it will preserve the identity A[(x..y)] == vec(A[(x..y)+[0]]) for floating point ranges.

timholy commented 7 years ago

There's an advantage in leveraging range methods, since they exploit the TwicePrecision type to handle roundoff. How about basing it on something like this?

nsteps(x, step) = floor(Int, abs(x / step))
function nsteps{T}(x, step::TwicePrecision{T})
    # this is basically a hack because Base hasn't defined x/step at TwicePrecision resolution
    nf = abs(x / convert(T, step))
    nc = ceil(Int, nf)
    return abs(convert(T, nc*step)) <= abs(x) ? nc : floor(Int, nf)
end

julia> r = -1.0:0.1:1.0
-1.0:0.1:1.0

julia> step(r)
0.1

julia> nsteps(0.3, step(r))
2

julia> nsteps(0.3, r.step)
3

Why this works:

julia> r.step
Base.TwicePrecision{Float64}(0.09999999999999987, 1.3322676295501878e-16)

julia> 3*r.step
Base.TwicePrecision{Float64}(0.3, 1.1102230246251526e-17)

julia> convert(Float64, 3*r.step)
0.3

By calling nsteps for both ends of the interval, you can construct the UnitRange{Int} that offsets the indices.

mbauman commented 7 years ago

Yes, that was the first thing I thought of myself, but I didn't particularly care to think through the required math to define that division. It's a great workaround — thanks! Unfortunately, it doesn't solve the problem for 0.5. I tried my "phony" range idea, but that too runs into issues with 0.5.

timholy commented 7 years ago

So, make 0.6 required? Alternatively I can try to backport https://github.com/JuliaLang/julia/pull/18777 via compat, but that's a fair amount of work and might introduce problems.

timholy commented 7 years ago

I really like the commit to just support more TwicePrecision operations (d080469). OK if I steal it and contribute it to Base (with proper attribution, of course)?

I'd love to see this cross the finish line. If it's necessary, though I'd rather not I'm hereby offering to look into backporting JuliaLang/julia#18777 (or at least the pieces needed to get this to pass).

timholy commented 7 years ago

Rebased and merged in #102