JuliaMath / Interpolations.jl

Fast, continuous interpolation of discrete datasets in Julia
http://juliamath.github.io/Interpolations.jl/
Other
530 stars 109 forks source link

Extrapolate to missing #199

Open cstjean opened 6 years ago

cstjean commented 6 years ago

It would be nice to support extrapolating to missing

> using Interpolations, Missings
> A = rand(20)
> A_x = collect(1.0:2.0:40.0)
> knots = (A_x,)
> itp = extrapolate(interpolate(knots, A, Gridded(Linear())), missing)
> itp[42]

MethodError: Cannot `convert` an object of type Missings.Missing to an object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.

Stacktrace:
 [1] getindex(::Interpolations.FilledExtrapolation{Union{Float64, Missings.Missing},1,Interpolations.GriddedInterpolation{Float64,1,Float64,Interpolations.Gridded{Interpolations.Linear},Tuple{Array{Float64,1}},0},Interpolations.Gridded{Interpolations.Linear},Interpolations.OnGrid,Missings.Missing}, ::Int64) at /home/cedric/.julia/v0.6/Interpolations/src/extrapolation/filled.jl:25
 [2] include_string(::String, ::String) at ./loading.jl:522
cstjean commented 6 years ago

interpolate correctly handles Vector{Missing, Float64}, although it outputs a performance warning.

tomasaschan commented 6 years ago

It's not desirable to be able to extrapolate a Float64 interpolation object to something that's not a Float64, because that would violate type safety.

However, if Float64 is the datatype of the non-missing data, is there a reason NaN (which is still a value in Float64) is not good enough?

Otherwise, the library in general is written in a generic fashion, so it should be very possible to create a data type that is either Float64 or Missing (basically what Nullable{T} is/was), and then also implement vector space semantics for elements of that type (that is, mathematical operators like + and *), which would let you extrapolate to a value that could be missing.

timholy commented 5 years ago

We should support this. But it's not anywhere high on my priority list. Feel free to take the lead, @cstjean :wink:.

timholy commented 5 years ago

(For @tomasaschan, no need for the Nullable stuff in modern versions of Julia, see https://julialang.org/blog/2018/08/union-splitting.)

cstjean commented 5 years ago

The error comes from this function, attempting to convert Missing to Float64:

@inline function (etp::FilledExtrapolation{T,N})(x::Vararg{Number,N}) where {T,N}
    itp = parent(etp)
    coefs = coefficients(itp)
    Tret = typeof(prod(x) * zero(eltype(coefs)))
    if checkbounds(Bool, itp, x...)
        @inbounds itp(x...)
    else
        convert(Tret, etp.fillvalue)    # error here
    end
end

The easy solution is to take out the convert. Thanks to union splitting, it should be fine performance-wise, but it's a breaking change. One significant test fails without the convert:

    etpf = @inferred(extrapolate(itpg, NaN))
    ....
    x =  @inferred(etpf(dual(-2.5,1)))   # fails - it's inferred as Union{Float64, Dual{Float64}}
    @test isa(x, Dual)

I believe that this Union could have performance consequences? You wrote the convert in https://github.com/JuliaMath/Interpolations.jl/pull/226 last September, so I assume you had Julia 1.0 in mind.

Instead, we could easily special-case etp.fillvalue isa Missing. Or perhaps !(etp.fillvalue isa Number), and avoid the conversion in those cases. How does that sound?

The "correct" way to extend the current code would be to fix the type computation in typeof(prod(x) * zero(eltype(coefs))), to somehow produce Union{Missing, ...}, but as far as I know, that would either require special-casing, or Base._return_type(prod, ...)

yakir12 commented 5 years ago

Ran into this as well. One way that makes sense type-safety-wise is to require the dependent variable (the y) to be of a Vector{Union{Missing, <:Number}}, this way any extrapolation to missing won't violate the type of the original container, kind of like DataFrames.allowmissing.

I might be missing something.