JuliaMath / Interpolations.jl

Fast, continuous interpolation of discrete datasets in Julia
http://juliamath.github.io/Interpolations.jl/
Other
533 stars 110 forks source link

N-dimensional interpolation with missing and nothing values #330

Open briochemc opened 5 years ago

briochemc commented 5 years ago

missing values propagate in the interpolation and the 1D trick suggestion in the documentation of filtering missing values out does not generalize to multiple dimensions. Unless I missed something? This begs the question: Is it possible to interpolate data with missing values in n dimensions with n>1? Here is a minimal example to illustrate the kind of thing I am thinking about:

julia> nx, ny = 3, 4

julia> xs, ys = 1:nx, 1:ny

julia> data = [cos(x) + sin(y) + y for x in xs, y in ys] # some data
3×4 Array{Float64,2}:
 2.38177   3.4496   3.68142  3.7835
 1.42532   2.49315  2.72497  2.82705
 0.851478  1.9193   2.15113  2.25321

julia> data = convert(Array{Union{Float64, Missing}}, data) ;

julia> data[2:end-1, 2:end-1] .= missing ; data # with a hole of missing values
3×4 Array{Union{Missing, Float64},2}:
 2.38177   3.4496    3.68142   3.7835
 1.42532    missing   missing  2.82705
 0.851478  1.9193    2.15113   2.25321

With the data above, I'm not even sure how one should interpolate values near the missing entries. Currently, as mentioned at the top of this post, the missing values propagate:

julia> knots = (xs, ys) ;

julia> itp = interpolate(knots, data, Gridded(Linear()))
3×4 interpolate((1:3,1:4), ::Array{Union{Missing, Float64},2}, Gridded(Linear())) with element type Float64:
 missing  missing  missing  missing
 missing  missing  missing  missing
 missing  missing  missing  missing

Is there a good way to interpolate data close to the missing entries? Maybe by applying the filtering trick in each direction as a default? (Of course, one could use the Constant() interpolation instead of the Linear() one, but the problem of interpolation near the original missing values remains.)

I would also be interested in having interpolation boundaries in the sense that the interpolation should not go across those boundaries. For that, I thought using nothing values made sense? Here is an example of what I am thinking about, adding a nothing in place of one of the missing entries:

julia> data = convert(Array{Union{Float64, Missing, Nothing}}, data) ;

julia> data[2,2] = nothing ; data # Now with a nothing in there
3×4 Array{Union{Missing, Nothing, Float64},2}:
 2.38177   3.4496    3.68142   3.7835
 1.42532    nothing   missing  2.82705
 0.851478  1.9193    2.15113   2.25321

In this case, I feel like querying for values near the remaining missing should not use the [2,1] entry (the 1.42532) because it is separated from the query by a nothing value.

Maybe these features are dumb to request, I'm not sure 😅, but I think they could be useful for some applications, like geographic data interpolation. For example, consider a marine tracer for which a missing value at location (x,y) could mean "no data was collected at (x,y)" while nothing could represent "(x,y) location is an island". In this case it would make sense to be able to interpolate the data in 2D across holes of missing data, while blocking the interpolation through islands where, in terms of the marine tracer data, nothing is observed.

tlnagy commented 4 years ago

I just ran into this limitation. I would like to fill in holes (missing data) in a 2D sample by interpolating from data around the holes, but the suggestions in the docs for 1D give me the same problems as OP.

JianghuiDu commented 4 years ago

Same problem here.

cstjean commented 4 years ago

FWIW, for our application (resampling time series at fixed timesteps with interpolation), propagating missing values is the desired behaviour...