JuliaStats / Loess.jl

Local regression, so smooooth!
Other
103 stars 34 forks source link

Getting NaN at boundary, and a few other strange values #28

Closed jrfiedler closed 1 year ago

jrfiedler commented 6 years ago

A few examples showing that NaN is frequently returned at the boundary, and a few zeros inside the range of the input. For all of these, the predictions are being made on the same x values that were put into loess.

I'm using Julia 1.0.1, with Loess installed today.

Example 1

Part 1

julia> using Loess

julia> x = [1.0, 2.0, 3.0, 4.0];

julia> y = [1.0, 2.0, 3.0, 4.0];

julia> model = loess(x, y, span = 0.25);

julia> predict(model, x)
4-element Array{Float64,1}:
 NaN  
   0.0
   0.0
 NaN  

The boundary predictions are NaN. The middle two predictions are zero, even though all of the input y values are positive.

Part 2

Same x and y, changing span to 0.33.

julia> model = loess(x, y, span = 0.33);

julia> predict(model, x)
4-element Array{Float64,1}:
 1.0              
 0.0              
 0.0              
 4.000000000000002

The boundary predictions aren't NaN, but the middle two values are again zero.

Example 2

Part 1

julia> using Loess

julia> x = [1.0, 1.0, 2.0, 3.0, 4.0, 4.0];

julia> y = [1.0, 1.0, 2.0, 3.0, 4.0, 4.0];

julia> model = loess(x, y, span = 0.33);

julia> predict(model, x)
6-element Array{Float64,1}:
 NaN                
 NaN                
   0.0              
   2.999999999999999
 NaN                
 NaN

The boundary predictions are NaN, and one of the middle predictions is zero.

Part 2

Changing span to 0.4 and 0.5 produces an error.

julia> model = loess(x, y, span = 0.40);
ERROR: LinearAlgebra.LAPACKException(2)
Stacktrace:
 [1] chklapackerror at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/LinearAlgebra/src/lapack.jl:38 [inlined]
 [2] trtrs!(::Char, ::Char, ::Char, ::Array{Float64,2}, ::Array{Float64,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/LinearAlgebra/src/lapack.jl:3348
 [3] ldiv! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/LinearAlgebra/src/triangular.jl:583 [inlined]
 [4] \(::LinearAlgebra.UpperTriangular{Float64,Array{Float64,2}}, ::Array{Float64,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/LinearAlgebra/src/triangular.jl:1854
 [5] \(::Array{Float64,2}, ::Array{Float64,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/LinearAlgebra/src/generic.jl:868
 [6] #loess#9(::Bool, ::Float64, ::Int64, ::Function, ::Array{Float64,2}, ::Array{Float64,1}) at /home/jf/.julia/packages/Loess/TaYdb/src/Loess.jl:102
 [7] #loess at ./none:0 [inlined]
 [8] #loess#14 at /home/jf/.julia/packages/Loess/TaYdb/src/Loess.jl:110 [inlined]
 [9] (::getfield(Loess, Symbol("#kw##loess")))(::NamedTuple{(:span,),Tuple{Float64}}, ::typeof(loess), ::Array{Float64,1}, ::Array{Float64,1}) at ./none:0
 [10] top-level scope at none:0

julia> model = loess(x, y, span = 0.50);
ERROR: LinearAlgebra.LAPACKException(2)
Stacktrace:
 [1] chklapackerror at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/LinearAlgebra/src/lapack.jl:38 [inlined]
 [2] trtrs!(::Char, ::Char, ::Char, ::Array{Float64,2}, ::Array{Float64,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/LinearAlgebra/src/lapack.jl:3348
 [3] ldiv! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/LinearAlgebra/src/triangular.jl:583 [inlined]
 [4] \(::LinearAlgebra.UpperTriangular{Float64,Array{Float64,2}}, ::Array{Float64,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/LinearAlgebra/src/triangular.jl:1854
 [5] \(::Array{Float64,2}, ::Array{Float64,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/LinearAlgebra/src/generic.jl:868
 [6] #loess#9(::Bool, ::Float64, ::Int64, ::Function, ::Array{Float64,2}, ::Array{Float64,1}) at /home/jf/.julia/packages/Loess/TaYdb/src/Loess.jl:102
 [7] #loess at ./none:0 [inlined]
 [8] #loess#14 at /home/jf/.julia/packages/Loess/TaYdb/src/Loess.jl:110 [inlined]
 [9] (::getfield(Loess, Symbol("#kw##loess")))(::NamedTuple{(:span,),Tuple{Float64}}, ::typeof(loess), ::Array{Float64,1}, ::Array{Float64,1}) at ./none:0
 [10] top-level scope at none:0

Part 3

Changing span to 0.6

julia> model = loess(x, y, span = 0.60);

julia> predict(model, x)
6-element Array{Float64,1}:
 NaN                
 NaN                
   2.0              
   2.999999999999999
   4.000000000000001
   4.000000000000001

Predictions on the left boundary are NaN.

andreasnoack commented 3 years ago

I think this might mostly be an issue with throwing errors or raise warnings. For reference, In R the first example gives

> df
  x y
1 1 1
2 2 2
3 3 3
4 4 4
> ft = loess(y~x, df, span=0.25)
There were 15 warnings (use warnings() to see them)
> predict(ft, df)
Error in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)),  : 
  NA/NaN/Inf in foreign function call (arg 5)

You are asking for the function to compute quadratic approximations to single data points which is not going to work. I guess we should fail if the local problems end up being underdetermined.

andreasnoack commented 3 years ago

I've looked a bit more into this and it does look like there are a couple of other issues

The predict function assumes that the vertices returned in https://github.com/JuliaStats/Loess.jl/blob/master/src/Loess.jl#L147 are distinct but that is often not the case for the first value. E.g.

julia> Loess.traverse(Loess.KDTree([0.0 1.0 2.0]'), [0.0])
2-element Vector{Tuple{Float64}}:
 (0.0,)
 (0.0,)

A possible solution is to avoid the interpolation in https://github.com/JuliaStats/Loess.jl/blob/cb38f5e224b30b372b386c9fbc5d4cc662daf5fa/src/Loess.jl#L152-L153 when the input value is one of the vertices. I'll prepare a PR with a fix.

I'm don't really understand why the KDTree isn't using the input data as the vertices instead of using medians. I believe using the input data would avoid (at least most of) the rounding that causes

4.000000000000001
4.000000000000001

More generally, it might be a good idea to just use https://github.com/KristofferC/NearestNeighbors.jl instead of an internal structure for this.

ayushpatnaikgit commented 3 years ago

I've had a similar issue. I am curious about how the package handles NaN. I have missing data points, but since the loess function doesn't accept missing, I converted them to NaN.

I am pasting the real data, so I apologize for the legibility. You can simply copy and paste these in REPL if you want to try.

y = [14.73, 17.08, 17.63, NaN, 14.51, 16.75, 18.46, 17.24, 18.8, 18.73, 14.64, 20.61, 18.87, 19.76, NaN, 5.99, 13.12, 16.82, 23.63, 20.77, 13.93, 17.75, 20.16, 19.8, 17.43, 17.39, 16.59, 13.41, 9.4, 15.2, 16.06, 10.95, 16.37, 15.99, 19.03, 15.5, 23.77, 14.75, 7.7, 11.9, 10.37, 10.74, 19.1, 19.95, 17.84, 16.29, 21.02, 18.93, 18.4, 18.31, 13.27, 9.09, 8.18, 10.94, 22.18, 16.37, 22.97, 17.64, 15.44, 14.81, 15.49, 15.22, 9.82, 16.15, 18.13, 28.12, 15.36, 15.17, 25.17, 16.05, 19.82, 15.22, 18.77, 14.05, 14.13, 4.77, NaN, 18.39, 17.51, 21.99, 23.86, 22.13, 23.98, 19.52, 22.06, 19.27, 21.7, 19.01, 21.15, NaN, 26.61, 22.15, 23.61, 19.16, 23.07]
x = [11.0, 15.0, 2.0, 0.0, 3.0, 2.0, 9.0, 14.0, 13.0, 14.0, 16.0, 15.0, 16.0, 13.0, 0.0, 1.0, 3.0, 4.0, 6.0, 13.0, 16.0, 15.0, 13.0, 17.0, 14.0, 13.0, 9.0, 2.0, 2.0, 8.0, 10.0, 13.0, 13.0, 10.0, 15.0, 12.0, 14.0, 15.0, 2.0, 5.0, 2.0, 3.0, 9.0, 12.0, 14.0, 11.0, 11.0, 14.0, 14.0, 9.0, 5.0, 1.0, 4.0, 10.0, 12.0, 17.0, 16.0, 18.0, 13.0, 18.0, 16.0, 14.0, 5.0, 4.0, 4.0, 4.0, 11.0, 16.0, 13.0, 15.0, 16.0, 11.0, 13.0, 12.0, 3.0, 1.0, 0.0, 10.0, 9.0, 14.0, 15.0, 18.0, 13.0, 18.0, 13.0, 13.0, 3.0, 2.0, 1.0, 0.0, 9.0, 11.0, 11.0, 19.0, 18.0]

There are 4 NaN in y.

Now I build the model and predict.

model = loess(y, x)
v = predict(model,x)

It turns out, there are 29 NaN in v.