JuliaMath / NaNMath.jl

Julia math built-ins which return NaN and accumulator functions which ignore NaN
Other
53 stars 28 forks source link

Missing values support #30

Open pdeffebach opened 6 years ago

pdeffebach commented 6 years ago

I tested out this package on 0.7 with missings and got the following error:

I think this is because Statistics is not imported by NaNMath, so the function calls to NaNMath.mean can't fall base on Statistics.mean.

I think the solution here is to add import Statistics and add fall backs for operations of type Any so those are called when there are missing values. Does that seem feasible?

nm.mean([1.,2., missing])
> ERROR: WARNING: Base.mean is deprecated: it has been moved to the standard library package `Statistics`.
Add `using Statistics` to your imports.
 in module Base
MethodError: no method matching mean(::Array{Union{Missing, Float64},1})
You may have intended to import Base.mean
Closest candidates are:
  mean(::AbstractArray{T<:AbstractFloat,N} where N) where T<:AbstractFloat at /Users/peterdeffebach/.julia/packages/NaNMath/pEda/src/NaNMath.jl:167
Stacktrace:
 [1] top-level scope at none:0
schnirz commented 6 years ago

I encountered a similar issue when using this package with missing values in 0.6.4:

julia> versioninfo()
Julia Version 0.6.4
Commit 9d11f62bcb (2018-07-09 19:09 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=16)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

julia> using NaNMath

julia> NaNMath.log(-100)
NaN

julia> NaNMath.log(missing)
ERROR: MethodError: no method matching log(::Missings.Missing)
You may have intended to import Base.log
Closest candidates are:
  log(::Float32) at /Users/martink/.julia/v0.6/NaNMath/src/NaNMath.jl:12
  log(::Float64) at /Users/martink/.julia/v0.6/NaNMath/src/NaNMath.jl:11
  log(::Real) at /Users/martink/.julia/v0.6/NaNMath/src/NaNMath.jl:13
  ...

julia> Base.log(missing)
missing

Would it make sense to define functions in this packages to return missing for missing input values? This would make sense to me, as missing and NaN here represent two different scenarios: NaN stands for invalid input (e.g., negative values for log), whereas missing stands for entirely absent input.

log here could be defined for Union{Missing, Float64} input types and have a simple ismissing check within its definition. I'm happy to work on this if this is a desired feature of the package.

mlubin commented 6 years ago

I see how this is useful, but handling missing extends beyond the problem that this package was designed to address and that I have interest in maintaining. Maybe MissingNaNMath is in order? :)

pdeffebach commented 6 years ago

Fair enough!

mkborregaard commented 6 years ago

But - given that missing is part of Base Julia, is there any use for a numerical calculation package that does not support it? That sounds quite dangerous.

mlubin commented 6 years ago

is there any use for a numerical calculation package that does not support it? That sounds quite dangerous.

This comment is a bit hyperbolic. The package is already being used by upstream AD packages like ForwardDiff and JuMP (https://github.com/JuliaOpt/JuMP.jl/blob/82e998f72e849f7a5f8c264accb41d9eafbcc845/src/Derivatives/Derivatives.jl#L15) for well-defined purposes that currently don't require any support for missing.

mkborregaard commented 6 years ago

I see you're right this package is still equally useful in that context. I guess I'm just disappointed as I thought this could be a more broadly useful package, and most packages that would depend on this will need to be able to support all the numerical functionality defined in Base for these functions.

My suggestion that it might be "dangerous" came from the notion that users would expect Base.mean and NaNMath.mean to behave similarly except in their treatment of NaN.

pdeffebach commented 6 years ago

One solution that is missing-agnostic is to have fallbacks for non-float types that revert to the functions defined in Base. But you might want to throw errors when non-floats are used given the purpose of NaNMath.

mlubin commented 6 years ago

I view a MethodError as pretty low on the dangerousness scale. Silently not handling NaNs as promised is more dangerous, e.g., https://github.com/mlubin/NaNMath.jl/pull/21#issuecomment-305064958.

Anyway I'm willing to share maintainership if there's interest in making the package more useful for additional applications.

mkborregaard commented 6 years ago

You're right. After looking into it I'm also less certain that the right approach is to change this to treat missing like NaN. Sorry for the noise.

nalimilan commented 6 years ago

It would make sense to accept ~arrays~ iterables of any element type though, and only throw an error when actually encountering a value of an unsupported type. That's what Statistics.mean does, and it's useful if you want to pass skipmissing(x) to NaNMath.mean.