JeffreySarnoff / RollingFunctions.jl

Roll a window over data; apply a function over the window.
MIT License
117 stars 7 forks source link

Upgrade to allow for missing values #7

Closed tbeason closed 6 years ago

tbeason commented 6 years ago

Since the ecosystem seems to be moving towards missing values instead of NaN, I think it would be appropriate to incorporate those things here. I think it should be possible to just change all of the method signature endings from where T<:Number to where T<:Union{Number,Union{Number,Missing}}. I did this locally on rolling, rolling_fill_first, and roll_sum and was able to get roll_sum(FILL_FIRST,Int,missing,data) to work as expected.

I did not test any other cases (mostly just hacking this together so I can move on with my program), but my guess is that it will work for other cases as well.

Hope this is a helpful suggestion.

JeffreySarnoff commented 6 years ago

I had planned to use missing when updating for your requested behavior. Your suggestion and thoughts are helpful. I appreciate this.

JeffreySarnoff commented 6 years ago

check back tomorrow morning

JeffreySarnoff commented 6 years ago
Pkg.checkout("RollingFunctions")
using RollingFunctions

I am not ready to tag this -- awaiting feedback, and your alpha testing. Let me know of any glitches you encounter with their causes [I ran it once].

It seems I may need to implement missing-aware versions if we want e.g. a rolling mean that does not propagate missing as the mean of 50 values that includes one missing. What do you think?

tbeason commented 6 years ago

Both changes (span > window & missing values) appear to be working, at least for the limited test cases I tried. Thanks for that.

My first thought about your suggestion about missing propagation is that it might not be necessary. Missings provides several functions for dealing with missing values in an Array, so the user could utilize that functionality when passing data to rolling functions. For example, if I want to ignore missing when computing a rolling sum, I can replace them with 0 via Missings.replace(data,0). However, that actually does fail right now when I supply it to a rolling function where I fill with missing.

julia> roll_sum(FILL_FIRST,5,missing,collect(Missings.replace(a,0)))
ERROR: MethodError: no method matching roll_sum(::Type{Val{:FILL_FIRST}}, ::Int64, ::Missings.Missing, ::Array{Float64,1})
Closest candidates are:
  roll_sum(::Type{Val{:FILL_FIRST}}, ::Int64, ::T, ::AbstractArray{T,1}) where T at C:\Users\tbeason\.julia\v0.6\RollingFunctions\src\running.jl:15
  roll_sum(::Type{Val{:FILL_FIRST}}, ::Int64, ::AbstractArray{T,1}) where T at C:\Users\tbeason\.julia\v0.6\RollingFunctions\src\running.jl:9
  roll_sum(::Type{Val{:FILL_LAST}}, ::Int64, ::T, ::AbstractArray{T,1}) where T at C:\Users\tbeason\.julia\v0.6\RollingFunctions\src\running.jl:17
  ...

I'm guessing that this fails because the type of the input is not Union{T,Missing}, but the result needs to be widened to that because the filler is missing. So in that sense, perhaps things need to be made more "missing-aware" as you put it. Additionally, it is necessary to collect the iterator right now because without collecting the input is not yet a subtype of AbstractVector. So, again, another place that has room for improvement because it would be better if I didn't have to do that collection.

On the other hand, if you aren't opposed to keyword arguments, and it isn't too difficult, perhaps just having skipmissing={true|false} and replacemissing=value keyword arguments might be more convenient. The problem is that how missing values get treated changes depending on what statistic you are computing. For sum, it's 0. For prod, it's 1. When computing the mean, it's 0 but is it counted as an actual observation? The list goes on. Might be better to leave those decisions to the user. In the work that I do, typically we want missing propagation, so cases when I need to avoid it would be a minority of the time anyway.