Closed tbeason closed 6 years ago
I had planned to use missing when updating for your requested behavior. Your suggestion and thoughts are helpful. I appreciate this.
check back tomorrow morning
Pkg.checkout("RollingFunctions")
using RollingFunctions
I am not ready to tag this -- awaiting feedback, and your alpha testing. Let me know of any glitches you encounter with their causes [I ran it once].
It seems I may need to implement missing-aware versions if we want e.g. a rolling mean that does not propagate missing as the mean of 50 values that includes one missing. What do you think?
Both changes (span > window & missing values) appear to be working, at least for the limited test cases I tried. Thanks for that.
My first thought about your suggestion about missing propagation is that it might not be necessary. Missings provides several functions for dealing with missing
values in an Array, so the user could utilize that functionality when passing data to rolling functions. For example, if I want to ignore missing
when computing a rolling sum, I can replace them with 0 via Missings.replace(data,0)
. However, that actually does fail right now when I supply it to a rolling function where I fill with missing
.
julia> roll_sum(FILL_FIRST,5,missing,collect(Missings.replace(a,0)))
ERROR: MethodError: no method matching roll_sum(::Type{Val{:FILL_FIRST}}, ::Int64, ::Missings.Missing, ::Array{Float64,1})
Closest candidates are:
roll_sum(::Type{Val{:FILL_FIRST}}, ::Int64, ::T, ::AbstractArray{T,1}) where T at C:\Users\tbeason\.julia\v0.6\RollingFunctions\src\running.jl:15
roll_sum(::Type{Val{:FILL_FIRST}}, ::Int64, ::AbstractArray{T,1}) where T at C:\Users\tbeason\.julia\v0.6\RollingFunctions\src\running.jl:9
roll_sum(::Type{Val{:FILL_LAST}}, ::Int64, ::T, ::AbstractArray{T,1}) where T at C:\Users\tbeason\.julia\v0.6\RollingFunctions\src\running.jl:17
...
I'm guessing that this fails because the type of the input is not Union{T,Missing}
, but the result needs to be widened to that because the filler is missing
. So in that sense, perhaps things need to be made more "missing-aware" as you put it. Additionally, it is necessary to collect
the iterator right now because without collecting the input is not yet a subtype of AbstractVector
. So, again, another place that has room for improvement because it would be better if I didn't have to do that collection.
On the other hand, if you aren't opposed to keyword arguments, and it isn't too difficult, perhaps just having skipmissing={true|false}
and replacemissing=value
keyword arguments might be more convenient. The problem is that how missing values get treated changes depending on what statistic you are computing. For sum, it's 0. For prod, it's 1. When computing the mean, it's 0 but is it counted as an actual observation? The list goes on. Might be better to leave those decisions to the user. In the work that I do, typically we want missing propagation, so cases when I need to avoid it would be a minority of the time anyway.
Since the ecosystem seems to be moving towards
missing
values instead ofNaN
, I think it would be appropriate to incorporate those things here. I think it should be possible to just change all of the method signature endings fromwhere T<:Number
towhere T<:Union{Number,Union{Number,Missing}}
. I did this locally onrolling
,rolling_fill_first
, androll_sum
and was able to getroll_sum(FILL_FIRST,Int,missing,data)
to work as expected.I did not test any other cases (mostly just hacking this together so I can move on with my program), but my guess is that it will work for other cases as well.
Hope this is a helpful suggestion.