JeffreySarnoff / RollingFunctions.jl

Roll a window over data; apply a function over the window.
MIT License

Window and Data length #6

Closed tbeason closed 6 years ago

tbeason commented 6 years ago

Currently, if I use a rolling function with a window length N and my vector has length M < N, I get an error. I'm using FILL_FIRST, so I expected those entries to be filled just as they would be if my vector were longer.

JeffreySarnoff commented 6 years ago

Are you requesting that I return a vector longer than the one that you input?

tbeason commented 6 years ago

I should hope not! Sorry, I should have left an example. I just thought it was a bit off that these cases return an error, since the expected behavior seems obvious to me (at least in the FILL_FIRST scenario). If the window length is longer than the data and we are filling from the start, then rather than throwing an error I think it should simply fill the whole vector. So the second function call below would return fill(NaN,(11,)).

using RollingFunctions

a = rand(12)
roll_sum(FILL_FIRST, 12, NaN, a)        # works as expected
roll_sum(FILL_FIRST, 12, NaN, a[1:11])  # returns an error

I have not given any thought to cases other than FILL_FIRST, so I don't know how general this intuition is.

JeffreySarnoff commented 6 years ago

Is there a use case you have in mind where the client would really want to specify a window length that exceeds the data length? The reason it is an error now is that it looked like a mistake the client would want to know about. I have no problem with adopting your view on this ... just tell me why you want to process this as if all is as intended (using roll_op(FILL_FIRST|LAST, 12, NaN, a[1:11]) as longhand for a[1:11] = NaN).

tbeason commented 6 years ago

I am using this on a DataFrame, doing a rolling operation by group (using the transform function from DataFramesMeta). Overall, I expect most groups to have more than 12 months of data, so there is no issue using a 12-month window. However, there are evidently a few groups that do not have 12 months of data. Naturally, the statistic can't be computed for them, which is fine. The problem is that the entire operation currently fails because of those groups.

I did end up using a workaround this time (count the number of nonmissing observations ahead of time, then drop groups with fewer than 12), but I think that really shouldn't be necessary: the suggested behavior seems reasonable and would be faster and more efficient than my workaround. IIRC, SAS also implements this behavior in PROC EXPAND.

JeffreySarnoff commented 6 years ago

ok -- will do
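
For readers following along, the behavior agreed on in this thread can be sketched in plain Julia. This is a minimal illustration and not the RollingFunctions.jl implementation: the name rolling_sum_fill_first is hypothetical, it handles only the FILL_FIRST-style case with a sum, and it assumes the filler can be stored in a Float64 vector (as NaN can).

```julia
# Hypothetical helper for illustration; not part of RollingFunctions.jl.
# Rolling sum that pads the start of the result with `filler`.
function rolling_sum_fill_first(window::Integer, filler, data::AbstractVector)
    n = length(data)
    result = Vector{Float64}(undef, n)

    # Window longer than the data: no complete window exists,
    # so return a vector made up entirely of the filler.
    if window > n
        fill!(result, filler)
        return result
    end

    # The first window-1 positions have no complete window behind them.
    result[1:window-1] .= filler

    # Each remaining position gets the sum of the trailing `window` values.
    for i in window:n
        result[i] = sum(@view data[i-window+1:i])
    end
    return result
end

a = rand(12)
rolling_sum_fill_first(12, NaN, a)        # 11 NaNs followed by the sum of all 12 values
rolling_sum_fill_first(12, NaN, a[1:11])  # 11 NaNs, no error
```

With this shape, a grouped operation no longer needs to drop short groups first: groups with fewer observations than the window simply come back filled.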