cruzki closed 4 years ago
@cruzki Function na_seasplit() does all we were looking for. There is an adjustable parameter (algorithm) that defines the kind of imputation performed. Below I show some examples when specifying a seasonal period of a week.
The original time series with missing samples we want to impute:
Imputation by Interpolation, algorithm="interpolation":
Imputation by Last Observation Carried Forward, algorithm="locf":
Imputation by Mean Value, algorithm="mean":
Imputation by Random Sample, algorithm="random":
Imputation by Kalman Smoothing and State Space Models, algorithm="kalman" (some warning messages appear after imputation):
Imputation by Weighted Moving Average, algorithm="ma":
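To make the "seasonal split" idea concrete, here is a minimal sketch in plain Python. It is not the package's implementation, only an illustration of the general technique: the series is split by position within the seasonal period, and each sub-series is imputed on its own (here with the mean of its observed values). The function name `seasonal_mean_impute`, the use of `None` for missing samples, and the toy period of 3 are all assumptions made for the example.

```python
from statistics import mean

def seasonal_mean_impute(series, period):
    """Fill None entries with the mean of the observed values that share
    the same position within the seasonal period."""
    filled = list(series)
    for pos in range(period):
        # Sub-series for this seasonal position (e.g. "every Monday 09:00").
        idx = range(pos, len(series), period)
        observed = [series[i] for i in idx if series[i] is not None]
        if not observed:
            continue  # nothing observed at this position; leave gaps as they are
        fill = mean(observed)
        for i in idx:
            if filled[i] is None:
                filled[i] = fill
    return filled

# Toy series with period 3 (a stand-in for the weekly period used in the thread).
toy = [1.0, 10.0, 100.0, None, 20.0, 200.0, 3.0, None, 300.0]
result = seasonal_mean_impute(toy, period=3)
print(result)
```

Each missing sample is filled only from samples at the same position in other periods, so the different "levels" of the three positions do not get mixed.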
In my opinion, the best option is imputation by Last Observation Carried Forward, since it copies what happened in the previous week (I still have to check what happens if the very first week has missing samples: what does it copy from?).
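The open question about the first week can be explored with a small sketch. The plain Python below is a hypothetical re-implementation of seasonal LOCF, not the library code: it carries the last observation forward within each seasonal position, and then optionally fills gaps in the first period from the next observed period. That backward fallback (the `fill_leading` flag) is an assumption of this sketch; how the real function treats leading gaps should be checked in its documentation.

```python
def seasonal_locf(series, period, fill_leading=True):
    """Seasonal LOCF: each None copies the latest earlier value found at the
    same position within the period. Gaps before the first observation are
    optionally filled from the next observed period instead."""
    filled = list(series)
    for pos in range(period):
        idx = list(range(pos, len(series), period))
        last = None
        for i in idx:  # forward pass: carry the last observation forward
            if filled[i] is None:
                if last is not None:
                    filled[i] = last
            else:
                last = filled[i]
        if fill_leading:
            nxt = None
            for i in reversed(idx):  # backward pass: only leading gaps remain
                if filled[i] is None:
                    if nxt is not None:
                        filled[i] = nxt
                else:
                    nxt = filled[i]
    return filled

# Period 3; the very first sample is missing, so there is no earlier period.
toy = [None, 10.0, 100.0, 2.0, None, 200.0, 3.0, None, None]
with_fallback = seasonal_locf(toy, period=3)
without_fallback = seasonal_locf(toy, period=3, fill_leading=False)
print(with_fallback)
print(without_fallback)
```

Without the fallback, a gap in the first period has nothing to copy from and stays missing, which is exactly the case worth checking on the real data.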
What I'm going to do now is the following:
How many time series will I reject? Luckily, just a few:
@cruzki Another idea is to use Last Observation Carried Forward if the gap is large (more than 8 hours or so) and Interpolation if the gap is smaller.
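A rough sketch of that hybrid rule, in plain Python and under stated assumptions: missing samples are `None`, the threshold `max_interp_gap` is counted in samples (so "8 hours" would have to be converted using the actual sampling rate), short gaps with an observed neighbour on both sides get linear interpolation, and everything else copies the same position from the previous period. The helper names are made up for the example.

```python
def missing_runs(series):
    """Yield (start, end) index pairs, end exclusive, for each run of None."""
    start = None
    for i, v in enumerate(series):
        if v is None and start is None:
            start = i
        elif v is not None and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(series)

def hybrid_impute(series, period, max_interp_gap):
    filled = list(series)
    for start, end in missing_runs(series):
        left, right = start - 1, end
        is_short = (end - start) <= max_interp_gap
        if is_short and left >= 0 and right < len(series):
            # Short gap: straight line between the two observed neighbours.
            step = (series[right] - series[left]) / (right - left)
            for i in range(start, end):
                filled[i] = series[left] + step * (i - left)
        else:
            # Long gap (or gap touching an edge): copy from the previous period.
            # If that sample is missing too, the gap is left as None.
            for i in range(start, end):
                if i - period >= 0:
                    filled[i] = filled[i - period]
    return filled

# Period 4: one single-sample gap and one three-sample gap.
toy = [1.0, 2.0, 3.0, 4.0, 1.0, None, 3.0, 4.0, None, None, None, 4.0]
result = hybrid_impute(toy, period=4, max_interp_gap=1)
print(result)
```

Gaps are processed in time order, so a long gap can copy values that were themselves interpolated in the previous period.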
What I still need to check is whether the interfaces between the original and imputed samples (the A regions in the picture below) are smooth enough when using Last Observation Carried Forward:
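One possible way to put a number on "smooth enough", sketched in plain Python: measure the step across every boundary between an observed and an imputed sample, and compare it with the typical step between consecutive observed samples. The choice of the median as "typical" and the factor of 3 used for flagging are arbitrary assumptions for illustration, not a recommendation.

```python
from statistics import median

def interface_jumps(original, imputed):
    """Return (index, |step|) for each boundary between observed and imputed
    samples. `imputed` is assumed to contain no None values."""
    jumps = []
    for i in range(1, len(original)):
        # A boundary is where exactly one of two neighbouring samples was missing.
        if (original[i - 1] is None) != (original[i] is None):
            jumps.append((i, abs(imputed[i] - imputed[i - 1])))
    return jumps

def typical_step(original):
    """Median absolute step between consecutive observed samples."""
    steps = [abs(b - a) for a, b in zip(original, original[1:])
             if a is not None and b is not None]
    return median(steps)

original = [1.0, 2.0, 3.0, None, None, 4.0, 5.0]
imputed = [1.0, 2.0, 3.0, 9.0, 9.0, 4.0, 5.0]  # deliberately rough fill

jumps = interface_jumps(original, imputed)
baseline = typical_step(original)
# Flag boundaries whose step is much larger than usual (factor 3 is arbitrary).
rough = [i for i, jump in jumps if jump > 3 * baseline]
print(jumps, baseline, rough)
```

Boundaries that end up in `rough` are the ones worth inspecting by eye in the plots.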
I agree, Last Observation Carried Forward is the best one. If you can do the interpolation easily, then OK; if not, do not worry too much. You are in a discrete world and these things are relatively normal.
If needed, we can impute from the closest "block" of the same seasonality:
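A minimal sketch of one reading of that idea, in plain Python: for each missing sample, look at the same position one period back, then one period ahead, then two periods back, and so on, and copy the first observed value found. This works sample by sample rather than on whole blocks, prefers the past on ties, and only copies originally observed values; all three choices are assumptions of the sketch.

```python
def nearest_season_impute(series, period):
    """Fill each None from the closest period (past or future) that has an
    observed value at the same seasonal position."""
    filled = list(series)
    n = len(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        k = 1
        # Widen the search one period at a time while either side is in range.
        while i - k * period >= 0 or i + k * period < n:
            back, ahead = i - k * period, i + k * period
            if back >= 0 and series[back] is not None:
                filled[i] = series[back]
                break
            if ahead < n and series[ahead] is not None:
                filled[i] = series[ahead]
                break
            k += 1
    return filled

# Period 2: the gap at index 4 has a missing previous period (index 2),
# so it copies from the next period (index 6) instead of two periods back.
toy = [1.0, 5.0, None, 5.0, None, 5.0, 9.0, 5.0]
result = nearest_season_impute(toy, period=2)
print(result)
```

Compared with a purely forward-looking LOCF, this version can borrow from the following period when it is closer than any earlier observed one.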