Use at least 2 years of data

cruzki commented 4 years ago

If needed, we should impute from the closest "block" of the same seasonality:

For missing monthly data, take the same month of the surrounding years.
For missing daily data, take the same day of the previous or next week (beware of special dates!).
For missing hourly data, take the same hour of the previous days (beware of weekends and special dates!).
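A minimal sketch of the "closest block of the same seasonality" idea, in plain Python. The function name `impute_nearest_season` and the `period` argument are hypothetical, missing samples are assumed to be stored as `None`, and special dates and weekends are not handled at all.

```python
from typing import List, Optional


def impute_nearest_season(values: List[Optional[float]], period: int) -> List[Optional[float]]:
    """Fill each None with the closest observed sample at the same seasonal position.

    `period` is the season length in samples (e.g. 7 for daily data with a
    weekly season). Only originally observed samples are used as donors.
    """
    n = len(values)
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        step = 1
        # Move 1, 2, 3... periods away; at each distance try the previous
        # block first, then the next one.
        while i - step * period >= 0 or i + step * period < n:
            for j in (i - step * period, i + step * period):
                if 0 <= j < n and values[j] is not None:
                    filled[i] = values[j]
                    break
            if filled[i] is not None:
                break
            step += 1
    return filled
```

A sample stays `None` when no observation exists at the same seasonal position anywhere in the series, so the caller still has to decide what to do with those cases.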

quesadagranja commented 4 years ago

@cruzki The function na_seasplit() does everything we were looking for. It has an adjustable parameter (algorithm) that defines the kind of imputation performed. Below I show some examples using a seasonal period of one week.

The original time series with missing samples we want to impute:
Imputation by Interpolation, algorithm="interpolation":
Imputation by Last Observation Carried Forward, algorithm="locf":
Imputation by Mean Value, algorithm="mean":
Imputation by Random Sample, algorithm="random":
Imputation by Kalman Smoothing and State Space Models, algorithm="kalman" (some warning messages appear after imputation):
Imputation by Weighted Moving Average, algorithm="ma":

In my opinion, the best option is imputation by Last Observation Carried Forward, since it copies what happened in the previous week (I still have to check what happens if the very first week has missing samples... who does it copy from?).

What I'm going to do now is the following:

Apply this method to all the time series in Low Carbon London to fill in the missing samples inside each series.
If the time series is longer than 1 year but shorter than, say, 25 months, apply the method again to fill in the missing months using a yearly seasonal period (I hope February 29, 2012 doesn't cause too much trouble). Alternatively, it is probably better to copy the data directly while keeping the weekdays aligned.
If the time series is shorter than 1 year, I'll have to reject it, since I cannot copy months whose behavior we don't know.
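The length rule in the plan above could be sketched roughly as follows. The function names and the returned labels are hypothetical. The sketch also assumes that length is counted in calendar months touched by the series and that a series of exactly 12 months is kept rather than rejected, since the plan does not say what happens at that boundary.

```python
from datetime import date


def months_spanned(first: date, last: date) -> int:
    """Number of calendar months touched between the first and last sample."""
    return (last.year - first.year) * 12 + (last.month - first.month) + 1


def plan_for_series(first: date, last: date, min_months: int = 12, full_months: int = 25) -> str:
    """Decide what to do with a series based on how long it is."""
    months = months_spanned(first, last)
    if months < min_months:
        # Too short: there is no observed behavior to copy the missing months from.
        return "reject"
    if months < full_months:
        # Long enough to extend with a yearly seasonal period.
        return "extend_with_yearly_season"
    # Long enough already: only the internal gaps need filling.
    return "fill_internal_gaps_only"
```

For example, `plan_for_series(date(2012, 3, 1), date(2013, 6, 30))` covers 16 months and falls in the "extend" branch.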

How many time series will I reject? Luckily, just a few:

quesadagranja commented 4 years ago

@cruzki Another idea is using Last Observation Carried Forward if the gap is large (more than 8 hours or so) and using Interpolation if the gap is smaller.
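A rough sketch of this hybrid rule in plain Python, not the imputeTS implementation. The name `fill_gaps` and its parameters are hypothetical; `max_interp_len` is the longest gap, in samples, that is still interpolated (for half-hourly data, 8 hours would be 16 samples), and missing samples are assumed to be `None`.

```python
from typing import List, Optional


def fill_gaps(values: List[Optional[float]], period: int, max_interp_len: int) -> List[Optional[float]]:
    """Interpolate short gaps linearly; fill long gaps from one season earlier."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        # Locate the run of missing samples [start, end).
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i
        length = end - start
        has_neighbours = start > 0 and end < n
        if length <= max_interp_len and has_neighbours:
            # Short gap: straight line between the two surrounding samples.
            left, right = filled[start - 1], filled[end]
            for k in range(start, end):
                frac = (k - start + 1) / (length + 1)
                filled[k] = left + frac * (right - left)
        else:
            # Long gap (or gap at an edge): copy the value one period earlier.
            # Gaps inside the first period have nothing to copy and stay None.
            for k in range(start, end):
                if k - period >= 0:
                    filled[k] = filled[k - period]
    return filled
```

Because the long-gap branch reads from the already filled list, a gap longer than one period keeps repeating the last complete season.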

What I still need to check is whether the interfaces between the original and imputed samples (the A regions in the picture below) are smooth enough when using Last Observation Carried Forward:
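One possible way to put a number on "smooth enough", sketched in plain Python under my own assumptions: measure the jump at every boundary between original and imputed samples and compare it with the typical step between two consecutive original samples. The function names and the boolean mask `was_missing` are hypothetical.

```python
from statistics import median
from typing import List, Tuple


def interface_jumps(filled: List[float], was_missing: List[bool]) -> List[Tuple[int, float]]:
    """Absolute jump at each index where the series switches between original and imputed."""
    jumps = []
    for i in range(1, len(filled)):
        if was_missing[i] != was_missing[i - 1]:
            jumps.append((i, abs(filled[i] - filled[i - 1])))
    return jumps


def typical_step(filled: List[float], was_missing: List[bool]) -> float:
    """Median absolute step between consecutive samples that are both original."""
    steps = [
        abs(filled[i] - filled[i - 1])
        for i in range(1, len(filled))
        if not was_missing[i] and not was_missing[i - 1]
    ]
    # median() raises StatisticsError if no two original samples are adjacent.
    return median(steps)
```

Interfaces whose jump is many times larger than `typical_step` would be the ones worth inspecting by eye; what ratio counts as "too large" is left open here.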

cruzki commented 4 years ago

I agree, Last Observation Carried Forward is the best one. If you can do the interpolation easily, then OK; if not, do not worry too much. You are in a discrete world and these things are relatively normal.

DeustoTech / WHY-suite

Use at least 2 years of data #9