DeustoTech / WHY-suite

WHY T2.1

Use at least 2 years of data #9

Closed cruzki closed 4 years ago

cruzki commented 4 years ago

If necessary, we should impute missing data using the closest "block" of the same seasonality:

quesadagranja commented 4 years ago

@cruzki The function na_seasplit() does everything we were looking for. It has an adjustable parameter (algorithm) that selects the kind of imputation performed. Below are some examples with a seasonal period of one week.

The original time series with missing samples we want to impute: image

Imputation by Interpolation, algorithm="interpolation": image

Imputation by Last Observation Carried Forward, algorithm="locf": image

Imputation by Mean Value, algorithm="mean": image

Imputation by Random Sample, algorithm="random": image

Imputation by Kalman Smoothing and State Space Models, algorithm="kalman" (some warning messages appear after imputation): image

Imputation by Weighted Moving Average, algorithm="ma": image

In my opinion, the best idea is using imputation by Last Observation Carried Forward, since it copies what happened in the previous week (I still have to check what happens if the very first week has missing samples... who does it copy from?).
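To make the idea concrete, here is a minimal plain-Python sketch of seasonal Last Observation Carried Forward. It is not the na_seasplit() implementation; the function name `seasonal_locf` is hypothetical, and back-filling leading gaps from the next observation at the same seasonal position is an assumption, shown only as one possible answer to the "very first week" question.

```python
from typing import List, Optional


def seasonal_locf(values: List[Optional[float]], period: int) -> List[Optional[float]]:
    """Fill None entries with the last observed value at the same seasonal position.

    `period` is the number of samples in one season (e.g. one week).
    """
    filled = list(values)
    for phase in range(period):
        idx = range(phase, len(filled), period)

        # Forward pass: carry the last observation of this phase forward.
        last = None
        for i in idx:
            if filled[i] is None:
                filled[i] = last  # stays None if nothing has been observed yet
            else:
                last = filled[i]

        # Assumption: gaps before the first observation of this phase are
        # back-filled from the next available value of the same phase.
        nxt = None
        for i in reversed(idx):
            if filled[i] is None:
                filled[i] = nxt
            else:
                nxt = filled[i]
    return filled


# Toy example with a "season" of 3 samples.
print(seasonal_locf([1, 2, 3, None, None, 6, 7, None, 9], period=3))
```

With a weekly period, each missing sample is copied from the same weekday and time slot of the most recent week in which that slot was observed.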

What I'm going to do now is the following:

  1. Apply this method to all the time series in Low Carbon London to fill in the missing interior samples.
  2. If the time series is longer than 1 year but shorter than, say, 25 months, the method will be applied again to complete the missing months using a yearly seasonal period (I hope February 29, 2012 doesn't cause too much trouble). Alternatively, it is probably better to copy the data directly while keeping the weekdays aligned.
  3. If the time series is shorter than 1 year, I'll have to reject it, since I cannot copy months whose behavior we don't know.
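The plan above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `classify` and `extend_weekday_aligned` are hypothetical, the thresholds (12 and 25 months) are taken from the list above, and the weekday-aware copy is implemented by looking back exactly 52 weeks (364 days), which keeps the weekday and sidesteps any special handling of February 29.

```python
from datetime import date, timedelta
from typing import Dict


def classify(length_months: float, min_months: float = 12, target_months: float = 25) -> str:
    """Decide what to do with a series based on its length in months."""
    if length_months < min_months:
        return "reject"   # unknown months cannot be copied from anywhere
    if length_months < target_months:
        return "extend"   # complete the missing months from the previous year
    return "keep"


def extend_weekday_aligned(daily: Dict[date, float], end: date) -> Dict[date, float]:
    """Extend a daily series up to `end` by copying the value from 52 weeks earlier.

    52 weeks is 364 days, so the source day always has the same weekday.
    Days whose source is not available are left out.
    """
    out = dict(daily)
    day = max(daily) + timedelta(days=1)
    while day <= end:
        src = day - timedelta(weeks=52)
        if src in out:
            out[day] = out[src]
        day += timedelta(days=1)
    return out


# Toy series: 400 days starting 2012-01-01, value = weekday number.
start = date(2012, 1, 1)
series = {start + timedelta(days=k): float((start + timedelta(days=k)).weekday())
          for k in range(400)}
extended = extend_weekday_aligned(series, date(2013, 12, 31))
print(classify(13), len(extended))
```

Copying with a fixed 364-day offset trades calendar-date alignment for weekday alignment, which seems the more important of the two for household consumption patterns.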

How many time series will I reject? Luckily, just a few: image

quesadagranja commented 4 years ago

@cruzki Another idea is using Last Observation Carried Forward if the gap is large (more than 8 hours or so) and using Interpolation if the gap is smaller.
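A rough sketch of this hybrid rule, in plain Python and not tied to na_seasplit(): interior gaps up to a threshold are linearly interpolated, and longer gaps are copied from one seasonal period earlier. The names `gap_runs` and `hybrid_impute` are hypothetical, and the threshold is expressed in samples (for example, 16 samples would correspond to 8 hours if the data were half-hourly, which is an assumption here).

```python
from typing import Iterator, List, Optional, Tuple


def gap_runs(values: List[Optional[float]]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs, end exclusive, for each run of None values."""
    start = None
    for i, v in enumerate(values):
        if v is None and start is None:
            start = i
        elif v is not None and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(values)


def hybrid_impute(values: List[Optional[float]], period: int,
                  max_interp_gap: int) -> List[Optional[float]]:
    """Interpolate short interior gaps; copy long gaps from one period earlier."""
    out = list(values)
    for start, end in gap_runs(values):
        length = end - start
        interior = start > 0 and end < len(values)
        if length <= max_interp_gap and interior:
            # Short gap: straight line between the two neighbouring observations.
            left, right = values[start - 1], values[end]
            step = (right - left) / (length + 1)
            for k in range(length):
                out[start + k] = left + step * (k + 1)
        else:
            # Long gap: take the value from the same position one period before.
            for i in range(start, end):
                if i - period >= 0:
                    out[i] = out[i - period]  # remains None if the source is missing
    return out


data = [0, 1, 2, 3, 4, None, 6, 7, None, None, None, 11]
print(hybrid_impute(data, period=4, max_interp_gap=1))
```

Gaps are processed from left to right, so a long gap can reuse values that were themselves imputed earlier in the series.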

What I still need to check is whether the interfaces between the original and imputed samples (the A regions in the picture below) are smooth enough when using Last Observation Carried Forward: image
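One simple way to quantify this, given only as a sketch: compare the jump at each interface between original and imputed samples with the typical step between consecutive original samples. The helper names `interface_jumps` and `typical_step` are hypothetical, and using the median absolute step as the reference is an assumption, not something decided in this thread.

```python
from statistics import median
from typing import List, Optional


def interface_jumps(filled: List[float], was_missing: List[bool]) -> List[float]:
    """Absolute jump at every boundary between an original and an imputed sample."""
    return [abs(filled[i] - filled[i - 1])
            for i in range(1, len(filled))
            if was_missing[i] != was_missing[i - 1]]


def typical_step(filled: List[float], was_missing: List[bool]) -> Optional[float]:
    """Median absolute step between consecutive samples that are both original."""
    steps = [abs(filled[i] - filled[i - 1])
             for i in range(1, len(filled))
             if not was_missing[i] and not was_missing[i - 1]]
    return median(steps) if steps else None


# Toy example: samples 3 and 4 were imputed.
filled = [1, 2, 3, 10, 11, 4, 5]
mask = [False, False, False, True, True, False, False]
print(interface_jumps(filled, mask), typical_step(filled, mask))
```

Interface jumps that are much larger than the typical step would suggest that the boundary is not smooth.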

cruzki commented 4 years ago

I agree, Last Observation Carried Forward is the best one. If you can do the interpolation easily, then fine; if not, do not worry too much. You are in a discrete world, and these things are relatively normal.