antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

Add temporal hierarchical forecasting #127

Closed antoinecarme closed 4 years ago

antoinecarme commented 4 years ago

PyAF hierarchical forecasting is still missing a temporal aspect. Try to prototype some kind of signal aggregation based on temporal hierarchies.

A good starting point is :

Athanasopoulos, G., Hyndman, R.J., Kourentzes, N., and Petropoulos, F. (2016) Forecasting with temporal hierarchies.

Expected deliverable : Jupyter notebook.

antoinecarme commented 4 years ago

R package :

https://github.com/robjhyndman/thief

antoinecarme commented 4 years ago

thief allows defining some specific categories of temporal aggregates :

https://github.com/robjhyndman/thief/blob/3cf654c53c0448182bd3847fa692ddee0badcfb2/R/tsaggregates.R#L62

 if(m==4L)
  {
    names(y.out)[mout==4L] <- "Annual"
    names(y.out)[mout==2L] <- "Biannual"
    names(y.out)[mout==1L] <- "Quarterly"
  }
  else if(m == 12L)
  {
    names(y.out) <- paste(mout,"-Monthly",sep="")
    names(y.out)[mout==12L] <- "Annual"
    names(y.out)[mout==6L] <- "Biannual"
    names(y.out)[mout==3L] <- "Quarterly"
    names(y.out)[mout==1L] <- "Monthly"
  }
  else if(m == 7L)
  {
    names(y.out)[mout==7L] <- "Weekly"
    names(y.out)[mout==1L] <- "Daily"
  }
  else if(m == 24L | m == 168L | m == 8760L)
  {
    names(y.out) <- paste(mout,"-Hourly",sep="")
    j <- mout%%24L == 0L
    names(y.out)[j] <- paste(mout[j]/24L,"-Daily",sep="")
    j <- mout%%168L == 0L
    names(y.out)[j] <- paste(mout[j]/168L,"-Weekly",sep="")
    j <- mout%%8760L == 0L
    names(y.out)[j] <- paste(mout[j]/8760L,"-Yearly",sep="")
    names(y.out)[mout==8760L] <- "Annual"
    names(y.out)[mout==2190L] <- "Quarterly"
    names(y.out)[mout==168L] <- "Weekly"
    names(y.out)[mout==24L] <- "Daily"
    names(y.out)[mout==1L] <- "Hourly"
  }
  else if(m == 48L | m == 336L | m == 17520L)
  {
    j <- mout%%2L == 0L
    names(y.out)[j] <- paste(mout[j]/2L,"-Hourly",sep="")
    j <- mout%%48L == 0L
    names(y.out)[j] <- paste(mout[j]/48L,"-Daily",sep="")
    j <- mout%%336L == 0L
    names(y.out)[j] <- paste(mout[j]/336L,"-Weekly",sep="")
    j <- mout%%17520L == 0L
    names(y.out)[j] <- paste(mout[j]/17520L,"-Yearly",sep="")
    names(y.out)[mout==17520L] <- "Annual"
    names(y.out)[mout==4380L] <- "Quarterly"
    names(y.out)[mout==336L] <- "Weekly"
    names(y.out)[mout==48L] <- "Daily"
    names(y.out)[mout==2L] <- "Hourly"
    names(y.out)[mout==1L] <- "Half-hourly"
  }
  else if(m == 52L)
  {
    names(y.out) <- paste(mout,"-Weekly",sep="")
    names(y.out)[mout==52L] <- "Annual"
    names(y.out)[mout==26L] <- "Biannual"
    names(y.out)[mout==13L] <- "Quarterly"
    names(y.out)[mout==1L] <- "Weekly"
  }
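
The idea behind these names can be sketched in Python. This is a minimal illustration and not thief's code: it assumes the aggregation levels are the divisors of the seasonal period m, that aggregation is a sum over non-overlapping blocks, and that incomplete leading cycles are dropped. The helper names `divisors` and `temporal_aggregates` are hypothetical.

```python
def divisors(m):
    """Return all block sizes k that divide the seasonal period m."""
    return [k for k in range(1, m + 1) if m % k == 0]


def temporal_aggregates(values, m):
    """Sum a series over non-overlapping blocks of every size dividing m.

    Assumption of this sketch: leading observations are dropped so that
    only complete cycles of length m remain.
    """
    n_full = (len(values) // m) * m
    trimmed = values[len(values) - n_full:]
    return {
        k: [sum(trimmed[i:i + k]) for i in range(0, n_full, k)]
        for k in divisors(m)
    }


# Two years of monthly data: 1, 2, ..., 24
monthly = list(range(1, 25))
aggregates = temporal_aggregates(monthly, 12)
# Keys are block sizes in months: 1 (monthly), 3 (quarterly), 12 (annual), ...
```

For a monthly series (m = 12) this gives block sizes 1, 2, 3, 4, 6 and 12 months, which correspond to the "Monthly", "Quarterly", "Biannual" and "Annual" labels above.
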
antoinecarme commented 4 years ago

Pandas allows creating more sophisticated time periods (offsets) and aggregating signals from one time resolution to another (resampling) :

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
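
A small sketch of what resampling looks like in practice, assuming a synthetic daily signal (the data and the variable names are made up for illustration) and sum as the aggregation function:

```python
import pandas as pd

# Four full weeks of daily data, starting on a Monday (2020-01-06).
index = pd.date_range("2020-01-06", periods=28, freq="D")
daily = pd.Series(1.0, index=index, name="Signal")

# Aggregate the daily signal to coarser resolutions.
# "W" means weekly bins ending on Sunday; "2W" means two-week bins.
weekly = daily.resample("W").sum()
biweekly = daily.resample("2W").sum()

print(weekly)
print(biweekly)
```

Whatever the bin edges are, summing preserves the total of the signal, which is the property needed when reconciling forecasts across temporal levels.
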

antoinecarme commented 4 years ago

Pandas offset aliases :

https://github.com/pandas-dev/pandas/blob/14eda586582513c68f32f0a1f00ecfe8d6c7f8f3/pandas/_libs/tslibs/frequencies.pyx#L72


    # Quarterly frequencies with various fiscal year ends.
    # eg, Q42005 for Q-OCT runs Aug 1, 2005 to Oct 31, 2005
    "Q-DEC": 2000,    # Quarterly - December year end
    "Q-JAN": 2001,    # Quarterly - January year end
    "Q-FEB": 2002,    # Quarterly - February year end
    "Q-MAR": 2003,    # Quarterly - March year end
    "Q-APR": 2004,    # Quarterly - April year end
    "Q-MAY": 2005,    # Quarterly - May year end
    "Q-JUN": 2006,    # Quarterly - June year end
    "Q-JUL": 2007,    # Quarterly - July year end
    "Q-AUG": 2008,    # Quarterly - August year end
    "Q-SEP": 2009,    # Quarterly - September year end
    "Q-OCT": 2010,    # Quarterly - October year end
    "Q-NOV": 2011,    # Quarterly - November year end

    "M": 3000,        # Monthly

    "W-SUN": 4000,    # Weekly - Sunday end of week
    "W-MON": 4001,    # Weekly - Monday end of week
    "W-TUE": 4002,    # Weekly - Tuesday end of week
    "W-WED": 4003,    # Weekly - Wednesday end of week
    "W-THU": 4004,    # Weekly - Thursday end of week
    "W-FRI": 4005,    # Weekly - Friday end of week
    "W-SAT": 4006,    # Weekly - Saturday end of week

    "B": 5000,        # Business days
    "D": 6000,        # Daily
    "H": 7000,        # Hourly
    "T": 8000,        # Minutely
    "S": 9000,        # Secondly
    "L": 10000,       # Millisecondly
    "U": 11000,       # Microsecondly
    "N": 12000}       # Nanosecondly
antoinecarme commented 4 years ago

Pandas also allows more complex period specifications, made of a multiplier and an offset alias (for example, "6H" stands for a 6-hour period).
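
To make the multiplier/alias structure concrete, here is a small sketch that splits such a string into its two parts. The helper `split_period` is hypothetical and is not how pandas parses frequencies; it ignores anchored aliases such as "Q-DEC" or "W-SUN".

```python
import re

# An optional integer multiplier followed by an alphabetic alias.
_PERIOD_RE = re.compile(r"^(\d*)([A-Za-z]+)$")


def split_period(period):
    """Split a period such as "6H" into (multiplier, alias)."""
    match = _PERIOD_RE.match(period)
    if match is None:
        raise ValueError(f"cannot parse period: {period!r}")
    count, alias = match.groups()
    # A missing multiplier means one unit, so "D" is the same as "1D".
    return (int(count) if count else 1, alias)


print(split_period("6H"))
print(split_period("D"))
```
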

antoinecarme commented 4 years ago

PyAF hierarchical forecasting will be designed to allow pandas-friendly time hierarchies like the ones used in the sample test scripts :

https://github.com/antoinecarme/pyaf/tree/Temporal_Hierarchy/tests/temporal_hierarchy


test_temporal_demo_1.py =>      PERIODS = ["D" , "W" , "Q"]
test_temporal_demo_daily_D_W_2W.py =>   PERIODS = ["D" , "W" , "2W"]
test_temporal_demo_daily_D_W_2W_Q.py =>         PERIODS = ["D" , "W" , "2W" , "Q" ]
test_temporal_demo_daily_D_W_M.py =>    PERIODS = ["D" , "W" , "M"]
test_temporal_demo_daily_D_W_M_Q.py =>  PERIODS = ["D" , "W" , "M" , "Q"]
test_temporal_demo_daily_D_W_Q.py =>    PERIODS = ["D" , "W" , "Q"]
test_temporal_demo_hourly_H_6H_12H_D.py =>      PERIODS = ["H" , "6H" , "12H", "D"]
test_temporal_demo_hourly_H_6H_12H_D_W.py =>    PERIODS = ["H" , "6H" , "12H" , "D" , "W"]
test_temporal_demo_hourly_H_D.py =>     PERIODS = ["H" , "D"]
test_temporal_demo_minutely_T_10T_30T_H.py =>   PERIODS = ["T" , "10T", "30T", "H"]
test_temporal_demo_minutely_T_H_12H_D.py =>     PERIODS = ["T" , "H", "12H" , "D"]
test_temporal_demo_minutely_T_H.py =>   PERIODS = ["T" , "H"]
test_temporal_demo_monthly_M_2M_6M_12M.py =>    PERIODS = ["M" , "2M" , "6M" , "12M"]
test_temporal_demo_monthly_M_2M_6M.py =>        PERIODS = ["M" , "2M" , "6M"]
test_temporal_demo_monthly_M_Q_A.py =>  PERIODS = ["M" , "Q" , "A"]
test_temporal_demo_weekly_W_2W_M_Q.py =>        PERIODS = ["W" , "2W", "M", "Q"]
test_temporal_demo_weekly_W_Q_A.py =>   PERIODS = ["W" , "Q" , "A"]
antoinecarme commented 4 years ago

First Jupyter notebook, describing GOOG stock forecasting in a hierarchical manner :

lHierarchy['Periods']= ["D", "W" , "2W" , "M"]

Daily, weekly, biweekly and monthly signals are analyzed.

https://github.com/antoinecarme/pyaf/blob/Temporal_Hierarchy/notebooks_sandbox/temporal_hierarchy/Temporal_Hierarchy_prototyping_GOOG.ipynb

antoinecarme commented 4 years ago

Another Jupyter notebook, for an hourly (fake) time series (based on ozone) :

lHierarchy['Periods']= ["H", "6H" , "12H" , "D"]

Hourly, 6-hour, 12-hour and daily signals are analyzed.

https://github.com/antoinecarme/pyaf/blob/Temporal_Hierarchy/notebooks_sandbox/temporal_hierarchy/Temporal_Hierarchy_prototyping_ozone_hourly.ipynb

antoinecarme commented 4 years ago

Three types of hierarchical forecasting are now available : "Grouped", "Temporal", and the default signal hierarchy, used for any other value of iHierarchy['Type'] :

https://github.com/antoinecarme/pyaf/blob/842040c2532ffcf4ecf7cdd703fc0c45e2178899/pyaf/HierarchicalForecastEngine.py#L67

    def create_signal_hierarchy(self , iInputDS, iTime, iSignal, iHorizon, iHierarchy, iExogenousData = None):
        lSignalHierarchy = None;
        if(iHierarchy['Type'] == "Grouped"):
            from .TS import Signal_Grouping as siggroup
            lSignalHierarchy = siggroup.cSignalGrouping();
        elif(iHierarchy['Type'] == "Temporal"):
            from .TS import Temporal_Hierarchy as temphier
            lSignalHierarchy = temphier.cTemporalHierarchy();
        else:
            from .TS import SignalHierarchy as sighier
            lSignalHierarchy = sighier.cSignalHierarchy();
antoinecarme commented 4 years ago

Closing

antoinecarme commented 4 years ago

Final fixes before 2.0

Added some trivial checks and their error messages :

  1. When time is not physical (integer and real series are not allowed as time columns)
  2. When the time resolution of the data is too low (cannot ask for hourly periods in a daily dataset)
  3. When the hierarchy is not increasing ( ['6H' , 'H'] and ['D' , 'H' , 'W'] are not valid specifications)

Added one test for each case.
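
For illustration, checks 2 and 3 could be sketched as below. This is not PyAF's implementation: the helper names, the error messages and the table of nominal period lengths are all assumptions, and the month, quarter and year lengths are approximations that only serve to order the periods. Check 1 (physical time) is not covered.

```python
import re

# Nominal lengths in seconds (hypothetical table, approximate for M, Q, A).
UNIT_SECONDS = {
    "T": 60, "H": 3600, "D": 86400, "W": 7 * 86400,
    "M": 30 * 86400, "Q": 91 * 86400, "A": 365 * 86400,
}


def period_seconds(period):
    """Nominal length of a period such as "6H", in seconds."""
    match = re.fullmatch(r"(\d*)([A-Z])", period)
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported period: {period!r}")
    count = int(match.group(1)) if match.group(1) else 1
    return count * UNIT_SECONDS[match.group(2)]


def check_periods(periods, data_period):
    """Raise ValueError if the hierarchy is finer than the data or not increasing."""
    if not periods:
        raise ValueError("empty hierarchy")
    lengths = [period_seconds(p) for p in periods]
    # Check 2: the finest requested period cannot be finer than the data.
    if lengths[0] < period_seconds(data_period):
        raise ValueError(f"{periods[0]!r} is finer than the data ({data_period!r})")
    # Check 3: periods must be strictly increasing.
    for previous, current in zip(lengths, lengths[1:]):
        if current <= previous:
            raise ValueError(f"hierarchy is not increasing: {periods!r}")


check_periods(["H", "6H", "12H", "D"], data_period="H")  # passes silently
```

With this sketch, ['6H', 'H'] and ['D', 'H', 'W'] are rejected as not increasing, and ['H', 'D'] is rejected for a daily dataset.
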