antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

Allow using exogenous data in hierarchical forecasting models #124

Closed antoinecarme closed 4 years ago

antoinecarme commented 4 years ago

PyAF does not yet allow using exogenous data (explanatory variables) to enrich the models used in hierarchies.

Expected: the ability to define a single exogenous dataset shared by all hierarchy nodes, or to set exogenous data per node.

antoinecarme commented 4 years ago

First specification method: one exogenous dataset for all nodes, given as a tuple (dataframe, list of used variables)

https://github.com/antoinecarme/pyaf/blob/hierarchical_exog/tests/hierarchical/test_hierarchy_AU_AllMethods_Exogenous_all_nodes.py


import pandas as pd

def create_exog_data(b1):
    # fake exog data based on date variable
    lDate1 = b1.mPastData['Date']
    lDate2 = b1.mFutureData['Date'] # not needed: exogenous data are simply missing when not available.
    # Series.append was removed in pandas 2.0; pd.concat is the equivalent call
    lDate = pd.concat([lDate1, lDate2])
    lExogenousDataFrame = pd.DataFrame()
    lExogenousDataFrame['Date'] = lDate
    lExogenousDataFrame['Date_second'] = lDate.dt.second
    lExogenousDataFrame['Date_minute'] = lDate.dt.minute
    lExogenousDataFrame['Date_hour'] = lDate.dt.hour
    lExogenousDataFrame['Date_dayofweek'] = lDate.dt.dayofweek
    lExogenousDataFrame['Date_day'] = lDate.dt.day
    lExogenousDataFrame['Date_dayofyear'] = lDate.dt.dayofyear
    lExogenousDataFrame['Date_month'] = lDate.dt.month
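    # Series.dt.week was removed in pandas 2.0; isocalendar().week gives the ISO week number
    lExogenousDataFrame['Date_week'] = lDate.dt.isocalendar().week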
    # a column in the exog data can be of any type
    lExogenousDataFrame['Date_day_name'] = lDate.dt.day_name()
    lExogenousDataFrame['Date_month_name'] = lDate.dt.month_name()
    lExogenousVariables = [col for col in lExogenousDataFrame.columns if col.startswith('Date_')]
    lExogenousData = (lExogenousDataFrame , lExogenousVariables) 
    return lExogenousData
antoinecarme commented 4 years ago

Second specification method: per-node exogenous data, given as a dict with lExogenous[signal] = (dataframe, list of used variables)

https://github.com/antoinecarme/pyaf/blob/hierarchical_exog/tests/hierarchical/test_hierarchy_AU_AllMethods_Exogenous_per_node.py


import pandas as pd

def create_exog_data(b1):
    # fake exog data based on date variable
    lDate1 = b1.mPastData['Date']
    lDate2 = b1.mFutureData['Date'] # not needed: exogenous data are simply missing when not available.
    # Series.append was removed in pandas 2.0; pd.concat is the equivalent call
    lDate = pd.concat([lDate1, lDate2])
    lExogenousDataFrame = pd.DataFrame()
    lExogenousDataFrame['Date'] = lDate
    lExogenousDataFrame['Date_second'] = lDate.dt.second
    lExogenousDataFrame['Date_minute'] = lDate.dt.minute
    lExogenousDataFrame['Date_hour'] = lDate.dt.hour
    lExogenousDataFrame['Date_dayofweek'] = lDate.dt.dayofweek
    lExogenousDataFrame['Date_day'] = lDate.dt.day
    lExogenousDataFrame['Date_dayofyear'] = lDate.dt.dayofyear
    lExogenousDataFrame['Date_month'] = lDate.dt.month
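    # Series.dt.week was removed in pandas 2.0; isocalendar().week gives the ISO week number
    lExogenousDataFrame['Date_week'] = lDate.dt.isocalendar().week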
    # a column in the exog data can be of any type
    lExogenousDataFrame['Date_day_name'] = lDate.dt.day_name()
    lExogenousDataFrame['Date_month_name'] = lDate.dt.month_name()
    lExogenousVariables = [col for col in lExogenousDataFrame.columns if col.startswith('Date_')]
    lExogenousData = {}
    # define exog only for three state nodes
    lExogenousData["NSW_State"] = (lExogenousDataFrame , lExogenousVariables[:3]) 
    lExogenousData["VIC_State"] = (lExogenousDataFrame , lExogenousVariables[-3:]) 
    lExogenousData["QLD_State"] = (lExogenousDataFrame , lExogenousVariables) 
    return lExogenousData
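
To make the slices above concrete, here is a small sketch listing which variables each state node ends up with. It assumes the column order produced by create_exog_data above; the names exog_variables and per_node_variables are illustrative only and are not part of PyAF.

```python
# Column names in the order create_exog_data adds them (assumed from the snippet above).
exog_variables = [
    "Date_second", "Date_minute", "Date_hour", "Date_dayofweek", "Date_day",
    "Date_dayofyear", "Date_month", "Date_week", "Date_day_name", "Date_month_name",
]

# Hypothetical dict mirroring the slicing used for the three state nodes.
per_node_variables = {
    "NSW_State": exog_variables[:3],   # first three variables
    "VIC_State": exog_variables[-3:],  # last three variables
    "QLD_State": exog_variables,       # all variables
}

for node, variables in per_node_variables.items():
    print(node, variables)
```

So NSW_State only sees the three sub-daily variables, VIC_State sees the week number plus the two name columns, and QLD_State sees everything. Nodes that are not listed in the dict get no entry at all.
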
antoinecarme commented 4 years ago

The M5 Competition

https://mofc.unic.ac.cy/m5-competition/

antoinecarme commented 4 years ago

    def get_exogenous_data(self, signal):
        # A signal is a hierarchy node
        if self.mExogenousData is None:
            return None
        if isinstance(self.mExogenousData, tuple):
            # same (dataframe, variables) pair for all signals
            return self.mExogenousData
        if isinstance(self.mExogenousData, dict):
            # one (dataframe, variables) pair per signal, None if the signal has no entry
            return self.mExogenousData.get(signal)
        raise tsutil.PyAF_Error("BAD_EXOGENOUS_DATA_SPECIFICATION")
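
For reference, the lookup above can be exercised outside PyAF with a standalone sketch. The function name resolve_exogenous and the placeholder strings standing in for dataframes are assumptions made for illustration, and a plain ValueError stands in for tsutil.PyAF_Error.

```python
def resolve_exogenous(exogenous_data, signal):
    """Return the (dataframe, variables) pair to use for one hierarchy node, or None."""
    if exogenous_data is None:
        # no exogenous data was specified at all
        return None
    if isinstance(exogenous_data, tuple):
        # first method: the same pair is shared by every node
        return exogenous_data
    if isinstance(exogenous_data, dict):
        # second method: per-node pairs; nodes without an entry get None
        return exogenous_data.get(signal)
    # stand-in for tsutil.PyAF_Error in this sketch
    raise ValueError("BAD_EXOGENOUS_DATA_SPECIFICATION")


# Placeholder strings are used instead of real dataframes to keep the sketch dependency-free.
shared = ("frame_placeholder", ["Date_day", "Date_month"])
per_node = {"NSW_State": ("frame_placeholder", ["Date_day"])}

print(resolve_exogenous(shared, "VIC_State"))
print(resolve_exogenous(per_node, "NSW_State"))
print(resolve_exogenous(per_node, "VIC_State"))
```

With the per-node form, a node missing from the dict resolves to None, the same value returned when no exogenous data is specified at all. Any other type (a list, for example) is rejected as a bad specification.
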
antoinecarme commented 4 years ago

Closing.

Will be officially available in release 2.0