antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

Enforce type checking for time and signal columns #47

Closed: antoinecarme closed this issue 7 years ago

antoinecarme commented 7 years ago

The time column should be a time (its np.dtype is a date/time or numeric type), and the signal column should be numeric. Issue error messages if the time or signal column is not found in the training dataset or does not have the correct type.
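NumPy's single-character `dtype.kind` codes give a compact way to express these rules: `'M'` for datetime64, `'i'` for signed integers, `'u'` for unsigned integers, `'f'` for floats. The minimal sketch below classifies a few sample pandas columns; the helper names `is_valid_time_column` and `is_valid_signal_column` are hypothetical and not part of PyAF.

```python
import pandas as pd

# dtype.kind codes accepted by the rules described in this issue
ALLOWED_TIME_KINDS = ("M", "i", "u", "f")  # datetime64, signed int, unsigned int, float
ALLOWED_SIGNAL_KINDS = ("i", "u", "f")     # numeric only

def is_valid_time_column(series):
    """Hypothetical helper: can this column serve as the time axis?"""
    return series.dtype.kind in ALLOWED_TIME_KINDS

def is_valid_signal_column(series):
    """Hypothetical helper: can this column serve as the forecast signal?"""
    return series.dtype.kind in ALLOWED_SIGNAL_KINDS

df = pd.DataFrame({
    "Date": pd.date_range("2017-01-01", periods=3),  # datetime64 -> kind 'M'
    "Step": [0, 1, 2],                               # int64      -> kind 'i'
    "Signal": [1.5, 2.0, 2.5],                       # float64    -> kind 'f'
    "Flag": [True, False, True],                     # bool       -> kind 'b'
    "Label": ["a", "b", "c"],                        # strings    -> not numeric
})

for col in df.columns:
    s = df[col]
    print(col, s.dtype.kind, is_valid_time_column(s), is_valid_signal_column(s))
```

Under these rules a numeric column may serve as either time or signal, a datetime column only as time, and boolean or string columns as neither.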

antoinecarme commented 7 years ago

Added the following checks:

    def checkData(self, iInputDS, iTime, iSignal, iHorizon, iExogenousData):
        # The horizon must be a strictly positive integer
        if iHorizon != int(iHorizon):
            raise Exception("NON_INTEGER_HORIZON " + str(iHorizon))
        if iHorizon < 1:
            raise Exception("NEGATIVE_OR_NULL_HORIZON " + str(iHorizon))
        # The time and signal columns must exist in the training dataset
        if iTime not in iInputDS.columns:
            raise Exception("TIME_COLUMN_NOT_FOUND " + str(iTime))
        if iSignal not in iInputDS.columns:
            raise Exception("SIGNAL_COLUMN_NOT_FOUND " + str(iSignal))
        # Time: datetime64 ('M'), signed/unsigned integer ('i'/'u') or float ('f').
        # Series.dtype (rather than np.dtype(series)) also works for pandas extension dtypes.
        type1 = iInputDS[iTime].dtype
        if type1.kind not in ('M', 'i', 'u', 'f'):
            raise Exception("TIME_COLUMN_TYPE_NOT_ALLOWED '" + str(iTime) + "' '" + str(type1) + "'")
        # Signal: numeric only
        type2 = iInputDS[iSignal].dtype
        if type2.kind not in ('i', 'u', 'f'):
            raise Exception("SIGNAL_COLUMN_TYPE_NOT_ALLOWED '" + str(iSignal) + "' '" + str(type2) + "'")
        # Time in the exogenous data must have strictly the same type as time
        # in the training dataset (needed for the join)
        if iExogenousData is not None:
            lExogenousDataFrame = iExogenousData[0]
            lExogenousVariables = iExogenousData[1]
            if iTime not in lExogenousDataFrame.columns:
                raise Exception("TIME_COLUMN_NOT_FOUND_IN_EXOGENOUS " + str(iTime))
            for exog in lExogenousVariables:
                if exog not in lExogenousDataFrame.columns:
                    raise Exception("EXOGENOUS_VARIABLE_NOT_FOUND " + str(exog))

            type3 = lExogenousDataFrame[iTime].dtype
            if type1 != type3:
                raise Exception("INCOMPATIBLE_TIME_COLUMN_TYPE_IN_EXOGENOUS '" + str(iTime) + "' '" + str(type1) + "' '" + str(type3) + "'")
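The exogenous-data rule exists because the exogenous frame is joined to the training frame on the time column. The minimal sketch below shows that guard in isolation, assuming a simple left join with pandas `merge`; `join_exogenous` is a hypothetical helper, not PyAF's own join code, and it reuses the error-code layout from the checks above.

```python
import pandas as pd

def join_exogenous(train, exog, time_col):
    """Hypothetical helper: left-join exogenous data on the time column,
    refusing the join when the two time columns have different dtypes."""
    t_train, t_exog = train[time_col].dtype, exog[time_col].dtype
    if t_train != t_exog:
        # Same message layout as INCOMPATIBLE_TIME_COLUMN_TYPE_IN_EXOGENOUS
        raise Exception("INCOMPATIBLE_TIME_COLUMN_TYPE_IN_EXOGENOUS '%s' '%s' '%s'"
                        % (time_col, t_train, t_exog))
    return train.merge(exog, on=time_col, how="left")

dates = pd.date_range("2017-01-01", periods=4)
train = pd.DataFrame({"Date": dates, "Signal": [1.0, 2.0, 3.0, 4.0]})
exog_ok = pd.DataFrame({"Date": dates, "Promo": [0, 1, 0, 1]})
exog_bad = pd.DataFrame({"Date": [0, 1, 2, 3], "Promo": [0, 1, 0, 1]})  # integer time

# Matching datetime64 time columns: the join succeeds
merged = join_exogenous(train, exog_ok, "Date")
print(merged)

# datetime64 vs int64 time columns: rejected before attempting the join
error_message = None
try:
    join_exogenous(train, exog_bad, "Date")
except Exception as e:
    error_message = str(e)
print(error_message)
```

Checking the dtypes up front yields a named error code instead of whatever error (or silent mismatch) the join itself would produce.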
antoinecarme commented 7 years ago

The time column is now a NumPy type (numpy.datetime64).
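Dates read from text files often arrive as strings, which the time-column check rejects. The minimal sketch below shows one way a caller could convert such a column to `numpy.datetime64` with `pandas.to_datetime` before training; whether PyAF performs this conversion itself is not stated in this thread, so treat it as an assumption about caller-side preprocessing.

```python
import numpy as np
import pandas as pd

ALLOWED_TIME_KINDS = ("M", "i", "u", "f")  # kinds accepted by the time-column check

# String dates, as typically loaded from a CSV file
df = pd.DataFrame({"Date": ["2017-01-01", "2017-01-02", "2017-01-03"],
                   "Signal": [1.0, 2.0, 3.0]})
kind_before = df["Date"].dtype.kind
print("before:", df["Date"].dtype, kind_before in ALLOWED_TIME_KINDS)

# Convert to numpy.datetime64 so the column passes the time-type check
df["Date"] = pd.to_datetime(df["Date"])
kind_after = df["Date"].dtype.kind
print("after:", df["Date"].dtype, kind_after in ALLOWED_TIME_KINDS)
print(np.issubdtype(df["Date"].dtype, np.datetime64))
```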

antoinecarme commented 7 years ago

Added the following tests:

    $ ls tests/basic_checks/
    issue_46_min_max_issues.py
    issue_46_one_or_two_rows.py
    issue_46_negative_horizon.py
    issue_46_non_existant_time_column.py
    issue_46_wrong_type_date_column.py
    issue_46_non_existant_signal_column.py
    issue_46_wrong_type_signal_column.py
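The contents of these test files are not shown in the thread. As a hypothetical illustration in the spirit of `issue_46_negative_horizon.py`, the minimal sketch below exercises a standalone copy of the two horizon rules from `checkData` and asserts on the error code at the start of each message; `check_horizon` and `expect_error` are illustrative names, not PyAF functions.

```python
def check_horizon(horizon):
    """Hypothetical standalone copy of the horizon rules in checkData."""
    if horizon != int(horizon):
        raise Exception("NON_INTEGER_HORIZON " + str(horizon))
    if horizon < 1:
        raise Exception("NEGATIVE_OR_NULL_HORIZON " + str(horizon))

def expect_error(code, fn, *args):
    """Fail unless fn(*args) raises an exception whose message starts with code."""
    try:
        fn(*args)
    except Exception as e:
        assert str(e).startswith(code), str(e)
        return
    raise AssertionError("expected " + code)

expect_error("NEGATIVE_OR_NULL_HORIZON", check_horizon, -2)
expect_error("NEGATIVE_OR_NULL_HORIZON", check_horizon, 0)
expect_error("NON_INTEGER_HORIZON", check_horizon, 2.5)
check_horizon(12)  # a valid horizon raises nothing
print("horizon checks OK")
```

Note that an integral float such as `3.0` passes both rules, since `3.0 == int(3.0)`.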

antoinecarme commented 7 years ago

Reopened this issue to add checks for hierarchical models.