antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

Revisit Model Complexity Definition #223

Closed antoinecarme closed 1 year ago

antoinecarme commented 1 year ago

PyAF computes a model complexity indicator. When two models have the same MAPE, PyAF chooses the less complex one.

This complexity indicator depends, roughly, on the number of inputs of each model component (trend inputs, AR lags, cycle length, etc.).

The complexity of a model is computed as the sum of its components' complexity indicators. This is very "artificial".

Use a "statistical" measure of complexity instead.

antoinecarme commented 1 year ago
from enum import Enum

# String values avoid strange arithmetic (complexities are not additive).
# Prefer voting or counts/statistics.
class eModelComplexity(Enum):
    Low = 'S'
    Medium = 'M'
    High = 'L' # or 'H' ??
antoinecarme commented 1 year ago

Each component type is assigned a complexity (for the whole class) among 'S' (Small), 'M' (Medium), and 'L' (Large).

The final model complexity is then given by the counts of 'S', 'M' and 'L' occurrences in the model.
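
As an illustration of the counting idea above, here is a minimal sketch. The function name `count_complexity_letters` and the example dictionary are hypothetical and are not part of PyAF; only the 'S'/'M'/'L' letters come from this issue.

```python
from collections import Counter

def count_complexity_letters(component_complexity):
    """Count how many components fall in each complexity class.

    `component_complexity` maps a component name to 'S', 'M' or 'L'
    (hypothetical helper, not the PyAF implementation).
    """
    counts = Counter(component_complexity.values())
    # Always report the three classes, even when a count is zero.
    return {letter: counts.get(letter, 0) for letter in ("S", "M", "L")}

# Hypothetical model: one large, one medium and three small components.
example = {
    "Decomposition": "L",
    "Transformation": "M",
    "Trend": "S",
    "Cycle": "S",
    "AR": "S",
}
print(count_complexity_letters(example))
```

Unlike a sum of numeric scores, the counts keep the three classes separate, so no arbitrary weight has to be chosen for 'M' relative to 'S' or 'L'.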

antoinecarme commented 1 year ago
    def getComplexity(self):
        # This is just a way to give priority to additive decompositions ("T+S+R" is Low, the others are High).
        lModelTypeComplexity = {
            "T+S+R" : tscomplex.eModelComplexity.Low,
            "TS+R" : tscomplex.eModelComplexity.High,
            "TSR" : tscomplex.eModelComplexity.High,
        }
        lComplexity = {'Decomposition' : lModelTypeComplexity.get(self.mDecompositionType).value,
                       'Transformation' : self.mTransformation.mComplexity.value,
                       'Trend' : self.mTrend.mComplexity.value,
                       'Cycle' : self.mCycle.mComplexity.value,
                       'AR' : self.mAR.mComplexity.value}
        return lComplexity
antoinecarme commented 1 year ago

A model is assigned a complexity as a string of ordered letters. This indicator can be used to rank models with comparable or identical MAPE values.
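
A minimal sketch of how such a string could be derived from a per-component dictionary like the one returned by `getComplexity`, assuming the letters are simply sorted alphabetically. The helper name `complexity_string` is hypothetical, and the actual PyAF code may build the string differently.

```python
def complexity_string(component_complexity):
    """Join the per-component letters into one ordered string.

    Assumption: letters are sorted alphabetically ('L' < 'M' < 'S'),
    which yields strings such as 'SSSSS' or 'LMSSS'.
    """
    return "".join(sorted(component_complexity.values()))

# All components small.
all_small = {
    "Decomposition": "S",
    "Transformation": "S",
    "Trend": "S",
    "Cycle": "S",
    "AR": "S",
}
# Hypothetical model with one large and one medium component.
mixed = dict(all_small, Transformation="L", Trend="M")

print(complexity_string(all_small))
print(complexity_string(mixed))
```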

antoinecarme commented 1 year ago
INFO:pyaf.std:DECOMPOSITION_TYPE 'T+S+R'
INFO:pyaf.std:BEST_TRANSOFORMATION_TYPE '_'
INFO:pyaf.std:BEST_DECOMPOSITION  '_Ozone_LinearTrend_residue_Seasonal_MonthOfYear_residue_NoAR' [LinearTrend + Seasonal_MonthOfYear + NoAR]
INFO:pyaf.std:TREND_DETAIL '_Ozone_LinearTrend' [LinearTrend]
INFO:pyaf.std:CYCLE_DETAIL '_Ozone_LinearTrend_residue_Seasonal_MonthOfYear' [Seasonal_MonthOfYear]
INFO:pyaf.std:AUTOREG_DETAIL '_Ozone_LinearTrend_residue_Seasonal_MonthOfYear_residue_NoAR' [NoAR]
INFO:pyaf.std:MODEL_MAPE MAPE_Fit=0.1761 MAPE_Forecast=0.1765 MAPE_Test=0.2209
INFO:pyaf.std:MODEL_SMAPE SMAPE_Fit=0.1712 SMAPE_Forecast=0.1941 SMAPE_Test=0.2249
INFO:pyaf.std:MODEL_DiffSMAPE DiffSMAPE_Fit=0.1688 DiffSMAPE_Forecast=0.1903 DiffSMAPE_Test=0.2196
INFO:pyaf.std:MODEL_MASE MASE_Fit=0.7728 MASE_Forecast=0.7151 MASE_Test=1.0918
INFO:pyaf.std:MODEL_CRPS CRPS_Fit=0.3308 CRPS_Forecast=0.2855 CRPS_Test=0.3409
INFO:pyaf.std:MODEL_L1 L1_Fit=0.6792 L1_Forecast=0.5552 L1_Test=0.5161
INFO:pyaf.std:MODEL_L2 L2_Fit=0.9118 L2_Forecast=0.663 L2_Test=0.5962
INFO:pyaf.std:MODEL_LnQ LnQ_Fit=7.4081 LnQ_Forecast=2.1757 LnQ_Test=0.9116
INFO:pyaf.std:MODEL_MEDIAN_AE MedAE_Fit=0.5329 MedAE_Forecast=0.5576 MedAE_Test=0.5692
INFO:pyaf.std:MODEL_KENDALL_TAU KENDALL_TAU_Fit=0.6293 KENDALL_TAU_Forecast=0.7354 KENDALL_TAU_Test=0.6565
INFO:pyaf.std:MODEL_KOLOMOGOROV_SMIRNOV KS_Fit=0.0915 KS_Forecast=0.2051 KS_Test=0.25
INFO:pyaf.std:MODEL_MANN_WHITNEY_U MWU_Fit=11828.5 MWU_Forecast=877.0 MWU_Test=68.0
INFO:pyaf.std:MODEL_AUC AUC_Fit=0.5053 AUC_Forecast=0.5766 AUC_Test=0.4722
INFO:pyaf.std:MODEL_COMPLEXITY {'Decomposition': 'S', 'Transformation': 'S', 'Trend': 'S', 'Cycle': 'S', 'AR': 'S'} [SSSSS]
INFO:pyaf.std:SIGNAL_TRANSFORMATION_DETAIL_START
INFO:pyaf.std:SIGNAL_TRANSFORMATION_MODEL_VALUES NoTransf None
INFO:pyaf.std:SIGNAL_TRANSFORMATION_DETAIL_END
antoinecarme commented 1 year ago

Model short list (ozone)

  Transformation DecompositionType                                              Model    Voting Complexity
1         _Ozone             T+S+R  _Ozone_LinearTrend_residue_Seasonal_MonthOfYea...  612.5000      SSSSS
2         _Ozone             T+S+R  _Ozone_LinearTrend_residue_Seasonal_MonthOfYea...  605.2500      LSSSS
0     Diff_Ozone             T+S+R  Diff_Ozone_ConstantTrend_residue_Seasonal_Mont...  617.1667      LMSSS
3         _Ozone             T+S+R  _Ozone_PolyTrend_residue_Seasonal_MonthOfYear_...  590.1667      LMSSS
antoinecarme commented 1 year ago

The model with the most 'S' letters in its complexity indicator is the least complex (complexity strings are ranked in reverse alphabetical order).
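
The ranking rule can be sketched as a plain string sort. The function `rank_by_complexity` and the model names below are hypothetical; only the complexity strings are taken from the short list above.

```python
def rank_by_complexity(models):
    """Order (name, complexity_string) pairs from least to most complex.

    Assumption: reverse alphabetical order of the complexity string
    puts the least complex model first, as described in this issue.
    """
    # Python's sort is stable, so models with equal complexity strings
    # keep their relative input order.
    return sorted(models, key=lambda model: model[1], reverse=True)

# Hypothetical model names with complexity strings from the short list.
candidates = [
    ("model_c", "LMSSS"),
    ("model_a", "SSSSS"),
    ("model_b", "LSSSS"),
]

for name, complexity in rank_by_complexity(candidates):
    print(name, complexity)
```

In this sketch the ranking would only be applied among models whose MAPE values are comparable, since the complexity string serves as a tie-breaker rather than as the primary criterion.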

antoinecarme commented 1 year ago

FIXED. Closing.