antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

Make the Modeling Scale when the Number of Exogenous Variables Increases #198

Closed antoinecarme closed 2 years ago

antoinecarme commented 2 years ago

PyAF builds ARX models (AR models that use lags of the signal together with lags of the exogenous variables).
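To make the shape of the problem concrete, here is a minimal sketch of how a lagged ARX input matrix could be assembled with pandas. It is not PyAF's internal code: the helper `build_lagged_frame` and the column names `Signal` and `Exog1` are hypothetical, and using only lags 1 to P is an assumption.

```python
import numpy as np
import pandas as pd


def build_lagged_frame(df, signal, exog_columns, n_lags):
    """Return lags 1..n_lags of the signal and of each exogenous column.

    Hypothetical helper for illustration only, not part of the PyAF API.
    """
    lagged = {}
    for col in [signal] + list(exog_columns):
        for lag in range(1, n_lags + 1):
            lagged[f"{col}_Lag{lag}"] = df[col].shift(lag)
    # The first n_lags rows have incomplete lags, so drop them.
    return pd.DataFrame(lagged, index=df.index).dropna()


df = pd.DataFrame({"Signal": np.arange(10.0), "Exog1": np.arange(10.0) * 2})
X = build_lagged_frame(df, "Signal", ["Exog1"], n_lags=3)
```

With one signal, E exogenous columns and P lags, the matrix has (1 + E) * P columns, which is why the width grows quickly when E reaches the thousands.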

The exogenous variables are standardized or dummified. Dummification generates one binary variable for each of the "5" most interesting categories of each exogenous variable, so the number of binary variables grows quickly with the number of exogenous variables.
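A rough sketch of this encoding step, under stated assumptions: numeric columns are standardized, other columns are dummified, and "most interesting" is taken to mean "most frequent" (the issue does not define it). The helper `encode_exogenous` and the `max_categories` parameter are hypothetical names, not PyAF options.

```python
import pandas as pd


def encode_exogenous(exog, max_categories=5):
    """Standardize numeric columns, dummify the others.

    Assumption: the "most interesting" categories are the most frequent ones.
    """
    encoded = {}
    for col in exog.columns:
        series = exog[col]
        if pd.api.types.is_numeric_dtype(series):
            std = series.std()
            # Guard against constant columns (zero or undefined deviation).
            encoded[col] = (series - series.mean()) / std if std > 0 else series * 0.0
        else:
            # One binary column per retained category.
            top = series.value_counts().index[:max_categories]
            for cat in top:
                encoded[f"{col}={cat}"] = (series == cat).astype(int)
    return pd.DataFrame(encoded, index=exog.index)


exog = pd.DataFrame({
    "num": [float(i) for i in range(8)],
    "cat": ["a", "a", "b", "b", "c", "d", "e", "f"],
})
encoded = encode_exogenous(exog)
```

Here the categorical column has six distinct values but only five binary columns are produced, so each categorical variable contributes at most `max_categories` columns.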

One interesting use case is to let the user pass a very large dataframe as exogenous data and let PyAF do the hard work of cleaning, filtering and finding value in the n most interesting variables. Pandas dataframes can hold thousands of columns.

PyAF has not yet been tested with thousands of exogenous variables. An automatic filtering procedure is to be introduced, based on a feature selection system (scikit-learn feature selection).

The number P of lags used in each ARX model can be controlled through the model options, and the exogenous variables can be filtered to keep only the K most interesting variables, with P lags each.
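One possible way to filter down to K variables with P lags each, sketched with scikit-learn. This is an illustration, not the procedure implemented in PyAF: the helper `select_top_k_variables` is hypothetical, and scoring each lagged column with `f_regression` and ranking a variable by its best lag is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_regression


def select_top_k_variables(exog, target, n_lags, k):
    """Keep the k exogenous variables whose lags best explain the target.

    Hypothetical helper: a variable's score is the largest univariate
    F-statistic among its lags 1..n_lags.
    """
    columns, owners = {}, {}
    for col in exog.columns:
        for lag in range(1, n_lags + 1):
            name = f"{col}_Lag{lag}"
            columns[name] = exog[col].shift(lag)
            owners[name] = col
    lagged = pd.DataFrame(columns, index=exog.index).dropna()
    y = target.loc[lagged.index]
    f_scores, _ = f_regression(lagged.to_numpy(), y.to_numpy())
    scores = pd.Series(np.nan_to_num(f_scores), index=lagged.columns)
    # Aggregate lag-level scores into one score per original variable.
    best_per_variable = scores.groupby(pd.Series(owners)).max()
    kept = best_per_variable.nlargest(k).index.tolist()
    # Retain every lag of each kept variable: k * n_lags columns.
    return kept, lagged[[c for c in lagged.columns if owners[c] in kept]]


rng = np.random.default_rng(0)
exog = pd.DataFrame(rng.normal(size=(200, 6)), columns=[f"x{i}" for i in range(6)])
# Synthetic target driven by the first lag of x2, plus a little noise.
target = 3.0 * exog["x2"].shift(1) + 0.1 * pd.Series(rng.normal(size=200))
kept, selected = select_top_k_variables(exog, target, n_lags=3, k=2)
```

The ARX model is then fitted on K * P exogenous columns instead of E * P, which is where the speed-up comes from when E is large. Univariate scoring is cheap but ignores interactions between variables, so it is only one of several reasonable choices.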

Backward compatibility: solving this issue will have no impact when the number of exogenous variables is limited (fewer than about 100).

A significant speed-up is expected when the number of variables is large.

Nice to have for the 2022-07-14 release.

antoinecarme commented 2 years ago

FIXED