Make the Modeling Scale when the Number of Exogenous Variables Increases

PyAF builds ARX models (AR models that use lags of the signal and lags of the exogenous variables).

The exogenous variables are either standardized or dummified. Dummification generates one binary variable for each of the 5 most interesting categories of each exogenous variable, so the number of model inputs grows quickly as exogenous variables are added.
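The encoding step described above can be sketched with pandas. This is only an illustration, not PyAF's actual code: `encode_exogenous` is a hypothetical helper, and it assumes that "most interesting" means "most frequent", which the issue does not specify.

```python
import pandas as pd

def encode_exogenous(df, max_categories=5):
    """Hypothetical helper (not part of PyAF): standardize numeric columns
    and dummify the most frequent categories of the other columns."""
    encoded = {}
    for name in df.columns:
        col = df[name]
        if pd.api.types.is_numeric_dtype(col):
            std = col.std()
            # Constant columns carry no information; map them to zeros.
            encoded[name] = (col - col.mean()) / std if std > 0 else col * 0.0
        else:
            # Assumption: "most interesting" is taken to mean "most frequent".
            top = col.value_counts().index[:max_categories]
            for cat in top:
                encoded[f"{name}={cat}"] = (col == cat).astype(int)
    return pd.DataFrame(encoded, index=df.index)

# One numeric column and one categorical column with 7 distinct categories.
raw = pd.DataFrame({
    "temperature": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0],
    "store": ["a", "a", "b", "c", "d", "e", "f", "g"],
})
encoded = encode_exogenous(raw)
print(encoded.shape)  # 1 standardized column + 5 binary columns
```

Each categorical variable contributes up to 5 columns here, which is why a dataframe with thousands of exogenous variables becomes expensive before any lag is even generated.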

One interesting use case is to let the user pass a very large dataframe as exogenous data and let PyAF do the hard work of cleaning, filtering and finding value in the n most interesting variables. Pandas dataframes can hold thousands of columns.

PyAF has not yet been tested with thousands of exogenous variables. An automatic filtering procedure is to be introduced, based on a feature selection system (scikit-learn feature selection).
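A minimal sketch of such a filter, assuming scikit-learn's `SelectKBest` with the `f_regression` score is an acceptable choice (the issue only says "scikit-learn + feature selection", so the scoring function is an assumption). `keep_k_best` is a hypothetical name, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

def keep_k_best(exog_df, target, k=10):
    """Hypothetical helper (not part of PyAF): keep the k columns of exog_df
    with the highest univariate F-score against the target signal."""
    k = min(k, exog_df.shape[1])
    selector = SelectKBest(score_func=f_regression, k=k)
    selector.fit(exog_df.values, target.values)
    # get_support() returns a boolean mask over the input columns.
    return exog_df.loc[:, selector.get_support()]

# Synthetic data: 50 candidate variables, only x7 and x21 drive the signal.
rng = np.random.default_rng(0)
candidates = pd.DataFrame(rng.normal(size=(200, 50)),
                          columns=[f"x{i}" for i in range(50)])
target = (3.0 * candidates["x7"] - 2.0 * candidates["x21"]
          + rng.normal(scale=0.1, size=200))

selected = keep_k_best(candidates, target, k=2)
print(sorted(selected.columns))
```

A univariate score is cheap to compute on thousands of columns, but it ignores interactions between variables and the time ordering of the data, so it is only one possible starting point.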

The number P of lags used in each ARX model can be controlled through the model options, and the exogenous variables can be filtered to keep only the K most interesting variables, with P lags each.
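To make the effect of K and P concrete, here is a sketch of how the ARX input matrix could be assembled with pandas. `build_arx_inputs` is a hypothetical helper, and starting the lags at 1 is an assumption; PyAF's internal construction may differ.

```python
import pandas as pd

def build_arx_inputs(signal, exog_df, p):
    """Hypothetical helper (not part of PyAF): build P lags of the signal
    and P lags of each kept exogenous variable."""
    columns = {}
    for lag in range(1, p + 1):
        columns[f"signal_lag{lag}"] = signal.shift(lag)
        for name in exog_df.columns:
            columns[f"{name}_lag{lag}"] = exog_df[name].shift(lag)
    # The first p rows have incomplete lags and are dropped.
    return pd.DataFrame(columns, index=signal.index).dropna()

signal = pd.Series([float(i) for i in range(20)])
exog_kept = pd.DataFrame({
    "x1": [float(i) for i in range(20)],
    "x2": [float(i) for i in range(20, 40)],
})

inputs = build_arx_inputs(signal, exog_kept, p=3)
print(inputs.shape)  # rows: 20 - P, columns: P * (1 + K)
```

With K kept variables and P lags, the model sees P * (1 + K) inputs, so bounding K keeps the model size under control regardless of how many columns the user supplies.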

Backward compatibility: solving this issue will have no impact when the number of exogenous variables is limited (fewer than about 100).

A significant speed-up is expected when the number of variables is large.

Nice to have for 2022-07-14 release.

antoinecarme / pyaf, issue #198