antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

PyAF 5.0 Final Touch 2: Disable alpha in ridge regressions #231

Closed: antoinecarme closed 1 year ago

antoinecarme commented 1 year ago

To improve explainability, it is better to perform a true (unpenalized) linear regression when estimating linear trends, polynomial trends, and AR models.

PyAF uses the sklearn.linear_model.Ridge model, whose ridge parameter defaults to alpha = 1.0. This penalty produces a non-zero mean residual error. Force alpha to zero instead. This also improves the detected cycles, which are estimated from the trend residuals.

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html
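To illustrate the effect of the penalty, here is a minimal sketch on a synthetic, exactly linear signal. The data and the helper `fit_slope_and_max_residual` are illustrative assumptions and are not taken from PyAF's code. It compares scikit-learn's default `Ridge(alpha=1.0)` with `Ridge(alpha=0.0)` and with `LinearRegression`, which the scikit-learn documentation recommends for the unpenalized case.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic, exactly linear "trend": y = 3 + 2 * t (illustrative data, not from PyAF).
t = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 + 2.0 * t.ravel()


def fit_slope_and_max_residual(model):
    """Fit the model on (t, y) and return (slope, largest absolute residual)."""
    model.fit(t, y)
    residuals = y - model.predict(t)
    return float(model.coef_[0]), float(np.max(np.abs(residuals)))


# scikit-learn's default penalty (alpha=1.0) shrinks the slope toward zero,
# so even an exactly linear signal is not reproduced exactly.
ridge_slope, ridge_err = fit_slope_and_max_residual(Ridge(alpha=1.0))

# alpha=0.0 removes the penalty: the fit reduces to ordinary least squares.
zero_slope, zero_err = fit_slope_and_max_residual(Ridge(alpha=0.0))

# LinearRegression solves the same unpenalized problem directly.
lr_slope, lr_err = fit_slope_and_max_residual(LinearRegression())

print(f"Ridge(alpha=1.0):   slope={ridge_slope:.4f}, max |residual|={ridge_err:.4f}")
print(f"Ridge(alpha=0.0):   slope={zero_slope:.4f}, max |residual|={zero_err:.2e}")
print(f"LinearRegression(): slope={lr_slope:.4f}, max |residual|={lr_err:.2e}")
```

On this toy signal the penalized fit underestimates the true slope of 2, while both unpenalized fits recover it up to floating-point error. The residuals of the penalized fit keep a leftover linear component, which is the kind of artifact that can leak into anything estimated from trend residuals.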

image

The impact on the model equation is not always noticeable. The improvement is clearest for exact models, i.e. when the signal is an almost perfectly linear trend.

antoinecarme commented 1 year ago

Polynomial trend, before

image

antoinecarme commented 1 year ago

Polynomial trend, after

image

antoinecarme commented 1 year ago

alpha = 1.0 is scikit-learn's default setting and was never an intended choice in the PyAF specs. This default is fine for generic classification/regression models.

The same kind of check needs to be performed for XGBoost, LightGBM, PyTorch, and other third-party modeling libraries used in PyAF models: their default parameter choices need to be double-checked.
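One possible way to run such a check is to compare an estimator's constructor defaults against an explicit table of intended values. The sketch below is only an assumption about how this could be done: the `INTENDED` table and the `audit_defaults` helper are hypothetical and do not exist in PyAF. It relies on the `get_params()` method that scikit-learn estimators provide.

```python
from sklearn.linear_model import Ridge

# Hypothetical audit table: the parameter values we *intend* to use.
# These expectations are illustrative, not PyAF's actual specs.
INTENDED = {
    Ridge: {"alpha": 0.0, "fit_intercept": True},
}


def audit_defaults(estimator_cls, intended):
    """Return {param: (library_default, intended_value)} for each mismatch."""
    defaults = estimator_cls().get_params()
    return {
        name: (defaults.get(name), wanted)
        for name, wanted in intended.items()
        if defaults.get(name) != wanted
    }


# Collect the mismatches per estimator class name.
mismatches = {
    cls.__name__: audit_defaults(cls, wanted) for cls, wanted in INTENDED.items()
}
print(mismatches)
```

Any entry reported here is a parameter that has to be set explicitly rather than inherited from the library. The scikit-learn style wrappers of XGBoost and LightGBM should be usable in the same table, assuming they expose `get_params()` in the same way. PyTorch modules have no such method, so they would need a separate check.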