aeon-toolkit / aeon

A toolkit for machine learning from time series
https://aeon-toolkit.org/
BSD 3-Clause "New" or "Revised" License

[ENH] Parameter optimization and TS validation #984

Open msh855 opened 8 months ago

msh855 commented 8 months ago

idea

This is a superb library and I am using it more and more in my work. I would like, however, to suggest adding some functionality for hyperparameter tuning. I think it is extremely important and would significantly boost the potential of this library.

In my case, I am using Dynamic Factor Models extensively. One problem here is choosing the optimal number of factors. This is also relevant for HMM models.

Second is clustering. Here there are many parameters to tune, and there is no obvious way to tune each of them for each clustering algorithm.

A third suggestion is on model selection, especially with time series models like Dynamic Factor models. Here there are many different specifications (e.g. number of lags, modelling of the error term, etc.) that one could use.

proposed solution

In the case of Dynamic Factor models, one common practice is to use the process described here. See, for example, the section "Determining the number of factors and shocks to the factors".
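As an illustration (not aeon code, and the linked procedure may differ in detail), a rough sketch of one common variant of that practice, selecting the number of factors by an information criterion with statsmodels' DynamicFactor on made-up data, could look like:

```python
# Hedged sketch: pick the number of factors by minimising BIC.
# Random data and the range of k are purely illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
endog = rng.normal(size=(200, 5))  # 200 observations of 5 series

bics = {}
for k in range(1, 4):
    model = sm.tsa.DynamicFactor(endog, k_factors=k, factor_order=1)
    res = model.fit(disp=False)
    bics[k] = res.bic

best_k = min(bics, key=bics.get)
print(bics, "-> chosen number of factors:", best_k)
```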

On the TS clustering enhancement, it would be good to have some standard approaches, like the elbow method, for tuning the models.
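For instance, a rough elbow-method sketch with aeon's TimeSeriesKMeans (random data, illustrative range of k) might look like:

```python
# Hedged sketch of the elbow method: fit for a range of k and inspect
# the within-cluster inertia; pick the k where the curve flattens.
import numpy as np
from aeon.clustering import TimeSeriesKMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1, 80))  # 50 univariate series of length 80

for k in range(2, 9):
    km = TimeSeriesKMeans(n_clusters=k, distance="euclidean", random_state=0)
    km.fit(X)
    # inertia_ is the sum of distances of cases to their cluster centres
    print(k, km.inertia_)
```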

On the third suggestion, one option is TS cross validation to determine the optimal model.

baraline commented 8 months ago

Hi @msh855, thanks for using aeon!

Concerning TS validation: since we interface easily with scikit-learn, you can directly use their TimeSeriesSplit in your pipeline with aeon estimators.
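For example, a minimal sketch (random data, illustrative estimator and settings) could be:

```python
# Hedged sketch: scikit-learn's TimeSeriesSplit driving cross-validation
# of an aeon classifier. Data and labels are random, purely illustrative.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1, 100))  # 60 cases, 1 channel, 100 time points
y = rng.integers(0, 2, size=60)    # binary labels

# TimeSeriesSplit keeps the temporal order of the cases: each fold
# trains on earlier cases and tests on later ones.
cv = TimeSeriesSplit(n_splits=4)
clf = KNeighborsTimeSeriesClassifier(distance="dtw")
print(cross_val_score(clf, X, y, cv=cv))
```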

For parameter optimization, we don't yet have a tool to optimize the hyperparameters of our estimators. I would advise you to look at tools such as Optuna for this purpose for now.
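As a rough illustration of that route (not an aeon feature; the clusterer, score, and search range are illustrative choices), an Optuna objective tuning the number of clusters of a TimeSeriesKMeans by silhouette score might look like:

```python
# Hedged sketch: Optuna searching over n_clusters, scored by silhouette.
import numpy as np
import optuna
from sklearn.metrics import silhouette_score
from aeon.clustering import TimeSeriesKMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1, 80))  # 50 univariate series of length 80

def objective(trial):
    n_clusters = trial.suggest_int("n_clusters", 2, 8)
    km = TimeSeriesKMeans(n_clusters=n_clusters, distance="euclidean",
                          random_state=0)
    labels = km.fit_predict(X)
    # silhouette_score expects 2D input, so flatten each series
    return silhouette_score(X.reshape(len(X), -1), labels)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```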

For model selection, you can take inspiration from the forecasting benchmark and adapt it to your use case, for example by benchmarking different configurations of a Dynamic Factor model, and/or performing a sensitivity analysis. You may want to change the validation scheme for your use case (I think it uses resamples, as the classification benchmark does).

TonyBagnall commented 8 months ago

Thanks very much for the suggestions @msh855. For clustering, I'm sure we can show how to use the elbow method; maybe we can improve the notebooks, @chrisholder. You should also be able to use the scikit-learn parameter tuning/model selection tools, e.g. https://scikit-learn.org/stable/modules/grid_search.html
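e.g. a minimal GridSearchCV sketch with an aeon classifier (random data, illustrative parameter grid):

```python
# Hedged sketch: scikit-learn's GridSearchCV tuning an aeon classifier.
import numpy as np
from sklearn.model_selection import GridSearchCV
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1, 50))  # 40 univariate series of length 50
y = rng.integers(0, 2, size=40)

param_grid = {"n_neighbors": [1, 3, 5], "distance": ["euclidean", "dtw"]}
search = GridSearchCV(KNeighborsTimeSeriesClassifier(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```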

I don't know about dynamic factor models; I can go take a look, thanks.