bgrimstad / splinter

Library for multivariate function approximation with splines (B-spline, P-spline, and more) with interfaces to C++, C, Python and MATLAB
Mozilla Public License 2.0

P-Spline: determine the "best" smoothing parameter #125

Closed: romainbqt closed this issue 4 years ago

romainbqt commented 4 years ago

Dear all,

First I would like to thank you for this amazing library.

I would like to know whether you have implemented a method to determine the best smoothing parameter for P-spline regression, so as not to overfit statistical fluctuations.

For example, in the following plot I set the smoothing parameter lambda = 1500, which yields a smooth P-spline curve (blue) that describes the general trend quite well, whereas a smaller value of lambda would overfit the fluctuations (red).

I had to search for this value of lambda manually. It would be great if such a method were implemented, or if you have an idea of how to do it.

Many thanks in advance,

[Image: raw_data_vs_pspline]

bgrimstad commented 4 years ago

Hi @romainbqt,

Glad you find Splinter useful.

Lambda is a hyper-parameter: a parameter that we need to set before we fit the spline to data. What you are looking for is called hyper-parameter optimization. There are several libraries that you can use with Splinter to perform hyper-parameter optimization. For example, if you are using Python, the Hyperopt package may be helpful (https://github.com/hyperopt/hyperopt).
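As a rough illustration of what such a search can look like, here is a minimal sketch that picks lambda by hold-out validation over a log-spaced grid. It does not call Splinter: `smooth` is a hypothetical stand-in (a Whittaker-type penalized smoother written in NumPy), and the synthetic data, the grid, and the validation mask are all assumptions made for the example. In a real setup the stand-in would be replaced by the actual P-spline fit, and the grid loop could be replaced by an optimizer such as Hyperopt.

```python
import numpy as np

def smooth(y, lam, weights=None):
    """Stand-in penalized smoother (assumption, not Splinter):
    minimize sum_i w_i (y_i - z_i)^2 + lam * ||D z||^2, D = 2nd-order differences.
    Requires lam > 0 when some weights are zero."""
    n = len(y)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    D = np.diff(np.eye(n), n=2, axis=0)      # (n-2) x n difference matrix
    A = np.diag(w) + lam * D.T @ D
    return np.linalg.solve(A, w * y)

def holdout_score(y, lam, val_mask):
    """Fit with the validation points given zero weight, then score on them."""
    z = smooth(y, lam, weights=(~val_mask).astype(float))
    return float(np.mean((y[val_mask] - z[val_mask]) ** 2))

# Synthetic noisy data (assumption: stands in for the real measurements).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(4 * np.pi * x) + 0.3 * rng.standard_normal(x.size)

# Hold out every 4th point for validation.
val_mask = np.zeros(x.size, dtype=bool)
val_mask[2::4] = True

# Candidate smoothing parameters on a log grid (all strictly positive).
lambdas = np.logspace(-2, 6, 17)
scores = [holdout_score(y, lam, val_mask) for lam in lambdas]
best_lam = float(lambdas[int(np.argmin(scores))])
print("selected lambda:", best_lam)
```

The key design point is that lambda is chosen on data the fit did not see, so a value that merely chases the fluctuations gets penalized.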

Bjarne

romainbqt commented 4 years ago

Dear @bgrimstad,

Thank you for your answer.
Since yesterday I have been searching for a way to optimize the smoothing parameter. I came across papers that discuss minimizing a given estimator to find the "best" smoothing parameter. Most of them rely on the "hat matrix", as explained in section 2.3 of the article below:
https://www.stat.ubc.ca/~matias/RobustPsplines20Oct08Names.pdf

To my understanding (I am not at all an expert on regression algorithms), it seems that this minimization of the estimator to find the smoothing parameter can only be done inside the Splinter code.
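To make the hat-matrix idea concrete, below is a minimal NumPy sketch of generalized cross-validation (GCV), one common hat-matrix criterion (not necessarily the one used in the linked article). It is independent of Splinter: the basis construction `bspline_basis`, the function `gcv`, the synthetic data, and the lambda grid are all assumptions made for illustration. With basis matrix B and difference penalty D, the hat matrix is H = B (BᵀB + λ DᵀD)⁻¹ Bᵀ, and GCV(λ) = n · RSS / (n − tr H)².

```python
import numpy as np

def bspline_basis(x, n_segments=40, degree=3):
    """Uniform B-spline basis, shape (len(x), n_segments + degree), Cox-de Boor recursion."""
    xl, xr = x.min(), x.max()
    dx = (xr - xl) / n_segments
    knots = xl + dx * np.arange(-degree, n_segments + degree + 1)
    # Degree 0: one indicator per knot interval; clip so x == xr lands in the last segment.
    seg = np.clip(np.floor((x - xl) / dx).astype(int), 0, n_segments - 1) + degree
    B = np.zeros((len(x), len(knots) - 1))
    B[np.arange(len(x)), seg] = 1.0
    for d in range(1, degree + 1):
        left = (x[:, None] - knots[:-d - 1]) / (knots[d:-1] - knots[:-d - 1])
        right = (knots[d + 1:] - x[:, None]) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def gcv(B, y, lam, penalty_order=2):
    """Return (GCV score, effective degrees of freedom) for a P-spline fit."""
    n, m = B.shape
    D = np.diff(np.eye(m), n=penalty_order, axis=0)   # penalty on coefficient differences
    H = B @ np.linalg.solve(B.T @ B + lam * D.T @ D, B.T)  # hat matrix: y_hat = H y
    resid = y - H @ y
    edf = float(np.trace(H))
    return n * float(resid @ resid) / (n - edf) ** 2, edf

# Synthetic noisy data (assumption: stands in for the real measurements).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(4 * np.pi * x) + 0.3 * rng.standard_normal(x.size)

B = bspline_basis(x)
lambdas = np.logspace(-4, 4, 33)
results = [gcv(B, y, lam) for lam in lambdas]
scores = [r[0] for r in results]
best = int(np.argmin(scores))
best_lam, best_edf = float(lambdas[best]), results[best][1]
print("lambda minimizing GCV:", best_lam, "effective dof:", best_edf)
```

The sketch forms H explicitly, which is fine for a few hundred points; for large data sets the trace is usually computed without building the full n × n matrix.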

I will keep searching.

Best regards,
Romain