bgrimstad / splinter

Library for multivariate function approximation with splines (B-spline, P-spline, and more) with interfaces to C++, C, Python and MATLAB
Mozilla Public License 2.0

Is the penalized B-spline aware of error bars in the data? #78

Closed: exook closed this issue 7 years ago

exook commented 7 years ago

Hi,

I am currently using this library to fit some scientific data with a P-spline. I am having problems with "zero values": the spline is too flexible, so it fits the zero values in my data and this breaks the spline. Is there any way to make the spline aware of the error bars of the data, or does the spline already take both the data points and their errors into account?

Best regards, Alex

bgrimstad commented 7 years ago

Hi Alex,

The P-spline is fitted to the data points only; I am not sure what you mean by "error bars". Could you please elaborate?

Without more information, I would guess that what you are experiencing is a case of overfitting. Maybe you could try increasing the alpha value to get more regularization?
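For reference, the standard P-spline formulation (Eilers and Marx) shows where alpha enters. This assumes Splinter's P-spline uses a second-order difference penalty on the control points. The symbols B, D_2 and c are introduced here only for illustration.

```latex
\hat{c} = \arg\min_{c}\; \lVert y - B c \rVert_2^2 + \alpha \lVert D_2 c \rVert_2^2
\;\;\Longrightarrow\;\;
\left(B^\top B + \alpha D_2^\top D_2\right)\hat{c} = B^\top y,
\qquad B_{ij} = B_j(x_i),\quad (D_2 c)_k = c_k - 2c_{k+1} + c_{k+2}
```

Increasing alpha weights the roughness term more heavily. This pulls neighboring control points towards a straight line and damps overfitting. Setting alpha = 0 gives an ordinary least-squares B-spline fit.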

Kind regards, Bjarne

exook commented 7 years ago

Hi, thank you for your answer. After further investigation, I see that the P-spline does not break at zero values, so there is nothing wrong with the current implementation of the spline.

By error bars I mean the uncertainty of a measured data point. For my research, I had to be certain that no existing feature incorporated uncertainties into the P-spline.

Do you think it is possible to implement a P-spline that takes measurement uncertainties into account? For example, suppose the P-spline is asked to fit the point (x, y) = (4, 6) within a set of data points, and this point has a y-uncertainty of +/-0.5. Then the segment of the P-spline around that point would not be penalized further as long as the spline stays between (4, 5.5) and (4, 6.5).
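The behavior proposed above can be written as a residual term that is zero inside the error bar and grows quadratically outside it. This is often called a dead-zone or epsilon-insensitive loss. Below is a minimal Python sketch. `dead_zone_residual` is a hypothetical helper, not part of Splinter.

```python
def dead_zone_residual(y_spline: float, y_data: float, sigma: float) -> float:
    """Squared residual that is zero while the spline stays inside the error bar.

    Hypothetical helper illustrating the proposal; not a Splinter function.
    """
    excess = abs(y_spline - y_data) - sigma   # distance beyond the error bar
    return max(0.0, excess) ** 2              # no penalty inside the bar


# The point (4, 6) with a y-uncertainty of +/-0.5:
print(dead_zone_residual(5.8, 6.0, 0.5))  # inside the bar -> 0.0
print(dead_zone_residual(7.0, 6.0, 0.5))  # 0.5 beyond the bar -> 0.25
```

Weighted least squares behaves differently. It still penalizes every deviation from the data point, but scales the penalty down for points with larger uncertainty.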

Thank you for this great library, Alex

P.S. Is there any way to increase the number of knots in a P-spline, and thereby increase the resolution of the spline? I notice that for low alpha the spline is very jagged, even where it is not close to a data point in the y-direction.

bgrimstad commented 7 years ago

Hi, thanks for clearing that up. What you are looking for is called weighted least squares regression. This will allow you to specify the weight to put on each sample point. These weights are often specified as the reciprocal of the variance (higher variance/uncertainty gives lower weight). I have made an issue (#80) to include this in the next release. If you subscribe to it you will be notified when it has been completed.
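Below is a minimal NumPy sketch of a weighted P-spline fit with weights w_i = 1/sigma_i^2. For the point (4, 6) with +/-0.5, this gives w = 1/0.5^2 = 4. This is not Splinter's implementation. To keep it short, it uses linear (degree-1) B-splines on equidistant knots with a second-order difference penalty. The names `hat_basis` and `weighted_pspline` are hypothetical.

```python
import numpy as np


def hat_basis(x, knots):
    """Linear (degree-1) B-spline basis: column j is the hat function centered on knots[j]."""
    eye = np.eye(len(knots))
    return np.column_stack([np.interp(x, knots, eye[j]) for j in range(len(knots))])


def weighted_pspline(x, y, sigma, n_basis=8, alpha=1.0):
    """Minimize sum_i w_i (y_i - f(x_i))^2 + alpha * ||D c||^2 with w_i = 1 / sigma_i^2."""
    knots = np.linspace(x.min(), x.max(), n_basis)   # equidistant knots
    B = hat_basis(x, knots)
    D = np.diff(np.eye(n_basis), n=2, axis=0)        # second-order differences of c
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2    # reciprocal variance
    BtW = B.T * w                                    # equivalent to B.T @ np.diag(w)
    coeffs = np.linalg.solve(BtW @ B + alpha * D.T @ D, BtW @ y)
    return lambda xq: hat_basis(np.atleast_1d(xq), knots) @ coeffs


x = np.linspace(0.0, 10.0, 21)
y = 0.5 * x                 # straight-line trend...
y[10] = 0.0                 # ...with a suspicious zero value at x = 5
sigma = np.full_like(x, 0.1)

equal = weighted_pspline(x, y, sigma)
sigma_bad = sigma.copy()
sigma_bad[10] = 10.0        # large uncertainty on the zero value
weighted = weighted_pspline(x, y, sigma_bad)

print(equal(5.0)[0], weighted(5.0)[0])  # the weighted fit stays close to 2.5
```

Giving the zero value a large sigma makes the fit at x = 5 follow the straight-line trend instead of being pulled towards zero.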

If you use the EQUIDISTANT knot vector you can specify the number of knots by setting numBasisFunctions. Let us know if this works for you.
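The link between basis functions and knots is a general B-spline identity: a spline of degree p with n basis functions needs n + p + 1 knots. The sketch below builds a clamped knot vector with equidistant interior knots. It is an assumption that Splinter's EQUIDISTANT option lays out the end knots this way. `clamped_equidistant_knots` is a hypothetical helper.

```python
import numpy as np


def clamped_equidistant_knots(a, b, num_basis_functions, degree=3):
    """Clamped knot vector on [a, b] with equidistant distinct knots.

    A B-spline of this degree with n basis functions needs n + degree + 1 knots.
    The end knots a and b each appear degree + 1 times so the spline is clamped.
    """
    if num_basis_functions < degree + 1:
        raise ValueError("need at least degree + 1 basis functions")
    distinct = np.linspace(a, b, num_basis_functions - degree + 1)
    return np.concatenate([np.full(degree, a), distinct, np.full(degree, b)])


knots = clamped_equidistant_knots(0.0, 10.0, num_basis_functions=8, degree=3)
print(len(knots))  # 8 + 3 + 1 = 12
print(knots)
```

Each extra basis function adds one interior knot, so resolution grows with n. So does the spline's freedom to follow noise, which is why a larger n usually needs a larger alpha.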

Bjarne

bgrimstad commented 7 years ago

Weighted least squares is now available in the multidim-control-points branch. I have added a Python example which displays the use case.

exook commented 7 years ago

Wow, that was quick! Thank you so much, this will be a great addition to my field of research!

exook commented 7 years ago

Regarding the resolution of the spline: when I use "knot_spacing=KnotSpacing.AS_SAMPLED, num_basis_functions=int(1e6)", it works as it should.

[Image: as_sample_e6]

But when I use "knot_spacing=KnotSpacing.EQUIDISTANT, num_basis_functions=int(1e6)", the spline goes out of control.

[Image: equidistant_e6]

Using num_basis_functions=int(1e100) or other values doesn't change this.

bgrimstad commented 7 years ago

Hi,

The P-spline seems to be behaving as expected when I try to replicate your case.

A few hints that may help you when setting the knot vectors:

In short: num_basis_functions only takes effect with EQUIDISTANT_KNOTS; num_basis_functions should be reduced; and the regularization parameter alpha should be adjusted (preferably using K-fold cross-validation).
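Below is a minimal sketch of choosing alpha by K-fold cross-validation. For each candidate alpha, the spline is fitted on K-1 folds and the squared prediction error is measured on the held-out fold. The smoother here is a plain NumPy P-spline with linear B-splines, not Splinter's API. `hat_basis`, `fit_pspline` and `kfold_cv_error` are hypothetical names. With Splinter, the fitting step would be replaced by building a Splinter P-spline with the candidate alpha.

```python
import numpy as np


def hat_basis(x, knots):
    """Linear (degree-1) B-spline basis; column j is the hat function centered on knots[j]."""
    eye = np.eye(len(knots))
    return np.column_stack([np.interp(x, knots, eye[j]) for j in range(len(knots))])


def fit_pspline(x, y, knots, alpha):
    """P-spline coefficients: least squares plus alpha * (second differences of c)^2."""
    B = hat_basis(x, knots)
    D = np.diff(np.eye(len(knots)), n=2, axis=0)
    return np.linalg.solve(B.T @ B + alpha * D.T @ D, B.T @ y)


def kfold_cv_error(x, y, knots, alpha, k=5):
    """Mean squared prediction error over k interleaved held-out folds."""
    fold = np.arange(len(x)) % k
    errors = []
    for f in range(k):
        test = fold == f
        c = fit_pspline(x[~test], y[~test], knots, alpha)   # train on the other folds
        pred = hat_basis(x[test], knots) @ c                # predict the held-out fold
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))


rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)
knots = np.linspace(x.min(), x.max(), 40)

alphas = [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]
scores = {a: kfold_cv_error(x, y, knots, a) for a in alphas}
best_alpha = min(scores, key=scores.get)
print(best_alpha, scores[best_alpha])
```

Interleaved folds keep every held-out point inside the range of the training data. This avoids scoring the spline on extrapolation.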

It is on our TODO list to improve the documentation and rework the BSpline::Builder to make the construction of knot vectors clearer to the user.

Hope this helps.

exook commented 7 years ago

Hi again,

Thank you so much for all the help you have given me. I am currently wrapping up my Bachelor's thesis in physics at Lund University, Sweden (due 7th of May), in which I use your Splinter library. If I have understood correctly, the mathematics of splines can be implemented using different methods. Since I have mostly used Wikipedia and various lecture slides to understand splines, I wonder if you would have time to read my section on splines and P-splines (1.5 pages) to check that I am not completely wrong.

I have emailed you about this at an "itk.ntnu.no" email address, but it may well have ended up in a spam folder, or that address may be inactive. (If you have time, you can reach me at pekman@uci.edu.)

Best regards, Alex

bgrimstad commented 7 years ago

I have sent you my comments per e-mail.

Best of luck on your thesis!