bgrimstad / splinter

Library for multivariate function approximation with splines (B-spline, P-spline, and more) with interfaces to C++, C, Python and MATLAB
Mozilla Public License 2.0
418 stars 115 forks source link

B-splines and incomplete grid #72

Open sdrdis opened 8 years ago

sdrdis commented 8 years ago

Hi,

Any indication on when B-spline will be able to work with incomplete grids ? Does an implementation / hack exists for 2D grids ?

Thanks,

bgrimstad commented 8 years ago

Hello,

Thank you for showing your interest in SPLINTER.

Actually, we do have two different approaches that work with scattered data (incomplete grids). Both implementations are experimental.

The first approach requires an understanding of how higher-dimensional splines are built from the knot vectors. Put simply: In 1-D, a B-spline is built from one knot vector which defines a grid on the real line. In 2-D, there are two knot vectors which forms a rectangular grid. Generally, in N-D, the knot vectors form a hyper-rectangular grid, so to speak. These grids are regular by construction and the number of grid points grows exponentially as the number of dimensions increase. This limits the conventional (tensor product) B-spline to 5-6 dimensions for all practical purposes.

Now, after the grid and B-spline degree has been decided upon, the remaining step in building a B-spline is to find coefficients so that we get a good match with the data. This involves solving a linear system of equations with the coefficients being the unknowns. The knot vectors decide the number of B-spline coefficients. They also decide the "area of effect" of each coefficient (read about the local support property of splines). Usually, it is a good idea to have at least as many data points as coefficients and to use regularization to prevent overfitting.

If you have scattered points, how would you select the knot vectors? There are some requirements or guidelines:

  1. all data points should lie within the grid (otherwise they will not be fitted to as the B-spline is zero outside of the grid)
  2. the number of grid points should not be much larger than the number of data points
  3. if there is a cluster of data points you may want a grid with high resolution in that area to capture the details in the data

If you put some effort into thinking about this problem, you will quickly find that selecting the grid, when the data points are scattered, is a very difficult problem. For example, the trade-off between 2. and 3. is not trivial.

If you construct a B-spline with the setting BSpline::KnotSpacing::EXPERIMENTAL, the builder will attempt to select good knot vectors from the data points (which may be scattered). The approach combines some logic with a sliding window approach to adapt the resolution of the grid to the data. We cannot guarantee that this works well for all data set (there are some cases in which the current implementation will give poor results). If you want to try this, I strongly suggest that you combine this setting with regularization and that you inspect the generated knot vectors.

The second approach is to use Gradient Boosting with splines. We have a light-weight implementation of this in Python. This algorithm will create a generalized additive model of splines, which is likely to give you better performance than the first approach I sketched above.

It depends on which problem you want to solve and your requirements, but I would generally recommend that you use the second approach (gradient boosting).

Let us know which path you choose to travel down.

Best regards, Bjarne

sdrdis commented 8 years ago

Thank you very much for your complete answer, and sorry for my late answer. I will download the Development branch and try the second approach, and let you know if it works.

bgrimstad commented 8 years ago

Please do. You are the first tester outside of the development team so we would appreciate any feedback.

sjulier commented 6 years ago

We are interested in interpolating with scattered data. Has there been any development / feedback since these posts? I see the documentation still lists using scattered data as experimental and not advised.

bgrimstad commented 6 years ago

Hi @sjulier,

The development on interpolation with scattered data has not been prioritized since I wrote the comment above. As of now, there is a Python example in the bspline-interface branch showing Option 1. I have attached a figure below, showing the result of running the script. We will hopefully be able to prepare the release of v. 4, which enables this feature, in the near future.

May I ask for what purpose you would want to interpolate a B-spline to scattered data? I would disadvise using this method for predictive modeling as it likely will overfit the data. Regularization will help, but there are other methods that are better suited for this use case.

bspline_scattered_data_interpolation