ecurtiss commented 2 months ago

Using this issue to document my findings on using Chebyshev interpolation to speed up arc length reparametrizations. My target audience is my future self.

For previous work on arc length parametrizations, see https://github.com/ecurtiss/CatRom/issues/3.

ecurtiss commented 2 months ago

CatRom needs to support one-off reparamatrizations as well as bulk reparametrizations. Newton's method still performs best for one-off reparametrizations, but I have tried a variety of methods for bulk reparametrizations.

1. Newton's method

This is the existing reparametrization method. It uses a hybrid of Newton's method and bisection method. This method has no precompute, so it initially outperforms the rest of the methods, but it eventually gets surpassed given enough calls.

2. Chebyshev polynomial (Newton's method)

This uses a Chebyshev polynomial to interpolate the arc length function. The cheb is inverted by sampling points on its inverse via Newton-bisection and constructing a second cheb from the samples. For Newton, I am using the derivative of the original function, not the cheb. Finally, once the inverted cheb has been computed, reparametrizing amounts to evaluating the polynomial.

3. Chebyshev polynomial (regula falsi)

Same as option 2 but uses regula falsi to invert the cheb.

4. Chebyshev polynomial (ITP)

Same as option 2 but uses the ITP method to invert the cheb.

5. Uniform lookup table (Newton's method)

This is the existing PrecomputeArcLengthParams() method. It uses option 1 to precompute a lookup table that maps arc lengths uniformly spaced in [0, 1] to their times. Once the lookup table has been computed, the values are linearly interpolated.

6. Uniform lookup table (Chebyshev polynomial)

Precomputes a lookup table with uniformly-spaced points like option 5, but we interpolate the arc length function with a cheb and then sample points on its via regula falsi.

7. Non-uniform lookup table computed via Chebyshev polynomial

Precomputes a lookup table with non-uniformly spaced points. In particular, we use option 3 and then recycle the values of the inverted cheb on the Chebyshev grid as the lookup table. Since the Chebyshev grid clusters points near 0 and 1, the resulting lookup table has greater accuracy near the boundary, which is likely where a spline will have its greatest curvature (provided default constructor arguments are used).

ecurtiss commented 2 months ago

Benchmark 1: Reparamatrization

This benchmark measures the time it takes to reparametrize an arc length after all precomputes have been performed. I created a CatRom with 50 splines and reparametrized 50 arc lengths per spline.

Here, the Chebyshev polynomials interpolated 8 grid points, and the lookup tables had 8 samples. We see that the lookup tables are about an order of magnitude faster than evaluating a cheb, which is an order of magnitude faster than Newton-bisection.

Here, I increased the cheb grid points to 16 and the LUT samples to 16. As expected, only the cheb evaluation got slower, but it is still very fast (< 1us).

ecurtiss commented 2 months ago

Benchmark 2: Precomputes

This benchmark measures how long the precomputes take for options 2-7. I ran the precomputes on 20 CatRoms, each having 20 splines.

Here, the chebs had 8 grid points and the LUTs had 8 samples.

Here, the chebs had 16 grid points and the LUTs had 16 samples. I had to remove 4. Cheb (ITP) for this test because it would have taken too much time.

3, 6, and 7 all sample a cheb's inverse via regula falsi, so this is clearly the fastest precompute. The choice between them will come down to accuracy.

ecurtiss commented 2 months ago

Benchmark 3: Precompute + Reparametrizations

This benchmark measures the full performance of precomputing (if necessary) and then bulk reparametrizing. I created one CatRom with 50 splines, precomputed, and then did some number of reparametrizations per spline.

5 reparametrizations per spline, 8 grid points per cheb, 8 samples per LUT

10 reparametrizations per spline, 8 grid points per cheb, 8 samples per LUT

20 reparametrizations per spline, 8 grid points per cheb, 8 samples per LUT

5 reparametrizations per spline, 16 grid points per cheb, 16 samples per LUT

10 reparametrizations per spline, 16 grid points per cheb, 16 samples per LUT

20 reparametrizations per spline, 16 grid points per cheb, 16 samples per LUT

With a little more testing, the tipping point where it becomes worth your time to precompute a lookup table is 8 samples per spline given 8 grid points/samples and 15 samples per spline given 16 grid points/samples.

ecurtiss commented 2 months ago

Accuracy

It remains to measure each method's accuracy. To do this, I created a CatRom with 20 splines and performed 100 reparametrizations per spline. Then, using Newton-bisection as my source of truth, I charted the absolute error of some of the well-performing methods. Additionally, to extract a single error value, I also printed the sum of the absolute errors for each (L1 norm).

8 grid points per cheb, 8 samples per LUT

3. Cheb (regula falsi) 2.6796691331763727
5. LUT (Newton) 7.931484687275745
6. LUT (Cheb uniform) 7.973120910351782
7. LUT (Cheb non-uniform) 4.6878766571400385

16 grid points per cheb, 16 samples per LUT

3. Cheb (regula falsi) 0.2661398687654118
5. LUT (Newton) 2.377147364435644
6. LUT (Cheb uniform) 2.421018434349442
8. 7. LUT (Cheb non-uniform) 1.2552987303827043

We see that evaluating a cheb is most accurate, followed by a lookup table on the cheb grid, followed by uniformly-spaced lookup tables. Interestingly, both lookup tables with uniformly-spaced samples have the exact same accuracy.

ecurtiss commented 2 months ago

Conclusion

We learned from the benchmarks that

Doing linear interpolation on a lookup table is an order of magnitude faster than evaluating a cheb.
Inverting a cheb via regula falsi is faster than Newton-bisection and ITP.
Computing a lookup table is faster using a cheb than Newton-bisection.
You need to do about one reparametrization per spline per cheb grid point/LUT sample to make a precompute worthwhile.

As a result, we can axe options 2, 4, and 5, leaving 3, 6, and 7.

In terms of speed: 6 > 7 >> 3 In terms of accuracy: 3 >> 7 > 6.

However, I strongly believe that the reparametrization time on 7 can be made comparable to 6 by replacing a linear search with a binary search. Therefore, I am considering 6 = 7 in terms of speed, letting us rule out option 6. Finally, we have our conclusion:

For one-off reparametrizations, use Newton
For bulk reparametrizations favoring speed, build a lookup table from a cheb via regula falsi
For bulk reparametrizations favoring accuracy, evaluate a cheb that was inverted via regula falsi
You should precompute if you are doing more reparametrizations per spline than the degree of the cheb

catrom-reparametrization-benchmarks.zip

ecurtiss commented 2 months ago

Implemented in a6c9017ea0c3f13e43a5ae407db718d107e77b1f. I tried to err on the side of being verbose for the API, but I'm not convinced that this will be the final version.

ecurtiss / CatRom

Improve arc length reparametrization #18

1. Newton's method

2. Chebyshev polynomial (Newton's method)

3. Chebyshev polynomial (regula falsi)

4. Chebyshev polynomial (ITP)

5. Uniform lookup table (Newton's method)

6. Uniform lookup table (Chebyshev polynomial)

7. Non-uniform lookup table computed via Chebyshev polynomial

Benchmark 1: Reparamatrization

Benchmark 2: Precomputes

Benchmark 3: Precompute + Reparametrizations

Accuracy

Conclusion