Closed brandondube closed 5 years ago
As an additional note, the recurrent schemes allow generation of derivatives just as easily, so we will pick this feature up for free as well.
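As a sketch of why derivatives come for free: the derivative of a Jacobi polynomial is itself a Jacobi polynomial of lower order with shifted parameters, so any recurrence-based evaluator can be reused directly. The function name here is hypothetical, not prysm's API:

```python
import numpy as np
from scipy.special import eval_jacobi

def jacobi_der(n, a, b, x):
    """First derivative of the Jacobi polynomial P_n^(a, b) at x,
    via the identity d/dx P_n^(a,b)(x) = (n+a+b+1)/2 * P_{n-1}^(a+1,b+1)(x).
    Illustrative sketch only."""
    if n == 0:
        return np.zeros_like(x)
    return (n + a + b + 1) / 2 * eval_jacobi(n - 1, a + 1, b + 1, x)
```

For example, `jacobi_der(2, 0, 0, x)` reproduces the derivative of the Legendre polynomial P_2, i.e. 3x.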
`ANSI1TermZernike` and `ANSI2TermZernike` are based on recursive Jacobi polynomials. Speed gains are 2-3x for typical orders, and grow larger as the order of the polynomial grows. A fringe benefit is numerical stability to radial orders at least as high as 1000. In v0.18 the explicit Zernike implementation will be removed and all Zernikes will be united under the recursive framework.
Work on `:dev/prysm/qpoly.py` has led to the realization that for Forbes polynomials, the recursive generation scheme can be O(n) instead of O(n^2) in a naïve implementation.
When compared to the Zernike polynomials, we see that the time constant with numba-jit'd Zernikes is ~6.5x larger than the time constant for the recurrent Q polynomials. QED, we can achieve a speedup of Zernike generation of ~5x by using a recursive generation scheme. This also means that in the numba-free case, our gains will be even larger (on the order of 20-30x) and we can (perhaps) reduce the size of the Zernike codebase. We can also remove the optional numba dependency, improving import times for users that have numba installed.
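The link that makes a recursive Zernike scheme possible is the standard identity R_n^m(ρ) = (-1)^((n-m)/2) ρ^m P_{(n-m)/2}^{(m, 0)}(1 - 2ρ²) relating Zernike radial polynomials to Jacobi polynomials. A hedged sketch using SciPy's Jacobi evaluator (function name is hypothetical, not prysm's API):

```python
import numpy as np
from scipy.special import eval_jacobi

def zernike_R(n, m, rho):
    """Zernike radial polynomial R_n^m via the Jacobi identity
    R_n^m(rho) = (-1)^k * rho^m * P_k^(m, 0)(1 - 2 rho^2), k = (n - m) / 2.
    Assumes n - m is even and nonnegative. Sketch only."""
    k = (n - m) // 2
    return (-1) ** k * rho ** m * eval_jacobi(k, m, 0, 1 - 2 * rho ** 2)
```

As a sanity check, `zernike_R(2, 0, rho)` reproduces the defocus radial term 2ρ² - 1.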
The necessary work is to:

1. Write a recursive Zernike generation function, taking inspiration from the new qpoly codebase. The two should probably be refactored into a `_recursivepoly.py` that both import from, since I believe both use Jacobi polynomials and are almost the same.
2. Write an `ANSIZernike` class that uses m, n indexing.
3. Write maps from Fringe => m, n, and the same for Noll.
4. Write a name generator based on the radial and azimuthal orders.
5. Adapt `NollZernike` and `FringeZernike` to use the recurrent set.

At the same time, we should investigate whether storing the basis vectors on the cache in a large m x n x N x N array is feasible (vis-à-vis memory footprint and performance), so that a single multiply-and-add does all of the surface generation. In the past, this was less performant than an on-the-fly addition scheme. I suspect the single multiply-and-add is better on GPU, while the on-the-fly add is better on CPU.