RJDennis opened this issue 7 years ago
I put together an experimental package to implement some performance improvements:
https://github.com/stevengj/FastChebInterp.jl
To compute the Chebyshev weights, it uses an FFT-based algorithm. And to evaluate the interpolant and its derivatives I found a simple allocation-free way to implement the Clenshaw algorithm recursively that seems to outperform ChebyshevApprox.jl by several times on our test functions. It also supports vector-valued functions efficiently via StaticArrays.jl, and implements a callable-object API (similar to Interpolations.jl) to be somewhat friendlier.
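For concreteness, the callable-object usage I have in mind looks roughly like this (a sketch following FastChebInterp's README; exact argument names and return values are per that package's documentation):

```julia
using FastChebInterp

lb, ub = 0.0, 10.0
x = chebpoints(200, lb, ub)      # Chebyshev points on [lb, ub]
c = chebinterp(sin.(x), lb, ub)  # build the interpolant via an FFT-based transform
c(5.2)                           # callable-object evaluation, ≈ sin(5.2)
chebgradient(c, 5.2)             # value and derivative from the same Clenshaw-style recurrence
```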
Would you be interested in adopting some of this code?
Thanks for letting me know. I'm happy for ChebyshevApprox to be faster and more useful to more people, so I would be interested in adopting some of your improvements. What's the best way to proceed?
I can submit a PR, but it depends on what you want the API to look like. For example, do you want to adopt the callable-object API (similar to Interpolations.jl)?
Obviously, we could just drop all of FastChebInterp.jl into this package, adding the new ChebInterp callable object, but it would be nice to decide on a unified API first.
There is also the question of whether you can adapt my polynomial-evaluation routines for your other Chebyshev interpolations. Mainly, this is just a matter of ensuring that the data structures are the same — I need an N-dimensional array of Chebyshev coefficients (representing a tensor product of Chebyshev polynomials).
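To illustrate the data-structure point, here is a minimal sketch (plain loops, not the allocation-free recursive implementation) of how a tensor-product Chebyshev series stored as an ordinary coefficient array can be evaluated by applying the Clenshaw recurrence along each dimension:

```julia
# 1-D Clenshaw recurrence for sum_k c[k+1]*T_k(x), with x in [-1, 1].
function clenshaw(c::AbstractVector, x)
    bk1 = zero(x)   # b_{k+1}
    bk2 = zero(x)   # b_{k+2}
    for k in length(c):-1:2
        bk1, bk2 = c[k] + 2x*bk1 - bk2, bk1
    end
    return c[1] + x*bk1 - bk2
end

# 2-D tensor product: C[i+1, j+1] multiplies T_i(x)*T_j(y).
# Contract the trailing (y) dimension first, then the leading (x) dimension.
function clenshaw2d(C::AbstractMatrix, x, y)
    d = [clenshaw(view(C, i, :), y) for i in axes(C, 1)]
    return clenshaw(d, x)
end
```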
(My students tell me that the Chebyshev points I used are slightly different from yours? Mine correspond to first-kind Chebyshev polynomials evaluated by a type-I discrete cosine transform similar to that used in Clenshaw–Curtis quadrature, but I could switch to a type-II discrete cosine transform, which I think is what you may have used? The convergence rates should be essentially the same; the two choices differ mainly in whether the endpoints of the domain are included, i.e. whether the points are offset by half a step.)
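With the usual conventions, the two point sets on [-1, 1] differ like this (a small sketch, using the standard first-kind/second-kind terminology):

```julia
n = 8
# First-kind points (Chebyshev roots): no endpoints, offset by half a step.
roots_pts   = [cos(π*(k + 0.5)/n) for k in 0:n-1]
# Second-kind points (Chebyshev extrema, as in Clenshaw–Curtis): include ±1.
extrema_pts = [cos(π*k/(n-1)) for k in 0:n-1]
```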
The generated functions for evaluating the weights in the Chebyshev polynomials take twice as long to run as the regular functions, but use about the same amount of memory. This may be due to the use of Arrays of Arrays to store the polynomial matrices?
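If the Arrays-of-Arrays hypothesis is right, one thing to try is storing the polynomial matrices contiguously; a minimal sketch of the difference (the names here are illustrative, not the actual ChebyshevApprox internals):

```julia
# Each inner Vector is a separate heap object, so iterating the outer
# Vector chases pointers instead of streaming through contiguous memory.
nested = [rand(10) for _ in 1:1_000]   # Vector{Vector{Float64}}

# The same data stored contiguously as a single Matrix.
flat = reduce(hcat, nested)            # 10 × 1000 Matrix{Float64}
```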