JuliaApproximation / FastTransforms.jl

:rocket: Julia package for orthogonal polynomial transforms :snowboarder:

Use Toeplitz-dot-Hankel for very large dimensional transforms #209

Open dlfivefifty opened 1 year ago

dlfivefifty commented 1 year ago

Toeplitz-dot-Hankel has better asymptotic complexity than the existing plans, so it should be the default when arrays are large. E.g. for cheb2leg, even n = 1000 sees a significant speedup:

```julia
julia> n = 100; x = randn(n); @time cheb2leg(x); @time th_cheb2leg(x);
  0.000267 seconds (2 allocations: 944 bytes)
  0.000654 seconds (98 allocations: 288.531 KiB)

julia> n = 1000; x = randn(n); @time cheb2leg(x); @time th_cheb2leg(x);
  0.028686 seconds (2 allocations: 7.984 KiB)
  0.006464 seconds (99 allocations: 10.559 MiB)

julia> n = 1000; x = randn(n); @time cheb2leg(x); @time th_cheb2leg(x);
  0.028856 seconds (2 allocations: 7.984 KiB)
  0.011597 seconds (99 allocations: 10.559 MiB)

julia> n = 10_000; x = randn(n); @time cheb2leg(x); @time th_cheb2leg(x);
  0.778423 seconds (3 allocations: 78.219 KiB)
  0.103821 seconds (108 allocations: 799.524 MiB)
```

This was prematurely made the default but reverted, as there were some regressions. The number of allocations in the th_* routines is also exorbitant, probably because the code dates back to a port of MATLAB code.
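A size-based default could look something like the sketch below. The crossover value of 1000 is read off the timings above, not a tuned constant, and `default_cheb2leg` is a hypothetical name, not part of the package:

```julia
using FastTransforms

# Rough crossover suggested by the timings above: at n = 1000 the
# Toeplitz-dot-Hankel path is already several times faster.
const TH_CROSSOVER = 1_000

# Hypothetical dispatcher: use th_cheb2leg for large inputs, the
# existing cheb2leg plan for small ones.
default_cheb2leg(x::AbstractVector) =
    length(x) >= TH_CROSSOVER ? th_cheb2leg(x) : cheb2leg(x)
```

The same dispatch pattern would apply to the other transform pairs (leg2cheb / th_leg2cheb, etc.), with crossovers chosen per transform.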

For matrices I'm not seeing much improvement from the th_* versions, even for a 40k × 10 transform, which is suspicious... but profiling shows all the time is in the FFTs.

MikaelSlevinsky commented 1 year ago

It's true there has been a significant regression in the lib plans (there are a couple of reasons for this, not all of them justified).

MikaelSlevinsky commented 1 year ago

There's a library method `size_t X(summary_size_tb_eigen_FMM)(X(tb_eigen_FMM) * F);` to get the number of bytes of the plans. The ccall isn't even set up yet, but it may be interesting for comparison's sake re: allocations.
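A rough sketch of what the wrapper could look like, assuming the double-precision symbol mangles to `ft_summary_size_tb_eigen_FMM` and that the plan is passed as a raw pointer (both are assumptions; the package's actual bindings may differ):

```julia
# Assumed mangled symbol name and calling convention; this ccall is
# not yet in the package, per the comment above.
plan_summary_size(F::Ptr{Cvoid}) =
    ccall((:ft_summary_size_tb_eigen_FMM, FastTransforms.libfasttransforms),
          Csize_t, (Ptr{Cvoid},), F)
```

Comparing this byte count against `@allocated` on the Julia side would show how much of the th_* allocation is plan storage versus transient work arrays.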

jishnub commented 1 year ago

Most of the allocation seems to come from https://github.com/JuliaApproximation/FastTransforms.jl/blob/daafeb3e0eb2ccff8e4a4d5869ef5cfe589ebfe3/src/toeplitzhankel.jl#L111, which allocates an n × n matrix but then discards most of its columns in https://github.com/JuliaApproximation/FastTransforms.jl/blob/daafeb3e0eb2ccff8e4a4d5869ef5cfe589ebfe3/src/toeplitzhankel.jl#L131. This may need to be rethought: in the n = 10_000 example, only 39 columns are eventually retained.
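Back-of-the-envelope numbers for the n = 10_000 case (plain Julia arithmetic, assuming Float64 storage):

```julia
n, r = 10_000, 39                  # r = columns actually retained

full = n * n * sizeof(Float64)     # full n×n workspace
slim = n * r * sizeof(Float64)     # only the retained n×r columns

full / 2^20                        # ≈ 763 MiB, most of the 799.524 MiB reported
slim / 2^20                        # ≈ 3 MiB
```

So the discarded columns account for essentially all of the reported allocation; trimming the workspace to the retained rank would cut it by more than two orders of magnitude.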

dlfivefifty commented 1 year ago

Ah thanks, that should be an easy fix: we can just cap it at 100 columns for now (we know the number of columns grows logarithmically, so the cap will probably never be reached).
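The quick fix could be sketched like this (variable names are illustrative, not the package's internals; 100 is the cap proposed above):

```julia
# Preallocate at most 100 columns for the low-rank workspace instead
# of n. Since the retained rank grows only logarithmically in n
# (39 columns at n = 10_000), the cap should effectively never bind.
n = 10_000
maxcols = min(n, 100)
workspace = Matrix{Float64}(undef, n, maxcols)  # ~7.6 MiB instead of ~763 MiB
```

A longer-term fix would grow the workspace adaptively as columns are accepted, removing the cap entirely.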