MikaelSlevinsky / FastTransforms

:bullettrain_front: Fast orthogonal polynomial transforms :surfer:
MIT License
58 stars 9 forks source link

2D cheb2leg (or more generally, support "region") #76

Open dlfivefifty opened 1 year ago

dlfivefifty commented 1 year ago

FFTW allows specifying "regions", for multidimensional FFTs:

julia> X = randn(5,5,5);

julia> F = fft(X,(1,3));

julia> F[:,1,:] ≈ fft(X[:,1,:]) # 2D FFT acting on 1st and 3rd dimension
true

It turns out I need this feature for Legendre transforms.... at the moment it just does 1D transforms:

julia> X = randn(5,5);

julia> cheb2leg(X)[:,1] == cheb2leg(X[:,1])
true

I can add it to FastTransforms.jl, e.g., if I need a 2D Legendre I can do

cheb2leg(cheb2leg(X')')

but I'm curious if this could be SIMD-optimised (or multithreaded) in C to make it faster?

MikaelSlevinsky commented 1 year ago

This would certainly be helpful to include in the interface, but I'm not sure that execution would end up being any faster than cheb2leg(cheb2leg(X')') apart from using plans and a temp array for the transpose.

The current plans are already OpenMP parallelized, so that explains why multi-column execution is maybe faster than expected (this is slightly less than half the 2D transform):

julia> n = 10_000; x = randn(n); p = plan_leg2cheb(Float64, n); lmul!(p, ldiv!(p, x)); @time lmul!(p, x);
  0.007975 seconds

julia> X = randn(n, n);

julia> lmul!(p, ldiv!(p, X)); @time lmul!(p, X);
  5.521245 seconds

julia> n*0.007975/5.52 # close to 18 -- my core count
14.447463768115943
MikaelSlevinsky commented 1 year ago

Ideally, we'd want a "cache-oblivious transpose." On a single machine, is this how FFTW does it? https://github.com/FFTW/fftw3/blob/e9c510bf92a4fa848982310d2ecc8c5701aa3f6a/kernel/transpose.c