Efficient memory use in FFTs

sfogerty commented 2 years ago

Simple improvements to memory use in the Cabana FFT implementation could have an outsize impact on performance.

1) Get rid of the data copies. Instead of converting between various types of complex data, we could use Kokkos::complex<Scalar> directly. The obstacle was the alignment of Kokkos::complex, but that could be resolved with -DKOKKOS_ENABLE_COMPLEX_ALIGN=ON.

2) heFFTe may be allocating a new work buffer for each FFT. If so, we should allocate one work buffer up front and pass it to heFFTe to reuse. This could speed things up significantly.
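A minimal sketch of item 1, under stated assumptions: plain std::complex<double> stands in for Kokkos::complex<Scalar>, and `hypothetical_backend_scale` is a made-up placeholder for an FFT routine that reads interleaved (re, im) doubles. It contrasts staging the data through a separate buffer with handing the backend a pointer to the same storage. Only the size of the complex type is checked; the Kokkos::complex alignment question raised in item 1 is not modeled here.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// A zero-copy handoff requires one complex value to be exactly two scalars
// with no padding. (Alignment is a separate question not checked here.)
static_assert(sizeof(std::complex<double>) == 2 * sizeof(double),
              "std::complex<double> must be two packed doubles");

// Hypothetical stand-in for an FFT backend routine that consumes interleaved
// (re, im) doubles; it simply scales every value in place.
inline void hypothetical_backend_scale(double* interleaved, std::size_t n_complex,
                                       double factor) {
    assert(interleaved != nullptr || n_complex == 0);
    for (std::size_t i = 0; i < 2 * n_complex; ++i) interleaved[i] *= factor;
}

// Copy-based path: convert into a separate interleaved buffer, call the
// backend, then convert back. Two extra passes plus an extra allocation.
inline void scale_with_copies(std::vector<std::complex<double>>& field, double factor) {
    std::vector<double> staging(2 * field.size());
    for (std::size_t i = 0; i < field.size(); ++i) {
        staging[2 * i] = field[i].real();
        staging[2 * i + 1] = field[i].imag();
    }
    hypothetical_backend_scale(staging.data(), field.size(), factor);
    for (std::size_t i = 0; i < field.size(); ++i)
        field[i] = std::complex<double>(staging[2 * i], staging[2 * i + 1]);
}

// Zero-copy path: hand the backend the field's own storage. The C++ standard
// guarantees an array of std::complex<T> can be accessed through a T* as
// interleaved real and imaginary parts.
inline void scale_in_place(std::vector<std::complex<double>>& field, double factor) {
    double* as_reals = reinterpret_cast<double*>(field.data());
    hypothetical_backend_scale(as_reals, field.size(), factor);
}
```

Both functions give the same result; the in-place one just skips the staging allocation and the two conversion loops. For example, scaling `{{1, 2}, {3, -4}}` by 0.5 yields `{{0.5, 1}, {1.5, -2}}` either way.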
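A minimal sketch of item 2, assuming a plan object that can report how much scratch space a transform needs and can accept a caller-owned buffer. `HypotheticalPlan` and its methods are invented for illustration and are not heFFTe's API; its placeholder "transform" just reverses the data through scratch memory. The allocation counter shows the difference: the allocating path pays for scratch on every call, while the workspace path allocates once, outside the loop.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical stand-in for an FFT plan (NOT heFFTe's API). Its placeholder
// "transform" reverses the data through scratch memory so the workspace is
// genuinely used, and it counts how often scratch had to be allocated.
class HypotheticalPlan {
public:
    explicit HypotheticalPlan(std::size_t n) : n_(n) {}

    // Complex values of scratch space one transform needs.
    std::size_t workspace_size() const { return n_; }

    // Pattern to avoid: a fresh scratch buffer is allocated on every call.
    void apply_allocating(std::vector<cplx>& data) {
        std::vector<cplx> scratch(workspace_size());
        ++allocations_;
        transform(data, scratch.data());
    }

    // Preferred pattern: the caller owns one buffer and reuses it.
    void apply_with_workspace(std::vector<cplx>& data, cplx* workspace) {
        transform(data, workspace);
    }

    std::size_t allocations() const { return allocations_; }

private:
    void transform(std::vector<cplx>& data, cplx* scratch) const {
        assert(data.size() == n_);
        assert(scratch != nullptr || n_ == 0);
        for (std::size_t i = 0; i < n_; ++i) scratch[i] = data[n_ - 1 - i];
        for (std::size_t i = 0; i < n_; ++i) data[i] = scratch[i];
    }

    std::size_t n_;
    std::size_t allocations_ = 0;
};
```

Called 10 times, `apply_allocating` performs 10 allocations; calling `apply_with_workspace` 10 times with one buffer sized by `workspace_size()` adds none.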

sfogerty commented 2 years ago

@streeve @sslattery I made this issue to capture potential improvements to FFT performance.

streeve commented 2 years ago

With #451 merged, can you start on these, @sfogerty? Ideally, for each performance update you would run the test on each backend and show the improvement. I have a small Python script for comparing results if that's useful.

streeve commented 2 years ago

@sfogerty, is it relatively straightforward to add the other optimization here?

ECP-copa / Cabana, issue #470