Open sfogerty opened 2 years ago
@streeve @sslattery I made an issue here to try capturing potential improvements for FFT performance.
With #451 merged, can you start on these @sfogerty? Ideally, for each performance update you can run the test on each backend and show the improvement. I have a small Python script to compare results if that's useful.
@sfogerty is it relatively straightforward to add the other optimization here?
Simple improvements to memory use in the Cabana FFT implementation could have an outsize impact on performance.
1) Get rid of the data copies. Instead of converting between various types of complex data, we could use `Kokkos::complex<Scalar>`. The issue with this was the alignment in `Kokkos::complex`, but that could be resolved with `-DKOKKOS_ENABLE_COMPLEX_ALIGN=ON`.
2) heFFTe may be allocating a work buffer for each FFT? If so, we should allocate a work buffer once and pass it in for every transform. This could speed things up significantly.
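
To make item 2 concrete, here is a minimal, self-contained sketch of the call pattern being proposed. It does not use heFFTe or Cabana: `MockFft`, its `size_workspace()` and `forward()` members, and `run_with_shared_workspace` are hypothetical stand-ins, and the transform is a naive DFT chosen only so the example runs on its own. Whether heFFTe really allocates per call, and which overloads it offers for passing a buffer, still needs to be checked against its API.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical stand-in for an FFT plan. This is NOT heFFTe's class; it only
// mimics the "allocate scratch per call" vs "caller passes scratch" choice.
class MockFft {
public:
    explicit MockFft(std::size_t n) : n_(n) {}

    // Number of scratch elements a caller-provided workspace must hold.
    std::size_t size_workspace() const { return n_; }

    // Convenience overload: allocates a fresh scratch buffer on every call.
    void forward(const Complex* in, Complex* out) {
        std::vector<Complex> scratch(size_workspace());
        ++internal_allocations_;
        forward(in, out, scratch.data());
    }

    // Overload taking caller-owned scratch: no allocation happens inside.
    void forward(const Complex* in, Complex* out, Complex* workspace) const {
        const double pi = std::acos(-1.0);
        // Use the scratch space for the twiddle factors exp(-2*pi*i*k/n).
        for (std::size_t k = 0; k < n_; ++k)
            workspace[k] = std::polar(1.0, -2.0 * pi * double(k) / double(n_));
        // Naive O(n^2) DFT, sufficient to demonstrate the call pattern.
        for (std::size_t j = 0; j < n_; ++j) {
            Complex sum(0.0, 0.0);
            for (std::size_t k = 0; k < n_; ++k)
                sum += in[k] * workspace[(j * k) % n_];
            out[j] = sum;
        }
    }

    // How many times the convenience overload allocated scratch.
    std::size_t internal_allocations() const { return internal_allocations_; }

private:
    std::size_t n_;
    std::size_t internal_allocations_ = 0;
};

// Runs `repeats` transforms that all share one workspace, allocated once up
// front, and returns the plan's internal allocation count afterwards.
std::size_t run_with_shared_workspace(MockFft& fft,
                                      const std::vector<Complex>& in,
                                      std::vector<Complex>& out,
                                      int repeats) {
    std::vector<Complex> workspace(fft.size_workspace());
    for (int r = 0; r < repeats; ++r)
        fft.forward(in.data(), out.data(), workspace.data());
    return fft.internal_allocations();
}
```

In this sketch the two-argument `forward` pays for one allocation per transform, while the three-argument overload pays for one allocation in total. On GPU backends a device allocation is typically much more expensive than a host `std::vector`, which is why the saving could be significant there. The per-backend test runs mentioned above would be the way to confirm it.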