There are two parts to this.
1. Using TRS only to speed up FFTs, keeping the data representation the same
2. Using a minimal compressed representation
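
To make the first part concrete, here is a minimal sketch of what a real-to-complex transform saves, assuming the real-space data is real-valued (the case TRS is meant to cover). It uses the out-of-place `rfft`/`irfft` from FFTW.jl rather than any in-place plan, and the grid size is made up for illustration.

```julia
using FFTW

# Hypothetical grid size; any real-valued 3D array works here.
n1, n2, n3 = 4, 6, 5
f_real = randn(n1, n2, n3)

full = fft(complex.(f_real))   # complex-to-complex: n1 × n2 × n3 coefficients
half = rfft(f_real)            # real-to-complex: (n1 ÷ 2 + 1) × n2 × n3 coefficients

# Only about half of the first dimension is stored...
@assert size(half) == (n1 ÷ 2 + 1, n2, n3)
# ...it agrees with the matching slab of the full transform...
@assert half ≈ full[1:n1 ÷ 2 + 1, :, :]
# ...and the real-space data is recovered from that half alone.
@assert irfft(half, n1) ≈ f_real
```

The remaining coefficients follow from Hermitian symmetry, which is why the transform can be cheaper without changing what the data means.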
The issue with 1 is that we have functions like `G_to_r!(f_real, f_fourier)`, where `f_real` is a complex 3D array and `f_fourier` is a complex 1D array. This function puts `f_fourier` into `f_real` and then transforms it in place. If we want to keep this structure (and not have an extra array), we need an `rfft!`, which FFTW.jl does not implement directly. https://discourse.julialang.org/t/in-place-rfft-with-padded-arrays/3340/4 suggests a way out: the key part is calling `FFTW.rFFTWPlan` to access the lower-level interface directly. Together with some low-level `reinterpret` business this should be OK (we should put these ad-hoc things in `r_to_G!_lowlevel` functions that would only be used by `DftHamiltonian`).
Then, for 2, the minimal compressed representation should be done in a way that preserves the inner product. That means sqrt(2) and DC component shenanigans. It also means that not every `randn()` is valid as an orbital (because the DC component should be real), so things like the `randn` in LOBPCG have to be modified. The gain-to-annoyance ratio is pretty low for this one, so we should probably not do it.
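
For reference, the sqrt(2) and DC business can be sketched in one dimension. This is only an illustration under simplifying assumptions: real data of odd length (so there is no unpaired Nyquist coefficient), a unitary normalization, and a hypothetical helper `compress` that is not part of any existing code. The preserved quantity is the real part of the complex dot product.

```julia
using FFTW, LinearAlgebra

# Pack a real vector of odd length n into (n + 1) ÷ 2 complex coefficients.
# Hypothetical helper, for illustration only.
function compress(x::AbstractVector{<:Real})
    n = length(x)
    @assert isodd(n)            # odd n: every k > 0 is paired with -k
    c = rfft(x) ./ sqrt(n)      # unitary normalization of the stored half
    c[2:end] .*= sqrt(2)        # each k > 0 stands in for the pair (k, -k)
    return c
end

x, y = randn(7), randn(7)
cx, cy = compress(x), compress(y)

@assert abs(imag(cx[1])) < 1e-12        # DC component of real data is real
@assert real(dot(cx, cy)) ≈ dot(x, y)   # inner product is preserved
@assert norm(cx) ≈ norm(x)              # and so is the norm
```

A generic complex random vector of the compressed length will have a complex first entry, so it does not correspond to any real-space function; that is the constraint a random initial guess would have to respect.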