JuliaMolSim / DFTK.jl

Density-functional toolkit
https://docs.dftk.org
MIT License
445 stars 89 forks source link

Use real FFTs #36

Closed antoine-levitt closed 2 years ago

antoine-levitt commented 5 years ago

Add an option to perform real-to-complex and complex-to-real FFTs with a kwarg, and use that to do the potentials as real arrays. Could also do dispatch on the eltype of f_real, but it's not compatible with G_to_r(f_fourier).

antoine-levitt commented 5 years ago

For bonus points, optimize on the kpoints that are real (there's gamma but also some others; see https://docs.abinit.org/theory/wavefunctions/ for a nice writeup) and store wavefunctions in packed mode. That should be relatively easy since the only place we actually use them is in the G_to_r functions: it's just that the basis field of the Kpoint struct only contains half the G vectors. NB: to get a unitary representation (orthogonal basis), a specific treatment is needed: each G component gets a factor sqrt(2) except for the DC one.

mfherbst commented 5 years ago

About the dispatch: I think it's fair to only do this if the in-place version of the FFTs is used or if an appropriate kwarg is used.

mfherbst commented 5 years ago

Related to #9

antoine-levitt commented 4 years ago

So I've been thinking about this, and the main difficulty is the in-place.

First, I've been doing some more serious benchmarking on this: https://gist.github.com/antoine-levitt/3495f47c99676f26824df1fbe9462040. So if I'm interpreting this right, a reasonable mental model is that an alloc, a fill and a copy take about the same time. That unit of time is about 1/8th of an FFT. (some of that might be due to julia issue 130. So allocations do matter, unfortunately (especially when going to eg threading).

The main problem is that we use in-place ffts, which is a lot more tricky with real FFTs, essentially because the input and output don't have the same size (they also don't have the same type, but we can work around that with reinterpret). The real array should be of size 2*(n/2+1) (see eg http://www.fftw.org/fftw3_doc/One_002dDimensional-DFTs-of-Real-Data.html and http://www.fftw.org/fftw3_doc/Multi_002dDimensional-DFTs-of-Real-Data.html#Multi_002dDimensional-DFTs-of-Real-Data). There's a julia interface with an open PR to FFTW but really it's just as simple to do it ourselves). So it means that we can't easily use that and keep the interface we have right now for our functions and in the rest of the code.

So the simplest option is to just drop the in-place FFTs entirely. Does that hurt us very much? In my tests I see a performance difference of about 10% in favour of in-place. So acceptable for the sake of a much cleaner interface.

So, roadmap:

1 loop and thread at the level of Hamiltonian application (rather than FFTs), so FFT functions only ever have to deal with one wavefunction at a time. Ie do for iband=1:nband in mul!(Y,Ham,X). There is no performance penalty in doing so, it simplifies the FFT code quite a bit, improves memory locality and helps us manage memory footprint.

2 do a barebones implementation with out-of-place FFTs keeping the data format for the Psi the same

3 benchmark that implementation. Maybe we need to play some tricks like reuse temporary buffers; we can do that with thread-local buffers (see https://julialang.org/blog/2019/07/multithreading paragraph "thread-local state" for an example; there is also the convenient let buf = ... function f() do_things(buf) end end for function-local variables)

4 switch to a compressed representation for the reciprocal-space representation of the psi. That involves annoying factors of sqrt(2) in both the psi and the nonlocal terms, but we can maybe abstract that away, with something like a on_basis(basis, kpt, gen) where gen(q) is the Fourier transform. That would also simplify some of these pesky sqrt(unit_cell_volume).

antoine-levitt commented 4 years ago

So I've started on this, and hit an unexpected fun thing. I was expecting for the zero kpoint to store only half the G vectors, and have a basis like e_G(r) = e^{iGr} + e^{-iGr} and keep the rest of the code as is, ie expand with complex amplitudes on this basis set. But it doesn't work like that: cG e_G isn't real when cG is complex. So the next idea is to expand like e_G(r) = cG e^{iGr} + cG* e^{-iGr}, but then inner products don't match: <c, c'> != <psi, psi'> (even ignoring the pesky sqrt(2) factors). Fundamentally, real eigenvalue problems of size 2N and complex eigenvalue problems of size N are different. Which explains both

antoine@beta ~/Desktop/qe-6.4.1 $ grep -ir _gamma * | wc -l
933

in QE (important functions have two versions, one for gamma and the others for the other kpoints) and

real(dp) :: cg(2,npw*nspinor*nband)

in ABINIT (complex numbers are not used for the storage of the wavefunctions, and npw gets divided by 2 in the case when k is the gamma point).

Dynamic languages in general and julia in particular really shine here. We just have to be careful to type Psi as a Vector{Any}, so that Psi[1] can be a matrix of reals and Psi[2] a matrix of complex.

mfherbst commented 2 years ago

Done in #722.