Test on forces derivatives is failing unexpectedly

gkemlin commented 2 years ago

In the recent CI from the previous PR's (see eg here), there was some bug happening with the derivative of forces for metal which I wanted to investigate, but this was merged automatically. In the test test/forwardiff.jl, we use rHF silicon with temperature for this test and it happens to fail sometimes, always for this specific case and not any other test. At first I thought this was an error from χ0, but actually the error seems to happen before. For instance, I've tweaked a bit the configuration from test/forwardiff.jl in the code below, and this results in a error from symmetrize_ρ https://github.com/JuliaMolSim/DFTK.jl/blob/17c64890a38b17fc8b57abe5306715aee56e9127/src/response/chi0.jl#L345 where the assertion from https://github.com/JuliaMolSim/DFTK.jl/blob/17c64890a38b17fc8b57abe5306715aee56e9127/src/fft.jl#L69 is not passed.

Any ideas what is happening here ? This really only happens for the rHF model, so maybe there is something we miss here but I can't manage to understand what...

using DFTK
using ForwardDiff
using Random

a = 10.26  # Silicon lattice constant in Bohr
lattice = a / 2 * [[0 1 1.];
                   [1 0 1.];
                   [1 1 0.]]
Si = ElementPsp(:Si, psp=load_psp("hgh/lda/Si-q4"))
atoms     = [Si, Si]
positions = [[1.02, 0.97, 1.04] / 8, -ones(3) / 8]
temperature = 1e-3

function compute_force(ε1, ε2)
    T = promote_type(typeof(ε1), typeof(ε2))
    # solve rHF for Silicon (metallic)
    pos = positions .+ [ zeros(3), ε1 * [1., 0, 0] + ε2 * [0, 1., 0]]
    model = model_DFT(Matrix{T}(lattice), atoms, pos, []; temperature)
    basis = PlaneWaveBasis(model; Ecut=5, kgrid=[2, 2, 2], kshift=[0, 0, 0])
    is_converged = DFTK.ScfConvergenceDensity(1e-10)
    scfres = self_consistent_field(basis; is_converged,
                                   response=ResponseOptions(verbose=true))
    # return forces
    compute_forces_cart(scfres)
end

derivative_ε1_fd = let ε1 = 1e-5
    (compute_force(ε1, 0.0) - compute_force(0.0, 0.0)) / ε1
end
derivative_ε1 = ForwardDiff.derivative(ε1 -> compute_force(ε1, 0.0), 0.0)
@show norm(derivative_ε1 - derivative_ε1_fd)

antoine-levitt commented 2 years ago

I hate symmetriiiiiiiiiiiiiiiiies

mfherbst commented 2 years ago

This really only happens for the rHF model

You're sure this is not because of this being a metal?

Edit: This is really because of the way we setup up this particular structure and problem, see below.

mfherbst commented 2 years ago

I found that a different number of irreducible k-points is used between some of the calculations (8 in the displaced finite-diff, 6 in the displaced and 6 in the forward-diff run). That in combination with the small Ecut and kgrid used, might explain it ... especially given that we model a metal here.

Edit: Ok that's not the full story. Even with a larger kgrid and Ecut it gives differences. Increasing temperature or changing the blowup function also does not help.

gkemlin commented 2 years ago

Yes, indeed. Even symmetries=false in the model gives different results.

You're sure this is not because of this being a metal?

I've tried with magnesium and I don't manage to reproduce this.

antoine-levitt commented 2 years ago

Indeed, the symmetrization of the forces appears to be wrong here, the forces at (0,0) are different with and without symmetries. MWE:

using DFTK
using ForwardDiff
using Random

a = 10.26  # Silicon lattice constant in Bohr
lattice = a / 2 * [[0 1 1.];
                   [1 0 1.];
                   [1 1 0.]]
Si = ElementPsp(:Si, psp=load_psp("hgh/lda/Si-q4"))
atoms     = [Si, Si]
positions = [[1.02, 0.97, 1.04] / 8, -ones(3) / 8]
temperature = 1e-3

for sym in (true, false)
    ε1 = ε2 = 0.0
    # solve rHF for Silicon (metallic)
    pos = positions .+ [ zeros(3), ε1 * [1., 0, 0] + ε2 * [0, 1., 0]]
    model = model_DFT(lattice, atoms, pos, []; temperature, symmetries=sym)
    basis = PlaneWaveBasis(model; Ecut=5, kgrid=[2, 2, 2], kshift=[0, 0, 0])
    is_converged = DFTK.ScfConvergenceDensity(1e-10)
    scfres = self_consistent_field(basis; is_converged,
                                   response=ResponseOptions(verbose=true))
    scfres.energies
    # return forces
    display(compute_forces_cart(scfres))
end

I'll try to take alook...

antoine-levitt commented 2 years ago

This appears to be a case of accidental symmetries: changing the numbers in the positions somewhat results in less symmetries. Did I mention I hated symmetries?

antoine-levitt commented 2 years ago

With forces_nonsym the forces obtained from a symmetries=false computation:

julia> DFTK.symmetrize_forces(model, forces_nonsym; symmetries=model.symmetries) - forces_nonsym
2-element Vector{StaticArraysCore.SVector{3, Float64}}:
 [-8.005238182409724e-13, 0.004153669530948306, 5.807881302624662e-13]
 [-8.005238182409724e-13, -0.004153669533415452, 5.807881302624662e-13]

(and the same for the basis.symmetries, so that's not the issue). So, assuming our nonsymmetric forces computation is correct (which I think it is), either our symmetry detection or our forces symmetrization routine is wrong...

antoine-levitt commented 2 years ago

I'm a bit stumped there: the symmetry detection should be robust (we get them from spglib, and we recheck them independently), and we debugged the forces symmetrization routine, which agrees with papers. The symmetries do form a group. I remember something about @jaemolihm saying we should discard "accidental" or weird symmetries (meaning symmetries with a w which is not rational or something), but I don't remember what the point was, @jaemolihm do you remember?

jaemolihm commented 2 years ago

I ran the MWE, I think the issue is different, because sym and nosym results differ even when explicitly setting symmetries to those with simple w.

I find that only the DFTK.TermAtomicNonlocal term contribution is different, so an issue might be there

julia> basis.terms .|> typeof
8-element Vector{DataType}:
 DFTK.TermKinetic
 DFTK.TermAtomicLocal{Array{Float64, 3}}
 DFTK.TermAtomicNonlocal
 DFTK.TermEwald{Float64}
 DFTK.TermPspCorrection{Float64}
 DFTK.TermHartree
 DFTK.TermNoop
 DFTK.TermEntropy

# sym true
forces_per_term = Union{Nothing, Vector{StaticArrays.SVector{3, Float64}}}
nothing
StaticArrays.SVector{3, Float64}[[0.07038869844203988, -2.886579864025407e-15, 0.10032645829629105], [-0.07038869850082263, -2.220446049250313e-15, -0.10032645830509423]]
StaticArrays.SVector{3, Float64}[[-0.1504997004907923, 0.1603602195724699, -0.17022073865414747], [0.15049970053068973, -0.1603602196038185, 0.17022073867694731]]
StaticArrays.SVector{3, Float64}[[-0.04033939191779377, 9.57637842253505e-16, -0.0578011857046174], [0.04033939191779377, -9.576513873907308e-16, 0.0578011857046174]],
nothing, nothing, nothing, nothing]

# sym false
 forces_per_term = Union{Nothing, Vector{StaticArrays.SVector{3, Float64}}}[
 nothing,
 StaticArrays.SVector{3, Float64}[[0.07038869827965999, -4.0044856319809696e-11, 0.1003264579835943], [-0.07038869827688266, -6.477041125663163e-13, -0.10032645800503182]],
 StaticArrays.SVector{3, Float64}[[-0.04773509506391771, 2.6638774519582853e-11, -0.06745613320749312], [0.04773509506303912, -4.547834331347644e-12, 0.06745613321915436]],
 StaticArrays.SVector{3, Float64}[[-0.04033939191779377, 9.57637842253505e-16, -0.0578011857046174], [0.04033939191779377, -9.576513873907308e-16, 0.0578011857046174]],
 nothing, nothing, nothing, nothing]

jaemolihm commented 2 years ago

I find that only the DFTK.TermAtomicNonlocal term contribution is different, so an issue might be there

This is because only that term uses symmetrize_forces. The unsymmetrized forces satisfy f = W' * f, not f = W * f, while the symmetrization assues the latter. Maybe another transpose issue..

https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/symmetry.jl#L246

antoine-levitt commented 2 years ago

Nice detective work! We do make sure that the W is the right real space rotation (spglib.jl:118), so that part should be OK. The formula we have right now seems consistent with (A.27) of https://arxiv.org/pdf/0906.2569.pdf, but probably there's a subtlety, I'll try to rederive it.

jaemolihm commented 2 years ago

Force means $E(\vec{\delta x}) = \vec{\delta x} \cdot \vec{F}$. Under rotation, $E(W \vec{\delta x}) = (W \vec{\delta x}) \cdot \vec{F}$. Symmetry means $E(\vec{\delta x}) = E(W \vec{\delta x})$. Thus, we have $\vec{\delta x} \cdot \vec{F} = (W \vec{\delta x}) \cdot \vec{F}$. Thus $F_i = Fj W{ji}$ (in Cartesian coordinates).

antoine-levitt commented 2 years ago

I agree with this, and it makes sense since forces are covectors. (A.27) says differently, but maybe it's valid in cartesian but not in reduced or something

jaemolihm commented 2 years ago

Maybe R in the QE paper is S = W', not W?

antoine-levitt commented 2 years ago

This also works

            f = forces[group[i_other_at]]
            sym_f = covector_cart_to_red(model, matrix_red_to_cart(model, W) * covector_red_to_cart(model, f))
            symmetrized_forces[idx] += sym_f

This does sym_f_red = A^-T A Wred A^-1 A^T Fred, where A is the lattice. It doesn't seem to me that it's equivalent to doing Wred^T Fred, even using the fact that Wcart is orthogonal.

antoine-levitt commented 2 years ago

Maybe R in the QE paper is S = W', not W?

I don't think so, since they write that for every position r, R(r+f) should be an atom of the same type as r, and we check that Wr+w is of the same type as r, so the conventions should be equivalent up to w=Rf (which we do take into account correctly, I hope...)

antoine-levitt commented 2 years ago

This does sym_f_red = A^-T A Wred A^-1 A^T Fred

Another way of writing this is sym_f_red = inv(Wred') f_red. This works also in this example. Your reasoning above says F = W' F, but W' = inv(W), so that F = inv(W) F, and by multiplying by W, F = W F, so it's also consistent with that.

mfherbst commented 2 years ago

I did some more tests with the fix proposed in #744 (which is definitely the right direction to follow) and I think the issue with the failing tests has to do with the way we treat symmetries in the response. Because now the value part in the forces seems very reasonable also in the forwarddiff calculation, but the dual part does not look right, e.g. using Gaspard's script

derivative_ε1_fd = [[-0.010188772141717269, 0.5540641933043239, 0.5503864129594938], [0.010187937546130184, -0.5540640837126478, -0.5503858548851172]]
derivative_ε1 = [[1.8376265562014128e6, 5.507626029595309e6, -1.837625369839999e6], [-1.8376253613343276e6, -5.50762642314619e6, 1.8376267549620424e6]]

but when I run it again:

derivative_ε1_fd = [[-0.01018562048691208, 0.5540706243039802, 0.5503838137466783], [0.010186103892827203, -0.5540667950994921, -0.5503840477318837]]
derivative_ε1 = [[0.576337577783623, 0.6292566906509841, 0.6100238361659679], [0.624274131079789, -1.0219749965960532, 0.7693535838844899]]

and without symmetries:

derivative_ε1_fd = [[-0.0101852412196673, 0.5540697222225325, 0.5503834867633052], [0.010185991221832707, -0.5540706391081104, -0.5503840552398427]]
derivative_ε1 = [[-0.010202755281099029, 0.5540729798449112, 0.5503845518212458], [0.010202757408728927, -0.5540730086009197, -0.5503845604364989]]

I lack the time for today to dig further, but at least a step.

mfherbst commented 2 years ago

Looks terribly also like some uninitialised value or something like that.

mfherbst commented 2 years ago

... or the perturbation breaks one of the symmetries and we filter that out during the response calculation?

antoine-levitt commented 2 years ago

Haven't looked closely but plausibly the dV is zero and we normalize it in chi0 for numerical purposes, which breaks things somehow?

gkemlin commented 2 years ago

Ok, so I'm trying to summarize what happens :smile: If I'm not mistaking, we are currently facing two issues :

the first one is the symmetry thing spotted by the script I showed at the top of this thread, see also Michael's post https://github.com/JuliaMolSim/DFTK.jl/issues/742#issuecomment-1268479029, which works if symmetries are disabled but doesn't work otherwise
the second one is the crazy gmres spotted in https://github.com/JuliaMolSim/DFTK.jl/pull/744 where we get too big values in ForwardDiff.

Starting from there, I think have spotted the point 2) bug from https://github.com/JuliaMolSim/DFTK.jl/pull/744. Sorry in advance for the long message! Actually I'm not sure it comes from symmetry but rather from some badly initialized value, as already mentioned. When we solve the Dyson equation here https://github.com/JuliaMolSim/DFTK.jl/blob/353c11f1871adffae8bcbad9aadb95faa929f718/src/response/hessian.jl#L161 δρ0 already has huge norm. This comes from the first application of apply_χ0_4P, where one of the solves of the Sternheimer fails. The Sternheimer tolerance is set to tol/10 where tol in ForwardDiff is scfres.norm_Δρ. That makes tol_sternheimer to 4.12e-12 and the behavior of the Sternheimer is crazy for one band of the gamma point, where the cg gets stuck for 40 iteration at 1e-11 before going up and down again, resulting in a δψk[:,n] with huge norm (see verbose in this gist). I'm not sure I understand why this particular band get stuck with the CG at 1e-11, but here are some tracks to follow.

if we set tol_sternheimer to tol instead of tol/10 here https://github.com/JuliaMolSim/DFTK.jl/blob/353c11f1871adffae8bcbad9aadb95faa929f718/src/response/hessian.jl#L129 then the test passes.
if we don't use extra bands, it works. This is rather bad news because we are supposed to exactly avoid this type of things with the extra bands :laughing: .
this comes from the diagonal matrix Diagonal(εk_extra .- εk[N]) to have small values: for the band for which the Sternheimer fails, the first value on the diagonal is εk[N+1] - εk[N] = 0.00107679, and both eigenvalues are converged enough, with occk[N]>1e-6 and occk[N+1]~1e-7. I would say that the problem is not the Schur (because it works perfectly without the extra bands, and we know it has a worse conditioning number then), but rather the way the inverse of Diagonal(εk_extra .- εk[N]) is applied, which accumulates bigger and bigger numbers during the stagnation of the cg. @antoine-levitt any ideas on how to improve this part to avoid this ?
if we set the default occupation to 1e-8 rather than 1e-6, everything works fine.

This does not solve the symmetry issues, but at least we now know where this crazy gmres comes from :upside_down_face:

mfherbst commented 2 years ago

Regarding the large norm in apply_\chi0 we already do something against this, namely here: https://github.com/JuliaMolSim/DFTK.jl/blob/16c1ef1e214e01ce65a7f67429a0ab31352a84f9/src/response/chi0.jl#L351-L353

It should not be a problem to do something similar in solve_ΩplusK_split as well (or in the forward-diff rule calling it) and I'd not be surprised if this fixes all the issues.

I think most of what you describe is basically treating the symptom: If your input δρ is huge the solver has a tough time to converge to 1e-12 that makes a lot of sense. At some point it might get stuck in some numerical issues because of the range the data types need to represent and then depending on how you tweak a few things you might get around it or not. Symmetry can be one of these tweak parameters.

gkemlin commented 2 years ago

I'm not sure I see what you mean : If the δρ0 out of the first apply_χ0_4P is completely screwd up, how can we deal with it ? If we divide δρ0 by it norm and then re-multiply with it, this won't solve the issue ?

I think most of what you describe is basically treating the symptom: If your input δρ is huge the solver has a tough time to converge to 1e-12 that makes a lot of sense. At some point it might get stuck in some numerical issues because of the range the data types need to represent and then depending on how you tweak a few things you might get around it or not. Symmetry can be one of these tweak parameters.

Yes, I agree with that.

mfherbst commented 2 years ago

I'm not sure I see what you mean : If the δρ0 out of the first apply_χ0_4P is completely screwd up, how can we deal with it ?

You deal with it even before ... by scaling the RHS, i.e. the δHψ. That should work, no?

gkemlin commented 2 years ago

I've normalized the rhs of every sternheimer (ie δHψkn is normalized) and now the CG stagnates at 6e-12 instead of 1e-11 ! The tolerance we're asking from ForwardDiff is 5e-12... I suggest two possibilities:

either we agree that it's only numerical issues that are unlikely to happen anywhere else than in our test and we can take 1e-9 for the convergence of the SCF in the test instead of 1e-10, so that ForwardDiff asks for higher tolerance. This seems ok for me because asking for tolerance 1e-10 on the density means that the energy is already at machine precision.
either we try to adapt the tolerance of the Sternheimer: this stagnation is likely to occur for the highest occupied band, with occupation number of order 1e-6 by default, so we can set the tolerance in the cg to abstol/sqrt(occ[ik][n]) instead. This won't cause any issue because we multiply by occ[ik][n] afterwards, but this allows for a bit more flexibility on the sternheimer for the last occupied bands.

Both solutions work and seem good to me, I'll be happy to have your opinion on that !

antoine-levitt commented 2 years ago

The trace you linked with the residual going down and then up and down is very characteristic of "oversolving": when you solve iteratively Ax=b where A has a nullspace (which is not exactly zero), and you are asking for really tight tolerances, at some point the iterative solver is going to do figure out that there is something nonzero in the nullspace, and it's going to want to invert that (with catastrophic results, of course). We should try to figure out why that happens, let's discuss it sometime. Regarding solutions, projecting out the nullspace might not work, because the nullspace does not necessarily numerically commute with A. I'm not aware of a general solution to the problem, but the QE technique of shifting it away from zero might work...

mfherbst commented 2 years ago

Maybe the idea of adapting the Sternheimer tolerance depending on the occupation might at least be a way out. Indeed I would expect more issues close to the Fermi level (e.g. rotations in the p or d orbitals, which could be such a nullspace). I agree we should investigate this more thoroughly, so I would stay very conservative for now. abstol/sqrt(occ[ik][n]) might be a bit too loose though. We should in any case investigate this on some non-trivial examples before introducing it.

antoine-levitt commented 2 years ago

That's probably not the nullspace that matters, I meant simply the one from the projection onto the occupied bands. If these are converged to some eps and you solve the sternheimer to less than eps, you're in trouble. It'd be nice to find a proper way to solve this once and for all and not rely on heuristics

mfherbst commented 2 years ago

I agree, but perhaps for now we should just ensure in https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/response/hessian.jl#L129 to not overdo it as @gkemlin suggested. I actually did some experiments in #737 and found that tol_sternheimer=tol is ok in the initial and final apply_χ0_4P and only in the inner apply_χ0 I took the minimal tolerance to be tol / 10. I assume that with the rescaling @gkemlin tried should stabilise the solver sufficiently for now.

Other thoughs?

gkemlin commented 2 years ago

Well, I'm not sure this would prevent the over-solving issue ? This can still happen in the inner apply_χ0 because we just ask the sternheimer solver to converge below a tolerance it can not always reach. We've discussed that with @antoine-levitt yesterday, and, actually, asking the sternheimer to converge below the residual of the eigenvectors defining the occupied states is not reasonable and currently, scfres.norm_Δρ/10 can be too much below this limiting tolerance. I've thus changed the tol_sternheimer from ForwardDiff rules to the maximum of the eigenvector residuals from the scfres in this PR https://github.com/JuliaMolSim/DFTK.jl/pull/746, but maybe using directly scfres.norm_Δρas the sternheimer tolerance could do the work too.

mfherbst commented 2 years ago

I only used scfres.norm_Δρ as a measure for the accuracy of the SCF ground state. You can certainly have that this value is lower than the residual of the eigenbasis, so from that point of view it seems reasonable to avoid that measure to determine the tol and base it on the maximal residual of the occupied instead.

mfherbst commented 2 years ago

Regarding the interplay of tol and tol_sternheimer, how about the following:

The tightest solve we allow for (and what we do by default in the forwarddiff) is to have tol = max_residual and tol_sternheimer=tol.
If someone selects a larger tol (cruder solve) then we have tol_sternheimer=tol/10, because in my test just selecting tol_sternheimer=tol was fine for the the initial and final apply_χ0_4P, but not for the nested apply_χ0 inside the GMRES due to accumulation of numerical error.
Overall that would suggest a tol_sternheimer=max(maximal_residual, tol/10)

But regarding the last point, I'm happy to be convinced otherwise. I.e. if you can do some serious response calculations that show that setting tol_sternheimer=tol/10 is not needed, than I'm ok with having tol_sternheimer=tol by default (but for now I would keep both kwargs).

gkemlin commented 2 years ago

tol_sternheimer=max(maximal_residual, tol/10) I like this idea, seems a good to me to avoid accumulation of numerical error in the GMRES.

antoine-levitt commented 2 years ago

Actually I think we were wrong yesterday Gaspard: Q(H-lambda)Q clearly has an eigenvalue zero, even if Q is not completely converged, and Q(H-lambda)Q dpsi = Q rhs should be completely fine, even if Q is very bad. I think what happens is simply that the cg solution drifts out of ran(Q). @mfherbst remember we had the same problem for the arnoldi with the dielectric matrix? We solved it by a custom orthogonalization, so that would probably be the good solution here, rather than fiddling with tolerances.

antoine-levitt commented 2 years ago

Hm I can't find it again, did we remove it?

gkemlin commented 2 years ago

That rings a bell to me, I think I got inspired with that in 2020, I'm exploring this old perturbation branch from my fork (DFTK 0.1.9 :stuck_out_tongue: )

gkemlin commented 2 years ago

we can probably mix a better orhthogonalization with this tweaked tolerance

antoine-levitt commented 2 years ago

To check the above theory is correct, we should check if norm(dpsi - Q dpsi) drifts across the iterations

gkemlin commented 2 years ago

Yes it does, good catch !

gkemlin commented 2 years ago

So we can change to a custom orthogonalizer, but this is not possible with Iterative solver, we should move to KrylovKit if it's ok for you.

gkemlin commented 2 years ago

Or we can stick with Iterative solvers and find a way round with their iterators https://iterativesolvers.julialinearalgebra.org/stable/iterators/

antoine-levitt commented 2 years ago

As I recall krylovkit doesn't do preconditioning unfortunately. Maybe we can hack around iterative solvers, esp if they give us pointers to the iterates (then even though you're probably not supposed to, we can modify them)

gkemlin commented 2 years ago

Yup, sounds quite easy actually

gkemlin commented 2 years ago

btw this really seems the way to go to keep good tolerances; because I can reproduce the explosion even with these new tolerances

gkemlin commented 2 years ago

Ok, so that was actually quite simple. I just copy-pasted their cg function (from there https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl/blob/master/src/cg.jl#L209) and ensured that the iterates and the residual both stay into Ran(R) (then, this also prevents error accumulation in the descent direction p -- which they call u --). See https://github.com/JuliaMolSim/DFTK.jl/pull/746.

mfherbst commented 2 years ago

I like that idea! That sounds very reasonable.

mfherbst commented 2 years ago

@antoine-levitt Yes we solved that for the dielectric matrix but that was also a different package.

gkemlin commented 2 years ago

Solved with https://github.com/JuliaMolSim/DFTK.jl/pull/750.

JuliaMolSim / DFTK.jl

Test on forces derivatives is failing unexpectedly #742