Closed gkemlin closed 2 years ago
I hate symmetriiiiiiiiiiiiiiiiies
This really only happens for the rHF model
You're sure this is not because of this being a metal?
Edit: This is really because of the way we setup up this particular structure and problem, see below.
I found that a different number of irreducible k-points is used between some of the calculations (8 in the displaced finite-diff, 6 in the displaced and 6 in the forward-diff run). That in combination with the small Ecut and kgrid used, might explain it ... especially given that we model a metal here.
Edit: Ok that's not the full story. Even with a larger kgrid and Ecut it gives differences. Increasing temperature or changing the blowup function also does not help.
Yes, indeed. Even symmetries=false
in the model gives different results.
You're sure this is not because of this being a metal?
I've tried with magnesium and I don't manage to reproduce this.
Indeed, the symmetrization of the forces appears to be wrong here, the forces at (0,0) are different with and without symmetries. MWE:
using DFTK
using ForwardDiff
using Random
a = 10.26 # Silicon lattice constant in Bohr
lattice = a / 2 * [[0 1 1.];
[1 0 1.];
[1 1 0.]]
Si = ElementPsp(:Si, psp=load_psp("hgh/lda/Si-q4"))
atoms = [Si, Si]
positions = [[1.02, 0.97, 1.04] / 8, -ones(3) / 8]
temperature = 1e-3
for sym in (true, false)
ε1 = ε2 = 0.0
# solve rHF for Silicon (metallic)
pos = positions .+ [ zeros(3), ε1 * [1., 0, 0] + ε2 * [0, 1., 0]]
model = model_DFT(lattice, atoms, pos, []; temperature, symmetries=sym)
basis = PlaneWaveBasis(model; Ecut=5, kgrid=[2, 2, 2], kshift=[0, 0, 0])
is_converged = DFTK.ScfConvergenceDensity(1e-10)
scfres = self_consistent_field(basis; is_converged,
response=ResponseOptions(verbose=true))
scfres.energies
# return forces
display(compute_forces_cart(scfres))
end
I'll try to take alook...
This appears to be a case of accidental symmetries: changing the numbers in the positions somewhat results in less symmetries. Did I mention I hated symmetries?
With forces_nonsym the forces obtained from a symmetries=false computation:
julia> DFTK.symmetrize_forces(model, forces_nonsym; symmetries=model.symmetries) - forces_nonsym
2-element Vector{StaticArraysCore.SVector{3, Float64}}:
[-8.005238182409724e-13, 0.004153669530948306, 5.807881302624662e-13]
[-8.005238182409724e-13, -0.004153669533415452, 5.807881302624662e-13]
(and the same for the basis.symmetries, so that's not the issue). So, assuming our nonsymmetric forces computation is correct (which I think it is), either our symmetry detection or our forces symmetrization routine is wrong...
I'm a bit stumped there: the symmetry detection should be robust (we get them from spglib, and we recheck them independently), and we debugged the forces symmetrization routine, which agrees with papers. The symmetries do form a group. I remember something about @jaemolihm saying we should discard "accidental" or weird symmetries (meaning symmetries with a w which is not rational or something), but I don't remember what the point was, @jaemolihm do you remember?
I ran the MWE, I think the issue is different, because sym and nosym results differ even when explicitly setting symmetries to those with simple w
.
I find that only the DFTK.TermAtomicNonlocal
term contribution is different, so an issue might be there
julia> basis.terms .|> typeof
8-element Vector{DataType}:
DFTK.TermKinetic
DFTK.TermAtomicLocal{Array{Float64, 3}}
DFTK.TermAtomicNonlocal
DFTK.TermEwald{Float64}
DFTK.TermPspCorrection{Float64}
DFTK.TermHartree
DFTK.TermNoop
DFTK.TermEntropy
# sym true
forces_per_term = Union{Nothing, Vector{StaticArrays.SVector{3, Float64}}}
nothing
StaticArrays.SVector{3, Float64}[[0.07038869844203988, -2.886579864025407e-15, 0.10032645829629105], [-0.07038869850082263, -2.220446049250313e-15, -0.10032645830509423]]
StaticArrays.SVector{3, Float64}[[-0.1504997004907923, 0.1603602195724699, -0.17022073865414747], [0.15049970053068973, -0.1603602196038185, 0.17022073867694731]]
StaticArrays.SVector{3, Float64}[[-0.04033939191779377, 9.57637842253505e-16, -0.0578011857046174], [0.04033939191779377, -9.576513873907308e-16, 0.0578011857046174]],
nothing, nothing, nothing, nothing]
# sym false
forces_per_term = Union{Nothing, Vector{StaticArrays.SVector{3, Float64}}}[
nothing,
StaticArrays.SVector{3, Float64}[[0.07038869827965999, -4.0044856319809696e-11, 0.1003264579835943], [-0.07038869827688266, -6.477041125663163e-13, -0.10032645800503182]],
StaticArrays.SVector{3, Float64}[[-0.04773509506391771, 2.6638774519582853e-11, -0.06745613320749312], [0.04773509506303912, -4.547834331347644e-12, 0.06745613321915436]],
StaticArrays.SVector{3, Float64}[[-0.04033939191779377, 9.57637842253505e-16, -0.0578011857046174], [0.04033939191779377, -9.576513873907308e-16, 0.0578011857046174]],
nothing, nothing, nothing, nothing]
I find that only the
DFTK.TermAtomicNonlocal
term contribution is different, so an issue might be there
This is because only that term uses symmetrize_forces
. The unsymmetrized forces satisfy f = W' * f
, not f = W * f
, while the symmetrization assues the latter. Maybe another transpose issue..
https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/symmetry.jl#L246
Nice detective work! We do make sure that the W is the right real space rotation (spglib.jl:118), so that part should be OK. The formula we have right now seems consistent with (A.27) of https://arxiv.org/pdf/0906.2569.pdf, but probably there's a subtlety, I'll try to rederive it.
Force means $E(\vec{\delta x}) = \vec{\delta x} \cdot \vec{F}$. Under rotation, $E(W \vec{\delta x}) = (W \vec{\delta x}) \cdot \vec{F}$. Symmetry means $E(\vec{\delta x}) = E(W \vec{\delta x})$. Thus, we have $\vec{\delta x} \cdot \vec{F} = (W \vec{\delta x}) \cdot \vec{F}$. Thus $F_i = Fj W{ji}$ (in Cartesian coordinates).
I agree with this, and it makes sense since forces are covectors. (A.27) says differently, but maybe it's valid in cartesian but not in reduced or something
Maybe R
in the QE paper is S = W'
, not W
?
This also works
f = forces[group[i_other_at]]
sym_f = covector_cart_to_red(model, matrix_red_to_cart(model, W) * covector_red_to_cart(model, f))
symmetrized_forces[idx] += sym_f
This does sym_f_red = A^-T A Wred A^-1 A^T Fred, where A is the lattice. It doesn't seem to me that it's equivalent to doing Wred^T Fred, even using the fact that Wcart is orthogonal.
Maybe R in the QE paper is S = W', not W?
I don't think so, since they write that for every position r, R(r+f) should be an atom of the same type as r, and we check that Wr+w is of the same type as r, so the conventions should be equivalent up to w=Rf (which we do take into account correctly, I hope...)
This does sym_f_red = A^-T A Wred A^-1 A^T Fred
Another way of writing this is sym_f_red = inv(Wred') f_red. This works also in this example. Your reasoning above says F = W' F, but W' = inv(W), so that F = inv(W) F, and by multiplying by W, F = W F, so it's also consistent with that.
I did some more tests with the fix proposed in #744 (which is definitely the right direction to follow) and I think the issue with the failing tests has to do with the way we treat symmetries in the response. Because now the value part in the forces seems very reasonable also in the forwarddiff calculation, but the dual part does not look right, e.g. using Gaspard's script
derivative_ε1_fd = [[-0.010188772141717269, 0.5540641933043239, 0.5503864129594938], [0.010187937546130184, -0.5540640837126478, -0.5503858548851172]]
derivative_ε1 = [[1.8376265562014128e6, 5.507626029595309e6, -1.837625369839999e6], [-1.8376253613343276e6, -5.50762642314619e6, 1.8376267549620424e6]]
but when I run it again:
derivative_ε1_fd = [[-0.01018562048691208, 0.5540706243039802, 0.5503838137466783], [0.010186103892827203, -0.5540667950994921, -0.5503840477318837]]
derivative_ε1 = [[0.576337577783623, 0.6292566906509841, 0.6100238361659679], [0.624274131079789, -1.0219749965960532, 0.7693535838844899]]
and without symmetries:
derivative_ε1_fd = [[-0.0101852412196673, 0.5540697222225325, 0.5503834867633052], [0.010185991221832707, -0.5540706391081104, -0.5503840552398427]]
derivative_ε1 = [[-0.010202755281099029, 0.5540729798449112, 0.5503845518212458], [0.010202757408728927, -0.5540730086009197, -0.5503845604364989]]
I lack the time for today to dig further, but at least a step.
Looks terribly also like some uninitialised value or something like that.
... or the perturbation breaks one of the symmetries and we filter that out during the response calculation?
Haven't looked closely but plausibly the dV is zero and we normalize it in chi0 for numerical purposes, which breaks things somehow?
Ok, so I'm trying to summarize what happens :smile: If I'm not mistaking, we are currently facing two issues :
Starting from there, I think have spotted the point 2) bug from https://github.com/JuliaMolSim/DFTK.jl/pull/744. Sorry in advance for the long message! Actually I'm not sure it comes from symmetry but rather from some badly initialized value, as already mentioned. When we solve the Dyson equation here https://github.com/JuliaMolSim/DFTK.jl/blob/353c11f1871adffae8bcbad9aadb95faa929f718/src/response/hessian.jl#L161
δρ0
already has huge norm. This comes from the first application of apply_χ0_4P
, where one of the solves of the Sternheimer fails. The Sternheimer tolerance is set to tol/10
where tol
in ForwardDiff is scfres.norm_Δρ
. That makes tol_sternheimer
to 4.12e-12
and the behavior of the Sternheimer is crazy for one band of the gamma point, where the cg gets stuck for 40 iteration at 1e-11
before going up and down again, resulting in a δψk[:,n]
with huge norm (see verbose in this gist). I'm not sure I understand why this particular band get stuck with the CG at 1e-11
, but here are some tracks to follow.
tol_sternheimer
to tol
instead of tol/10
here https://github.com/JuliaMolSim/DFTK.jl/blob/353c11f1871adffae8bcbad9aadb95faa929f718/src/response/hessian.jl#L129 then the test passes. Diagonal(εk_extra .- εk[N])
to have small values: for the band for which the Sternheimer fails, the first value on the diagonal is εk[N+1] - εk[N] = 0.00107679
, and both eigenvalues are converged enough, with occk[N]>1e-6
and occk[N+1]~1e-7
. I would say that the problem is not the Schur (because it works perfectly without the extra bands, and we know it has a worse conditioning number then), but rather the way the inverse of Diagonal(εk_extra .- εk[N])
is applied, which accumulates bigger and bigger numbers during the stagnation of the cg. @antoine-levitt any ideas on how to improve this part to avoid this ?1e-8
rather than 1e-6
, everything works fine.This does not solve the symmetry issues, but at least we now know where this crazy gmres comes from :upside_down_face:
Regarding the large norm in apply_\chi0
we already do something against this, namely here:
https://github.com/JuliaMolSim/DFTK.jl/blob/16c1ef1e214e01ce65a7f67429a0ab31352a84f9/src/response/chi0.jl#L351-L353
It should not be a problem to do something similar in solve_ΩplusK_split as well (or in the forward-diff rule calling it) and I'd not be surprised if this fixes all the issues.
I think most of what you describe is basically treating the symptom: If your input δρ is huge the solver has a tough time to converge to 1e-12 that makes a lot of sense. At some point it might get stuck in some numerical issues because of the range the data types need to represent and then depending on how you tweak a few things you might get around it or not. Symmetry can be one of these tweak parameters.
I'm not sure I see what you mean : If the δρ0
out of the first apply_χ0_4P
is completely screwd up, how can we deal with it ? If we divide δρ0
by it norm and then re-multiply with it, this won't solve the issue ?
I think most of what you describe is basically treating the symptom: If your input δρ is huge the solver has a tough time to converge to 1e-12 that makes a lot of sense. At some point it might get stuck in some numerical issues because of the range the data types need to represent and then depending on how you tweak a few things you might get around it or not. Symmetry can be one of these tweak parameters.
Yes, I agree with that.
I'm not sure I see what you mean : If the δρ0 out of the first apply_χ0_4P is completely screwd up, how can we deal with it ?
You deal with it even before ... by scaling the RHS, i.e. the δHψ
. That should work, no?
I've normalized the rhs of every sternheimer (ie δHψkn
is normalized) and now the CG stagnates at 6e-12
instead of 1e-11
! The tolerance we're asking from ForwardDiff is 5e-12
... I suggest two possibilities:
1e-9
for the convergence of the SCF in the test instead of 1e-10
, so that ForwardDiff asks for higher tolerance. This seems ok for me because asking for tolerance 1e-10
on the density means that the energy is already at machine precision.1e-6
by default, so we can set the tolerance in the cg to abstol/sqrt(occ[ik][n])
instead. This won't cause any issue because we multiply by occ[ik][n]
afterwards, but this allows for a bit more flexibility on the sternheimer for the last occupied bands.Both solutions work and seem good to me, I'll be happy to have your opinion on that !
The trace you linked with the residual going down and then up and down is very characteristic of "oversolving": when you solve iteratively Ax=b where A has a nullspace (which is not exactly zero), and you are asking for really tight tolerances, at some point the iterative solver is going to do figure out that there is something nonzero in the nullspace, and it's going to want to invert that (with catastrophic results, of course). We should try to figure out why that happens, let's discuss it sometime. Regarding solutions, projecting out the nullspace might not work, because the nullspace does not necessarily numerically commute with A. I'm not aware of a general solution to the problem, but the QE technique of shifting it away from zero might work...
Maybe the idea of adapting the Sternheimer tolerance depending on the occupation might at least be a way out. Indeed I would expect more issues close to the Fermi level (e.g. rotations in the p or d orbitals, which could be such a nullspace). I agree we should investigate this more thoroughly, so I would stay very conservative for now. abstol/sqrt(occ[ik][n])
might be a bit too loose though. We should in any case investigate this on some non-trivial examples before introducing it.
That's probably not the nullspace that matters, I meant simply the one from the projection onto the occupied bands. If these are converged to some eps and you solve the sternheimer to less than eps, you're in trouble. It'd be nice to find a proper way to solve this once and for all and not rely on heuristics
I agree, but perhaps for now we should just ensure in
https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/response/hessian.jl#L129
to not overdo it as @gkemlin suggested. I actually did some experiments in #737 and found that tol_sternheimer=tol
is ok in the initial and final apply_χ0_4P
and only in the inner apply_χ0
I took the minimal tolerance to be tol / 10
. I assume that with the rescaling @gkemlin tried should stabilise the solver sufficiently for now.
Other thoughs?
Well, I'm not sure this would prevent the over-solving issue ? This can still happen in the inner apply_χ0
because we just ask the sternheimer solver to converge below a tolerance it can not always reach. We've discussed that with @antoine-levitt yesterday, and, actually, asking the sternheimer to converge below the residual of the eigenvectors defining the occupied states is not reasonable and currently, scfres.norm_Δρ/10
can be too much below this limiting tolerance. I've thus changed the tol_sternheimer
from ForwardDiff rules to the maximum of the eigenvector residuals from the scfres in this PR https://github.com/JuliaMolSim/DFTK.jl/pull/746, but maybe using directly scfres.norm_Δρ
as the sternheimer tolerance could do the work too.
I only used scfres.norm_Δρ
as a measure for the accuracy of the SCF ground state. You can certainly have that this value is lower than the residual of the eigenbasis, so from that point of view it seems reasonable to avoid that measure to determine the tol
and base it on the maximal residual of the occupied instead.
Regarding the interplay of tol
and tol_sternheimer
, how about the following:
tol = max_residual
and tol_sternheimer=tol
.tol
(cruder solve) then we have tol_sternheimer=tol/10
, because in my test just selecting tol_sternheimer=tol
was fine for the the initial and final apply_χ0_4P
, but not for the nested apply_χ0
inside the GMRES due to accumulation of numerical error.tol_sternheimer=max(maximal_residual, tol/10)
But regarding the last point, I'm happy to be convinced otherwise. I.e. if you can do some serious response calculations that show that setting tol_sternheimer=tol/10
is not needed, than I'm ok with having tol_sternheimer=tol
by default (but for now I would keep both kwargs).
tol_sternheimer=max(maximal_residual, tol/10)
I like this idea, seems a good to me to avoid accumulation of numerical error in the GMRES.
Actually I think we were wrong yesterday Gaspard: Q(H-lambda)Q clearly has an eigenvalue zero, even if Q is not completely converged, and Q(H-lambda)Q dpsi = Q rhs should be completely fine, even if Q is very bad. I think what happens is simply that the cg solution drifts out of ran(Q). @mfherbst remember we had the same problem for the arnoldi with the dielectric matrix? We solved it by a custom orthogonalization, so that would probably be the good solution here, rather than fiddling with tolerances.
Hm I can't find it again, did we remove it?
That rings a bell to me, I think I got inspired with that in 2020, I'm exploring this old perturbation branch from my fork (DFTK 0.1.9 :stuck_out_tongue: )
we can probably mix a better orhthogonalization with this tweaked tolerance
To check the above theory is correct, we should check if norm(dpsi - Q dpsi) drifts across the iterations
Yes it does, good catch !
So we can change to a custom orthogonalizer, but this is not possible with Iterative solver, we should move to KrylovKit if it's ok for you.
Or we can stick with Iterative solvers and find a way round with their iterators https://iterativesolvers.julialinearalgebra.org/stable/iterators/
As I recall krylovkit doesn't do preconditioning unfortunately. Maybe we can hack around iterative solvers, esp if they give us pointers to the iterates (then even though you're probably not supposed to, we can modify them)
Yup, sounds quite easy actually
btw this really seems the way to go to keep good tolerances; because I can reproduce the explosion even with these new tolerances
Ok, so that was actually quite simple. I just copy-pasted their cg function (from there https://github.com/JuliaLinearAlgebra/IterativeSolvers.jl/blob/master/src/cg.jl#L209) and ensured that the iterates and the residual both stay into Ran(R) (then, this also prevents error accumulation in the descent direction p -- which they call u --). See https://github.com/JuliaMolSim/DFTK.jl/pull/746.
I like that idea! That sounds very reasonable.
@antoine-levitt Yes we solved that for the dielectric matrix but that was also a different package.
Solved with https://github.com/JuliaMolSim/DFTK.jl/pull/750.
In the recent CI from the previous PR's (see eg here), there was some bug happening with the derivative of forces for metal which I wanted to investigate, but this was merged automatically. In the test
test/forwardiff.jl
, we use rHF silicon with temperature for this test and it happens to fail sometimes, always for this specific case and not any other test. At first I thought this was an error from χ0, but actually the error seems to happen before. For instance, I've tweaked a bit the configuration fromtest/forwardiff.jl
in the code below, and this results in a error fromsymmetrize_ρ
https://github.com/JuliaMolSim/DFTK.jl/blob/17c64890a38b17fc8b57abe5306715aee56e9127/src/response/chi0.jl#L345 where the assertion from https://github.com/JuliaMolSim/DFTK.jl/blob/17c64890a38b17fc8b57abe5306715aee56e9127/src/fft.jl#L69 is not passed.Any ideas what is happening here ? This really only happens for the rHF model, so maybe there is something we miss here but I can't manage to understand what...