Closed atztogo closed 5 years ago
Thanks for reporting this Togo. What actually happened in the VASP run? I believe we need to understand that before we decide on what we should return.
Yes. At the SCF step, the energy suddenly became huge. See the corresponding OSZICAR:
N E dE d eps ncg rms rms(c)
DAV: 1 0.680117302455E+04 0.68012E+04 -0.33494E+05 2336 0.108E+03
DAV: 2 0.134835825853E+03 -0.66663E+04 -0.64654E+04 2792 0.272E+02
DAV: 3 -0.525837985484E+03 -0.66067E+03 -0.65344E+03 2528 0.953E+01
DAV: 4 -0.544911727561E+03 -0.19074E+02 -0.18963E+02 2552 0.182E+01
DAV: 5 -0.545546714350E+03 -0.63499E+00 -0.63419E+00 2648 0.310E+00 0.878E+01
DAV: 6 -0.520365275193E+03 0.25181E+02 -0.78087E+01 2816 0.144E+01 0.333E+01
DAV: 7 -0.518626036564E+03 0.17392E+01 -0.15919E+01 2376 0.561E+00 0.162E+01
DAV: 8 -0.518375794718E+03 0.25024E+00 -0.25157E+00 2512 0.244E+00 0.420E+00
DAV: 9 -0.518776812859E+03 -0.40102E+00 -0.67163E-01 2432 0.166E+00 0.571E+00
DAV: 10 -0.518376432285E+03 0.40038E+00 -0.35946E-01 2408 0.109E+00 0.116E+00
DAV: 11 -0.518380449702E+03 -0.40174E-02 -0.71478E-02 2432 0.549E-01 0.958E-01
DAV: 12 -0.518375911886E+03 0.45378E-02 -0.21569E-02 2424 0.461E-01 0.559E-01
DAV: 13 -0.280698662789E+07 -0.28065E+07 -0.26716E+07 2808 0.440E+04 0.611E+03
DAV: 14 0.312490310524E+02 0.28070E+07 -0.21195E+04 2560 0.655E+01 0.222E+02
DAV: 15 -0.448124754973E+03 -0.47937E+03 -0.40760E+03 2584 0.545E+01 0.177E+01
DAV: 16 -0.525014086313E+03 -0.76889E+02 -0.64105E+02 2712 0.240E+01 0.348E+01
DAV: 17 -0.518937167443E+03 0.60769E+01 -0.64778E+01 2608 0.967E+00 0.218E+01
DAV: 18 -0.518856917850E+03 0.80250E-01 -0.56225E+00 2536 0.244E+00 0.198E+01
DAV: 19 -0.518806959300E+03 0.49959E-01 -0.27048E-01 2592 0.915E-01 0.193E+01
...
This might happen due to a memory error or algorithmic divergence. I think ********** appears when the value exceeds the number of digits that VASP's output format allows for. If I remember correctly, this can happen not only for the energy.
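To illustrate the overflow described above, here is a minimal Python sketch of how a Fortran-style fixed-width field behaves. The function name `fortran_fixed` is hypothetical, and the width of 16 with 8 decimals is an assumption inferred from the 16 asterisks and the 8 decimal places visible in the vasprun.xml excerpt in this thread; it is not taken from the VASP source.

```python
def fortran_fixed(value: float, width: int = 16, decimals: int = 8) -> str:
    """Emulate a Fortran F<width>.<decimals> field (assumed format, see lead-in).

    Fortran fills the whole field with asterisks when the formatted
    number does not fit into the field width.
    """
    text = f"{value:.{decimals}f}"
    if len(text) > width:
        # The number needs more characters than the field provides.
        return "*" * width
    # Numbers that fit are right-aligned within the field.
    return text.rjust(width)


# Energy of step 12 in the OSZICAR above: fits into the field.
converged = fortran_fixed(-0.518375911886E+03)

# Energy of step 13: needs 17 characters with 8 decimals, so it overflows.
diverged = fortran_fixed(-0.280698662789E+07)
```

Under these assumptions the value itself is lost once the field overflows, so a parser can only detect that an overflow happened, not recover the number.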
Thanks for the additional details. So the increase in energy probably happens after step 19 then? In the case where this happens I would say the DFT loop is voided and we should not formally accept any output that depends on the charge density. The step after is also way out:
<scstep>
<time name="dav"> 49.07 49.06</time>
<time name="total"> 52.36 52.35</time>
<energy>
<i name="e_fr_energy"> -518.37591189 </i>
<i name="e_wo_entrp"> -518.37591189 </i>
<i name="e_0_energy"> -518.37591189 </i>
</energy>
</scstep>
<scstep>
<time name="dav"> 55.01 54.99</time>
<time name="total"> 58.32 58.31</time>
<energy>
<i name="e_fr_energy">**************** </i>
<i name="e_wo_entrp">**************** </i>
<i name="e_0_energy">**************** </i>
</energy>
</scstep>
<scstep>
<time name="dav"> 51.17 51.16</time>
<time name="total"> 54.48 54.48</time>
<energy>
<i name="e_fr_energy"> 31.24903105 </i>
<i name="e_wo_entrp"> 31.25013933 </i>
<i name="e_0_energy"> 31.24958519 </i>
</energy>
</scstep>
We might expect it to come back into some minimum. The question is, should we trust that result? To be on the restrictive side, I am tempted to say no. Then we would probably have to tweak the step parameters or something similar to avoid this and restart the calculation (but this is of course not connected to the parser).
If we choose not to trust the result, we need to determine what to return. I am considering adding some more sensible exit codes that external callers can use to get meaningful feedback. A case like this could be one of them. Just returning None does not seem to be the right way forward.
I have also seen VASP emit this when the specified output field overflows, typically because of some big number. So it is tempting to check for **** in general, but I think we should handle this on a case-by-case basis. Most likely we would return a sensible exit code for each case, which an external caller could act upon.
Thoughts or comments regarding this? Do you agree that this would be a sensible way forward?
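As a rough sketch of the detection step discussed above, the following Python snippet scans the energies of each scstep and reports the entries that consist only of asterisks. The function name `find_overflowed_energies` and the reduced sample document are assumptions for illustration; this is not how parsevasp is implemented.

```python
import xml.etree.ElementTree as ET

# Reduced sample built from the vasprun.xml excerpt in this thread.
SAMPLE = """<calculation>
 <scstep>
  <energy>
   <i name="e_fr_energy"> -518.37591189 </i>
   <i name="e_wo_entrp"> -518.37591189 </i>
   <i name="e_0_energy"> -518.37591189 </i>
  </energy>
 </scstep>
 <scstep>
  <energy>
   <i name="e_fr_energy">**************** </i>
   <i name="e_wo_entrp">**************** </i>
   <i name="e_0_energy">**************** </i>
  </energy>
 </scstep>
</calculation>"""


def find_overflowed_energies(xml_text):
    """Return (step number, entry name) for every asterisk-only entry."""
    root = ET.fromstring(xml_text)
    flagged = []
    for step, scstep in enumerate(root.iter("scstep"), start=1):
        for item in scstep.iter("i"):
            text = (item.text or "").strip()
            # An overflowed field contains nothing but '*' characters.
            if text and set(text) == {"*"}:
                flagged.append((step, item.get("name")))
    return flagged


flagged = find_overflowed_energies(SAMPLE)
```

A caller could map a non-empty result to a dedicated exit code, which keeps the decision about trusting or restarting the run outside the parser.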
-Validity of this result
Intermediate electronic structures before convergence have no physical meaning. But they may be of interest from a numerical point of view, so users should be told about this voiding by some means (e.g., putting exactly 0, which does not seem a good idea, or None, if that is possible in numpy and SQL?). The best solution is to have the VASP code emit the number with %e instead of %f when the digits overflow.
-On restart
It is difficult to judge from AiiDA whether this calculation should be restarted. In this example, it is one of a set of displaced supercell calculations of a phonon calculation, and the other displacements converged very smoothly (delta E ~ 1e-8). But this one ended with delta E ~ 1e-3 at the maximum SCF iteration. Therefore a restart is reasonable. From the phonon calculation workflow, we can maybe estimate that this one is worse than the others, but for a single VASP calculation this is difficult to do.
-Exit code
If you mean on the aiida-vasp layer, I very much agree with this idea. As you write, case-by-case treatment will be necessary, but we can postpone it for a while, since we should collect these troublesome cases before defining sensible exit codes (also considering our resources). Instead, raising a Python warning like ParsingWarning in parsevasp is a possible choice, together with some very rough exit code on the aiida-vasp level like ParsingWarning, attached to a specific aiida-vasp version, so that it can at least be hooked by QueryBuilder. Later, when we define exit codes, we should follow the philosophy of semantic versioning (I don't want to be very strict): changing an exit code is a big thing, so we would need to change the major version number.
Yes, I agree. Ultimately this should be fixed in VASP. Do you mind submitting the input files to Martijn? Until then, and to support old versions, we need to act in the parser. Because returning a value that is not the real one, e.g. zero or None, is problematic, I think we should halt and exit from the parser with a suitable exit code that any calling application can pick up. A sensible message would also be needed. I will try to fix this next week when I am back in the office.
With respect to aiida-vasp, we already have some exit codes that are due to parser issues, but what we are lacking is proper messages emitted from parsevasp that can be forwarded. This would enable better catch and control mechanisms, and also more sensible raising of warnings or errors in aiida-vasp. So I think implementing this a bit better in parsevasp makes sense.
Indeed, proper versioning is important when changing the exit codes. We can bump parsevasp. Maybe more importantly, for aiida-vasp I suggest we have most of the basic exit codes (at least the ranges) settled before we merge the migration_beta branch into master.
Fixed in 0319c6b96f9cff1cd54831bc12a8fdd03852e159. Here, we return with a SystemExit and a code.
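For readers who want to see what exiting with a SystemExit and a code can look like from the caller's side, here is a minimal sketch. The constant `OVERFLOW_EXIT_CODE`, its value, and the function `parse_energy` are hypothetical and do not reflect the names or codes used in the commit above.

```python
import logging
import sys

# Hypothetical exit code for an overflowed field; the real value may differ.
OVERFLOW_EXIT_CODE = 500


def parse_energy(text):
    """Convert an energy entry to float, or exit if the field overflowed."""
    stripped = text.strip()
    if stripped and set(stripped) == {"*"}:
        logging.error("Energy field overflowed (only asterisks found).")
        # sys.exit raises SystemExit, which a calling application can catch.
        sys.exit(OVERFLOW_EXIT_CODE)
    return float(stripped)


# A calling application can inspect the code instead of terminating.
try:
    parse_energy("**************** ")
    caught_code = None
except SystemExit as exc:
    caught_code = exc.code
```

Since SystemExit carries the code as an attribute, a wrapper such as a workflow plugin can translate it into its own error handling without parsing log messages.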
I got a sick vasprun.xml with the line
for which parsevasp crashes as follows.
I think this crash is avoided by something like
But how can we set the value in this case?