A crash report - Githubissues

atztogo commented 5 years ago

I got a sick vasprun.xml with the line

    <i name="e_0_energy">**************** </i>

for which parsevasp crashes as follows.

In [1]: from parsevasp.vasprun import Xml
xm
In [2]: xml = Xml("disp-007/vasprun.xml")
INFO:XmlParser:We are utilizing lxml!
DEBUG:XmlParser:Running parsew.
DEBUG:_check_file:Running check_file.
DEBUG:XmlParser:Running LXML in recovery mode.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-648e90fd890c> in <module>
----> 1 xml = Xml("disp-007/vasprun.xml")

~/code/parsevasp/parsevasp/vasprun.py in __init__(self, file_path, file_handler, k_before_band, extract_all, logger, event)
    130
    131         # parse parse parse
--> 132         self._parse()
    133
    134     def _parse(self):

~/code/parsevasp/parsevasp/vasprun.py in _parse(self)
    159             # run regular method (loads file into memory) and
    160             # enable recovery mode if necessary
--> 161             self._parsew(xml_recover)
    162         else:
    163             # event based, saves a bit of memory

~/code/parsevasp/parsevasp/vasprun.py in _parsew(self, xml_recover)
    214         self._data["eigenvelocities"] = self._fetch_eigenvelocitiesw(vaspxml)
    215         self._data["dos"], self._data["dos_specific"] = self._fetch_dosw(vaspxml)
--> 216         self._data["totens"] = self._fetch_totensw(vaspxml)
    217         self._data["dielectrics"] = self._fetch_dielectricsw(vaspxml)
    218         self._data["projectors"] = self._fetch_projectorsw(vaspxml)

~/code/parsevasp/parsevasp/vasprun.py in _fetch_totensw(self, xml)
   1848             if data is None:
   1849                 return None
-> 1850             data = self._convert_array1D_f(data)
   1851             if index == 0:
   1852                 energies[1] = {"energy_no_entropy":

~/code/parsevasp/parsevasp/vasprun.py in _convert_array1D_f(self, entry)
   2577             data = np.zeros(len(entry), dtype='double')
   2578         for index, element in enumerate(entry):
-> 2579             data[index] = np.fromstring(element.text, sep=' ')
   2580
   2581         return data

ValueError: setting an array element with a sequence.

I think this crash is avoided by something like

try:  
    data[index] = np.array([float(v) for v in element.text.split()])
except ValueError:
   data[index] = SOMEVALUE

But how can we set the value in this case?

espenfl commented 5 years ago

Thanks for reporting this Togo. What actually happened in the VASP run? I believe we need to understand that before we decide on what we should return.

atztogo commented 5 years ago

Yes. At the SCF step, energy suddenly became huge. See the corresponding OSZICAR,

       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1     0.680117302455E+04    0.68012E+04   -0.33494E+05  2336   0.108E+03
DAV:   2     0.134835825853E+03   -0.66663E+04   -0.64654E+04  2792   0.272E+02
DAV:   3    -0.525837985484E+03   -0.66067E+03   -0.65344E+03  2528   0.953E+01
DAV:   4    -0.544911727561E+03   -0.19074E+02   -0.18963E+02  2552   0.182E+01
DAV:   5    -0.545546714350E+03   -0.63499E+00   -0.63419E+00  2648   0.310E+00    0.878E+01
DAV:   6    -0.520365275193E+03    0.25181E+02   -0.78087E+01  2816   0.144E+01    0.333E+01
DAV:   7    -0.518626036564E+03    0.17392E+01   -0.15919E+01  2376   0.561E+00    0.162E+01
DAV:   8    -0.518375794718E+03    0.25024E+00   -0.25157E+00  2512   0.244E+00    0.420E+00
DAV:   9    -0.518776812859E+03   -0.40102E+00   -0.67163E-01  2432   0.166E+00    0.571E+00
DAV:  10    -0.518376432285E+03    0.40038E+00   -0.35946E-01  2408   0.109E+00    0.116E+00
DAV:  11    -0.518380449702E+03   -0.40174E-02   -0.71478E-02  2432   0.549E-01    0.958E-01
DAV:  12    -0.518375911886E+03    0.45378E-02   -0.21569E-02  2424   0.461E-01    0.559E-01
DAV:  13    -0.280698662789E+07   -0.28065E+07   -0.26716E+07  2808   0.440E+04    0.611E+03
DAV:  14     0.312490310524E+02    0.28070E+07   -0.21195E+04  2560   0.655E+01    0.222E+02
DAV:  15    -0.448124754973E+03   -0.47937E+03   -0.40760E+03  2584   0.545E+01    0.177E+01
DAV:  16    -0.525014086313E+03   -0.76889E+02   -0.64105E+02  2712   0.240E+01    0.348E+01
DAV:  17    -0.518937167443E+03    0.60769E+01   -0.64778E+01  2608   0.967E+00    0.218E+01
DAV:  18    -0.518856917850E+03    0.80250E-01   -0.56225E+00  2536   0.244E+00    0.198E+01
DAV:  19    -0.518806959300E+03    0.49959E-01   -0.27048E-01  2592   0.915E-01    0.193E+01
...

This might happen due to memory error or algorithmic divergence. ********** appears, I think, when the value goes beyond the number of digits that VASP supposes. Probably I remember that this can happen not only for energy.

espenfl commented 5 years ago

Thanks for the additional details. So the increase in energy probably happens after step 19 then? In the case where this happens I would say the DFT loop is voided and we should not formally accept any output that is dependent on the the charge density. The step after is also way out:

  <scstep>
   <time name="dav">   49.07   49.06</time>
   <time name="total">   52.36   52.35</time>
   <energy>
    <i name="e_fr_energy">   -518.37591189 </i>
    <i name="e_wo_entrp">   -518.37591189 </i>
    <i name="e_0_energy">   -518.37591189 </i>
   </energy>
  </scstep>
  <scstep>
   <time name="dav">   55.01   54.99</time>
   <time name="total">   58.32   58.31</time>
   <energy>
    <i name="e_fr_energy">**************** </i>
    <i name="e_wo_entrp">**************** </i>
    <i name="e_0_energy">**************** </i>
   </energy>
  </scstep>
  <scstep>
   <time name="dav">   51.17   51.16</time>
   <time name="total">   54.48   54.48</time>
   <energy>
    <i name="e_fr_energy">     31.24903105 </i>
    <i name="e_wo_entrp">     31.25013933 </i>
    <i name="e_0_energy">     31.24958519 </i>
   </energy>
  </scstep>

We might expect it to come back into some minimum. Question is, should we trust that result? To be on the restrictive side I am tempted to say no. Then we would have to probably tweak the step parameters or something similar to avoid this and restart the calculation (but this is not connected to the parser of course).

If we choose not to trust the result, we would need to determine how to return. I am considering to add some more sensible exit codes that can be used by external calls to get some sensible feedback. One of them could be such a thing like this. Just to return None seems not to be the right way forward.

I have also seen VASP eject this if the specified field container is flooded, typically by some big number. So it is tempting to check for the **** in general, but I think we should do this case by case basis. Most likely we would return some sensible exit code for each case, which some external call could act upon.

Thought or comments regarding this? Do you agree that this would be a sensible way forward?

atztogo commented 5 years ago

-Validity of this result There is no physical meaning of intermediate electronic structures before convergence. But as for the interest of numerical calculation, it may have some meaning, so this voiding is expected to be told to users by some means (e.g., putting exactly 0 (seems not a good idea), or None is possible on numpy and SQL?) The best solution is to let VASP code emit the number not by %f but by %e when digits flooding.

-On restart

It is difficult to judge from AiiDA whether this calculation would be restarted or not. In this example, this is one of the set of displaced supercell calculations of a phonon calculation, and other displacement resulted in very smooth convergence (delta E ~ 1e-8). But this convergence ended with delta E ~ 1e-3 at the maximum SCF iteration. Therefore restart is reasonable. From phonon calculation workflow, maybe we can estimate this one is worse than the others, but as a one VASP calculation, it is difficult to do.

-Exit code

If you mean on the aiida-vasp layer, I very agree this idea. As you write, case by case treatment will be necessary, but we can postpone it for a while since we should collect these troublesome cases before defining sensible exist codes (and also considering our resource). Instead raising python Warning like ParsingWarning for parsevasp is a possible choice and some very rough exit code on the aiida-vasp level like ParsingWarning attached to a specific aiida-vasp version at least to be possible hooked by QueryBuilder. Later, when we define exit codes, on the philosophy of semantic versioning (I don't want to be very strict), but changing exit code is a big thing, we need to change the major version number.

espenfl commented 5 years ago

Yes, I agree. Ultimately this should be fixed in VASP. Do you mind submitting the input files to Martijn? Until then, and to support old versions, we need to act in the parser. Due to the problem of returning a value that it is not, e.g. zero or None I think we should halt and exit from the parser with a suitable exit code that any calling application can pickup. A sensible message would also be needed. I will try to fix this next week when I am back in the office.

With respect to aiida-vasp, we already have some exit codes that is due to parser issues, but what we are lacking is proper messages ejected from parsevasp that can be forwarded. This would enable better catch and control mechanisms, also more sensible raise of warnings or errors in aiida-vasp. So I think implementing this a bit better in parsevasp makes sense.

Indeed, proper versioning is important when changing the exit codes. We can bump parsevasp. Maybe more importantly, for aiida-vasp I suggest we have most of the basic exit codes in place (at least the ranges) settled before we merge the migration_beta branch into master.

espenfl commented 5 years ago

Fixed in 0319c6b96f9cff1cd54831bc12a8fdd03852e159. Here, we return with a SystemExit and a code.

aiida-vasp / parsevasp

A crash report #17