Currently, we are completely dependent on `pymatgen` parsers. This works well for parsing `vasprun.xml` but causes serious issues with `OUTCAR`.
There is an open issue in AiiDA that discusses the problem in detail.
https://github.com/aiidateam/aiida-core/issues/3973
Long story short, parsing `OUTCAR` can take a tremendous amount of time. In a few cases that I followed, parsing took up to 10-12 minutes, and during that time it does not release the GIL. As a consequence, RabbitMQ misses two consecutive heartbeats from the worker (60 + 60 seconds), assumes the process is lost/dead, and hands the task to another worker. The same happens with the second worker. Meanwhile, the first worker finishes parsing, completes its job, and even registers the resulting nodes in the database. Right after that, the second worker tries to do the same, but AiiDA stops it because there is already a sealed node in the database, and the workchain ultimately fails.
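To illustrate the mechanism, here is a minimal, self-contained simulation using only `asyncio`; it is not AiiDA or kiwipy code. A hypothetical `slow_parse` stands in for the `OUTCAR` parser and a periodic coroutine stands in for the heartbeat. Running the parse directly on the event loop thread stalls the heartbeat for the whole parse, while offloading it with `loop.run_in_executor` keeps the beats coming. The offloaded case works here only because `time.sleep` releases the GIL; a parser that holds the GIL for long stretches (e.g. inside a C extension) could still delay the loop.

```python
import asyncio
import time


def slow_parse(seconds: float) -> str:
    """Hypothetical stand-in for a slow OUTCAR parse: blocks the calling thread."""
    time.sleep(seconds)
    return "parsed"


async def heartbeat(interval: float, stamps: list, stop: asyncio.Event) -> None:
    """Record a timestamp every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        stamps.append(time.monotonic())
        await asyncio.sleep(interval)


async def run(offload: bool, parse_seconds: float = 0.3, interval: float = 0.05) -> float:
    """Return the largest gap (s) between heartbeats while the parse runs."""
    stamps: list = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(interval, stamps, stop))
    await asyncio.sleep(interval)  # let the heartbeat start first
    if offload:
        # Parse in a worker thread; the event loop stays free to run heartbeats.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, slow_parse, parse_seconds)
    else:
        # Parse directly on the event loop thread; nothing else can run meanwhile.
        slow_parse(parse_seconds)
    await asyncio.sleep(interval)  # give the heartbeat one more chance to fire
    stop.set()
    await beat
    return max(later - earlier for earlier, later in zip(stamps, stamps[1:]))


if __name__ == "__main__":
    print(f"blocking:  max heartbeat gap {asyncio.run(run(offload=False)):.2f} s")
    print(f"offloaded: max heartbeat gap {asyncio.run(run(offload=True)):.2f} s")
```

In the blocking run the largest gap is at least as long as the parse itself, which is the scaled-down analogue of a 10-12 minute parse outlasting the broker's heartbeat window.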
There are several possible solutions to this issue:
- It can be solved in AiiDA by locking the process in these particular cases.
- It can be handled by increasing the `heartbeat` threshold to a larger value.
- In our case, it can be solved by changing the `OUTCAR` parser.
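For the heartbeat option, the required value follows from simple arithmetic. A minimal sketch, assuming (as in the description above) that the broker gives up after two missed heartbeats; the exact detection time depends on the RabbitMQ and client configuration. `with_heartbeat` sets the `heartbeat` query parameter accepted in RabbitMQ AMQP URIs; whether AiiDA forwards such a URL to its broker connection depends on the AiiDA version, so treat that as an assumption. Both function names are hypothetical.

```python
import math
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def min_heartbeat(block_seconds: float, missed_beats: int = 2, margin: float = 1.0) -> int:
    """Smallest heartbeat (s) such that `missed_beats` intervals outlast a block.

    `margin` > 1 adds a safety factor on top of the observed blocking time.
    """
    return math.ceil(block_seconds * margin / missed_beats)


def with_heartbeat(amqp_url: str, heartbeat: int) -> str:
    """Return `amqp_url` with its `heartbeat` query parameter set (others kept)."""
    parts = urlsplit(amqp_url)
    query = dict(parse_qsl(parts.query))
    query["heartbeat"] = str(heartbeat)
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # A 12-minute parse with two tolerated misses and a 1.5x safety margin.
    value = min_heartbeat(12 * 60, margin=1.5)
    print(value, with_heartbeat("amqp://guest:guest@127.0.0.1:5672/", value))
```

With the defaults, a 120 s block gives 60 s, matching the "60 + 60 seconds" above, while a 12-minute parse would need a heartbeat of at least 360 s. This also shows the drawback of this option: a heartbeat that long delays detection of workers that really have died.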
Steps to be done:
- [x] Change the parser and the code so that results are produced in the same format as before, to avoid breaking already finished calculations
- [x] Disable parsing of `potcar` and `eigenvalues` in `vasprun.xml`
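Both steps can be sketched as follows. This is a hypothetical illustration, not the parser introduced by this change. `last_free_energy` streams `OUTCAR` line by line and keeps only the last `free  energy   TOTEN` value (line layout assumed from typical VASP output), instead of loading and parsing the whole file. `load_vasprun` uses the `parse_potcar_file` and `parse_eigen` keyword arguments of `pymatgen`'s `Vasprun`, assuming a `pymatgen` version that accepts them; `pymatgen` is imported lazily so the rest runs without it.

```python
import re
from typing import Iterable, Optional

# Matches lines such as "  free  energy   TOTEN  =       -34.51532568 eV"
# (layout assumed from typical VASP OUTCAR files).
_TOTEN = re.compile(r"free\s+energy\s+TOTEN\s+=\s+(-?\d+\.\d+)")


def last_free_energy(lines: Iterable[str]) -> Optional[float]:
    """Scan OUTCAR lines once and return the last TOTEN value, or None."""
    energy = None
    for line in lines:
        match = _TOTEN.search(line)
        if match:
            energy = float(match.group(1))
    return energy


def read_outcar_energy(path: str) -> Optional[float]:
    """Stream an OUTCAR file line by line instead of loading it whole."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return last_free_energy(handle)


# Keyword arguments of pymatgen's Vasprun that skip POTCAR and eigenvalue parsing.
VASPRUN_KWARGS = {"parse_potcar_file": False, "parse_eigen": False}


def load_vasprun(path: str):
    """Parse vasprun.xml with pymatgen, skipping POTCAR and eigenvalues."""
    from pymatgen.io.vasp.outputs import Vasprun  # requires pymatgen

    return Vasprun(path, **VASPRUN_KWARGS)
```

Extracting only the quantities the plugin actually stores keeps parsing time roughly linear in file size, which is what keeps the worker responsive; keeping the output in the old format is then a matter of mapping these values onto the previous keys.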