aiidateam / aiida-quantumespresso

The official AiiDA plugin for Quantum ESPRESSO
https://aiida-quantumespresso.readthedocs.io

`PwBaseWorkChain`: Always do full restart for `ERROR_OUT_OF_WALLTIME` #1012

Open mbercx opened 4 months ago

mbercx commented 4 months ago

Fixes #968

The current error handler for the `ERROR_OUT_OF_WALLTIME` exit code of the `PwCalculation` restarts from scratch if the structure has changed during the `pw.x` run, as is typically the case for relax/vc-relax calculations. For larger structures and more complex calculations, such as those using Hubbard corrections, this can be quite inefficient, since obtaining the electronic ground state is often more challenging and hence more expensive.

Here we adapt the error handler to always do a full restart from the previous calculation. In case the structure has changed, we still set it as the input structure of the restart calculation.
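As a rough illustration of the behaviour described above (not the actual handler code: plain dictionaries stand in for the AiiDA input nodes, and `prepare_full_restart` is a hypothetical name), the logic amounts to something like:

```python
from copy import deepcopy


def prepare_full_restart(inputs, output_structure=None):
    """Return a copy of `inputs` configured for a full restart.

    `inputs` is assumed to be a dict with a 'structure' entry and a
    'parameters' dict of pw.x namelists (hypothetical layout).
    """
    restart = deepcopy(inputs)

    # Always request a full restart, whether or not the structure changed.
    control = restart['parameters'].setdefault('CONTROL', {})
    control['restart_mode'] = 'restart'

    # If the run produced an updated structure, still pass it as the input
    # structure of the restart calculation.
    if output_structure is not None:
        restart['structure'] = output_structure

    return restart
```

The original inputs are left untouched, so the previous calculation's inputs remain as they were.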

mbercx commented 4 months ago

@AndresOrtegaGuerrero this should do the job, but there is one thing I'm still considering. For provenance reasons, in case the structure has changed I still pass it as an input to the restart calculation. However, for a full restart (CONTROL.restart_mode = 'restart') Quantum ESPRESSO also restarts from the structure saved in the out/aiida.save folder:

     Atomic positions and unit cell read from directory:
     ./out/aiida.save/
     Atomic positions from file used, from input discarded

This means that we're passing an input structure that will be ignored in any case, and hence this might be slightly confusing for someone that's observing the provenance afterwards. However, running without the input structure is not to QE's liking:

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine init_pos (1):
     atomic position info missing
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

So you always have to provide an input structure, even when the restart settings you explicitly specify cause it to be ignored. ^^

I was considering restarting from the charge density instead, which should be pretty efficient. However, a full restart will most likely be even more efficient, since it also resumes from the correct k-point:

     Atomic positions and unit cell read from directory:
     ./out/aiida.save/
     Atomic positions from file used, from input discarded

     Check: negative core charge=   -0.000016

     The initial density is read from file :
     ./out/aiida.save/charge-density

     Starting wfcs from file
     Calculation restarted from scf iteration #     3

     total cpu time spent up to now is        2.7 secs

     Self-consistent Calculation

     iteration #  3     ecut=    45.00 Ry     beta= 0.40
     Calculation restarted from kpoint #    20

However, this calculation failed shortly thereafter:

     WARNING: integrated charge=     6.39693880, expected=    68.00000000

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine electrons (1):
     charge is wrong
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Restarting from the charge density did not have this issue. So I would favor this approach, which is perhaps slightly less efficient but probably more robust?
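For reference, a minimal sketch of what a charge-density restart could look like at the level of the `pw.x` parameters. This assumes the usual namelist keywords (`restart_mode` in `CONTROL`, `startingpot` and `startingwfc` in `ELECTRONS`). The function name is hypothetical, and dropping `startingwfc` is an assumption made here so that the wavefunctions are not read from file:

```python
from copy import deepcopy


def charge_density_restart_parameters(parameters):
    """Return a copy of the pw.x parameters set up to restart from the charge density."""
    params = deepcopy(parameters)

    # Not a full restart: the run itself starts from scratch ...
    params.setdefault('CONTROL', {})['restart_mode'] = 'from_scratch'

    # ... but the initial potential is built from the saved charge density.
    electrons = params.setdefault('ELECTRONS', {})
    electrons['startingpot'] = 'file'

    # Assumption: do not read the wavefunctions from file in this mode.
    electrons.pop('startingwfc', None)

    return params
```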

bastonero commented 3 months ago

I'll leave my 2-cents considerations:

mbercx commented 3 months ago

Thanks @bastonero! Unfortunately this doesn't make the decision easier. ^^ Can we see from the outputs if QE has been compiled with HDF5?

I suppose we could do a full restart, and in case this fails restart from scratch with the final structure.
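A sketch of that fallback, purely to make the idea concrete. The function name, the returned dictionary, and the `was_full_restart` flag are all hypothetical and do not reflect the existing handler API:

```python
def plan_next_run(exit_label, was_full_restart, final_structure):
    """Decide how to relaunch after a failed run.

    Returns a dict describing the next run, or None if this sketch does not
    cover the failure.
    """
    if exit_label == 'ERROR_OUT_OF_WALLTIME':
        # First choice: full restart from the previous calculation.
        return {'restart_mode': 'restart', 'structure': final_structure}

    if was_full_restart:
        # The full restart itself failed: fall back to a run from scratch,
        # but keep the last structure that was obtained.
        return {'restart_mode': 'from_scratch', 'structure': final_structure}

    return None
```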

bastonero commented 3 months ago

No clue if there's a way. I know the wfc files have a .hdf5 extension, if that helps.
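If the file extension turns out to be a reliable indicator, a check along these lines might work. This is only a sketch: the helper name is hypothetical, it assumes the save directory is available on the local filesystem, and it only tests for the presence of `*.hdf5` files rather than inspecting the QE build itself:

```python
from pathlib import Path


def has_hdf5_wavefunctions(save_dir):
    """Return True if `save_dir` contains at least one file ending in .hdf5."""
    return any(Path(save_dir).glob('*.hdf5'))
```

For example, it could be pointed at the `out/aiida.save` folder of a finished calculation.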