[Feature] Improve handling of convergence failure

ezpzbz commented 3 years ago

context

Upon exploring failed workchains, I found that the majority fail to converge, in either the relaxation or the static stage. Part of this can be solved by #56 through proper restarting of non-converged calculations. However, restarting is not always enough: sometimes we need to change settings (such as ALGO or NELM) to help convergence. This needs to be addressed urgently.
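
As a rough illustration of the kind of settings change meant here, the sketch below raises NELM and switches ALGO on an INCAR-like dictionary. The function name `relax_electronic_settings`, the escalation order, and the NELM cap are assumptions made for illustration, not what aiida-catmat does.

```python
# Hypothetical sketch: adjust INCAR-like settings after an electronic
# convergence failure. Escalation order and NELM cap are illustrative only.
ALGO_ESCALATION = ["Fast", "Normal", "All"]


def relax_electronic_settings(incar, nelm_cap=400):
    """Return a modified copy of `incar`; the input dict is left untouched."""
    updated = dict(incar)

    # Double NELM (VASP's default is 60), but never exceed the cap.
    updated["NELM"] = min(int(updated.get("NELM", 60)) * 2, nelm_cap)

    # Move ALGO one step along the escalation list (case-insensitive match).
    current = str(updated.get("ALGO", "Normal")).lower()
    names = [name.lower() for name in ALGO_ESCALATION]
    if current in names:
        next_index = min(names.index(current) + 1, len(ALGO_ESCALATION) - 1)
        updated["ALGO"] = ALGO_ESCALATION[next_index]
    else:
        # ALGO value not in the list: fall back to "Normal".
        updated["ALGO"] = "Normal"
    return updated
```

Returning a copy rather than editing in place keeps the original settings available, which matters if the changes to INCAR need to be tracked.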

solution

There are two ways to implement the handlers:

1. Having them in base like the other handlers. Then, in order to distinguish between relaxation and static runs, we can turn the handler on/off when submitting the calcjob in the main workchain.
2. Keeping them in the main workchain. Then, we need a calcfunction to track the changes in INCAR. This is not ideal, as I would prefer to delegate all handling to the base. That way we know that once a calculation comes back from base to main, it is ready to go.

ezpzbz commented 3 years ago

To be able to switch handlers on and off, I need to expose `handler_overrides` too. Then it can be used, for instance, as:

if self.ctx.stage_calc_types[self.ctx.stage_tag] == 'static':
    # Set both flags in one Dict; a second assignment would overwrite the first.
    self.ctx.vasp_base.handler_overrides = Dict(dict={
        'check_static_convergence': True,
        'check_relax_convergence': False,
    })

With the above, we can keep the process handlers in base like before.
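
The per-stage switching can be sketched in plain Python, without AiiDA, to show the intended behaviour. The handler names are the ones used above; `STAGE_HANDLERS`, `build_handler_overrides`, and the assumption that a relaxation stage is labelled `'relax'` are hypothetical and only for illustration.

```python
# Hypothetical sketch: decide which convergence handler is active for a stage.
STAGE_HANDLERS = {
    "static": "check_static_convergence",
    "relax": "check_relax_convergence",
}


def build_handler_overrides(calc_type):
    """Map every convergence handler to True/False for the given stage type."""
    if calc_type not in STAGE_HANDLERS:
        raise ValueError(f"unknown stage type: {calc_type!r}")
    active = STAGE_HANDLERS[calc_type]
    # Exactly one handler is enabled; all others are explicitly disabled.
    return {name: name == active for name in STAGE_HANDLERS.values()}
```

Building the whole mapping in one place means the result can be wrapped in a single `Dict` and assigned to `handler_overrides` once per stage.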

ezpzbz commented 3 years ago

I implemented this and it is being tested. One bug I found concerns turning off `handle_ionic_convergence` when we move to the static calculation. These two handlers (`ionic` and `electronic`) should be activated on a case-by-case basis.

ezpzbz commented 3 years ago

I noticed a big issue during testing. If I keep the convergence handlers in VaspBaseWorkChain, restarting the calculations becomes complicated, because the CalcJob might not have the structure output. The base would then have to parse and update the structure for relaxation runs, which needs extra code and duplicates logic. This becomes more tedious when dealing with magnetic structures. Therefore, I moved the convergence handlers out of base and made them work in the multi-stage workchain.
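
The restart complication can be pictured with a small stand-alone sketch. Here a finished calculation is modelled as a plain dictionary of outputs; `pick_restart_structure` and the key name `structure` are hypothetical and only illustrate the fallback a handler would need when no output structure is available.

```python
# Hypothetical sketch: choose the structure to restart a relaxation from.
def pick_restart_structure(calc_outputs, input_structure):
    """Prefer the output structure if the calculation produced one.

    Returns a tuple (structure, came_from_outputs).
    """
    structure = calc_outputs.get("structure")
    if structure is None:
        # No output structure is attached to the calculation:
        # fall back to the structure the calculation started from.
        return input_structure, False
    return structure, True
```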

ezpzbz / aiida-catmat

[Feature] Improve handling of convergence failure #57
