Open PhilippRue opened 4 years ago
@PhilippRue I have started the implementation of this for the host code and for the voronoi code.
I think that many of the features in kkr_scr
and voro_start
can be added handled in this way as they would basically make use of the process_handler
to handle these states.
However, looking at the way in which you have designed the kkr_scf
it is my impression that that is not the plan you have on mind. E.g. normally I would handle the lack of convergence of the SCF state by using the process_handler
to increase/decrease the mixing, number of steps, etc. But the kkr_scf
takes care of this basically, so I wonder how much of the kkr_scf
should be in the BaseRestartWorkchain
and how much should be left outside?
A similar question happens for the voro_start
where you have a lot of the error handling in the voro_start
, so the question is, how much of that should be handled via the BaseRestartWorkchain
and how much should be done manually?
When these workflows were developed there was no base restart functionality. That's why all error handling that is done so far is in there.
Maybe it still makes sense to separate technical issues (out of memory errors or dimension errors if array dimensions are too small) from convergence issues (cluster radius). We could for example inherit from base restart in the kkr_scf
and extend the process_handler
functionality to handle the mixing parameter etc. However this is probably a major refactoring of the kkr_scf
and voro_start
workflows.
In that way we could simply use the base restart instead of run a KKR calculation in the workflows like dos
where all the convergence issues etc are not present.
To sum up I suggest this structure:
BaseRestartWorkchain
: only technical issues that are helpful for both KKR and voronoi (i.e. not code specific)RestartKKR
: inherits from BaseRestartWorkChain
and add KKR specific things (array dimensions etc). In the same way RestartVoro
could deal with the voronoi code.kkr_scf
inherits from RestartKKR
and extend the pricess_handler
to deal with the convergence issues. In the beginning this could also just call RestartKKR
instead of a KkrCalculation
until the refactoring process is completeLet me know what you think.
When these workflows were developed there was no base restart functionality. That's why all error handling that is done so far is in there.
Sure I understand, it is what made sense to do in that time.
Maybe it still makes sense to separate technical issues (out of memory errors or dimension errors if array dimensions are too small) from convergence issues (cluster radius). We could for example inherit from base restart in the
kkr_scf
and extend theprocess_handler
functionality to handle the mixing parameter etc. However this is probably a major refactoring of thekkr_scf
andvoro_start
workflows.
That could be, however how we detect the out of memory errors? Is that something that is detected by the parser? or that it can be calculated on the fly? Normally one can get these ones on the stdout
or stderr
, which I do not know are parsed right now.
In that way we could simply use the base restart instead of run a KKR calculation in the workflows like
dos
where all the convergence issues etc are not present.
Yes I agree, the base_kkr
should in principle be "dumb" i.e. just handle running a single kkr calculation in the simplest way possible.
To sum up I suggest this structure:
BaseRestartWorkchain
: only technical issues that are helpful for both KKR and voronoi (i.e. not code specific)RestartKKR
: inherits fromBaseRestartWorkChain
and add KKR specific things (array dimensions etc). In the same wayRestartVoro
could deal with the voronoi code.kkr_scf
inherits fromRestartKKR
and extend thepricess_handler
to deal with the convergence issues. In the beginning this could also just callRestartKKR
instead of aKkrCalculation
until the refactoring process is completeLet me know what you think.
I think it is something that might work, however, I wonder about the complexity, if there are three workchains that in principle handle how to do a KKR calculation (90% of the time you want to do a SCF calculation) I wonder if this would lead to confusion in their usage and or expansion.
I think that the approach of using the BaseRestartWorkchain
as done by the QE guys to be the simplest , I think that if you put that to be a RestartKKR(BaseRestartWorkchain)
which inside calls as a process the base_kkr
workchain, things can get very complicated quite fast.
I'm open to help with any solution, but I would like to coordinate with you to ensure that things are well planned out.
That could be, however how we detect the out of memory errors? Is that something that is detected by the parser? or that it can be calculated on the fly? Normally one can get these ones on the
stdout
orstderr
, which I do not know are parsed right now.
Parsing the stderr is an option but at least for slurm there exist some parsing in aiida-core already. However I don't know about the other schedulers.
I think it is something that might work, however, I wonder about the complexity, if there are three workchains that in principle handle how to do a KKR calculation (90% of the time you want to do a SCF calculation) I wonder if this would lead to confusion in their usage and or expansion. I think that the approach of using the
BaseRestartWorkchain
as done by the QE guys to be the simplest , I think that if you put that to be aRestartKKR(BaseRestartWorkchain)
which inside calls as a process thebase_kkr
workchain, things can get very complicated quite fast.I'm open to help with any solution, but I would like to coordinate with you to ensure that things are well planned out.
Yes, maybe that structure is clearer. So the idea would be to use BaseRestartWorkchain
from aiida-core in BaseKKR(BaseRestartWorkchain)
which replaces the kkr_scf workchain, right?
Then whenever a KkrCalculation is used now we just use BaseKKR
instead with some settings to not do the scf functionality?
That could be, however how we detect the out of memory errors? Is that something that is detected by the parser? or that it can be calculated on the fly? Normally one can get these ones on the
stdout
orstderr
, which I do not know are parsed right now.Parsing the stderr is an option but at least for slurm there exist some parsing in aiida-core already. However I don't know about the other schedulers.
Right had forgotten about that one, need to check if it works for other schedulers.
I think it is something that might work, however, I wonder about the complexity, if there are three workchains that in principle handle how to do a KKR calculation (90% of the time you want to do a SCF calculation) I wonder if this would lead to confusion in their usage and or expansion. I think that the approach of using the
BaseRestartWorkchain
as done by the QE guys to be the simplest , I think that if you put that to be aRestartKKR(BaseRestartWorkchain)
which inside calls as a process thebase_kkr
workchain, things can get very complicated quite fast. I'm open to help with any solution, but I would like to coordinate with you to ensure that things are well planned out.Yes, maybe that structure is clearer. So the idea would be to use
BaseRestartWorkchain
from aiida-core inBaseKKR(BaseRestartWorkchain)
which replaces the kkr_scf workchain, right?Then whenever a KkrCalculation is used now we just use
BaseKKR
instead with some settings to not do the scf functionality?
Yes that is what I was thinking, one needs to be clever on how to ensure that if you want to use the single KKR calculation that does not require scf, you do not end up in a situation in which the process handler is activated, so this is something that one must handle in the parser state.
I think the tricky thing is that in other codes, such as QE and VASP, the base workchain is used in all the other workchains (such as dos, band structure, etc) to create a turn key solution, thing which is not the case here. So one needs to be quite careful on how to ensure that those functionalities are not broken, if one wishes to set the main process
to be the BaseKKR
Yes that is what I was thinking, one needs to be clever on how to ensure that if you want to use the single KKR calculation that does not require scf, you do not end up in a situation in which the process handler is activated, so this is something that one must handle in the parser state.
I think the tricky thing is that in other codes, such as QE and VASP, the base workchain is used in all the other workchains (such as dos, band structure, etc) to create a turn key solution, thing which is not the case here. So one needs to be quite careful on how to ensure that those functionalities are not broken, if one wishes to set the main
process
to be theBaseKKR
Yes, but there is a limited number of things that can be seen when parsing the output. In most cases the shape of the energy contour (e.g. NPOL=0
is always special) or the number of iterations (non-scf runs usually have NSTEPS=1
) are special for the non-scf runs. Thus we can come up with a certain set of rules when the scf functionality is by default deactivated + we should provide the possibility to overwrite this in the inputs to the base workchain.
But on the bright side we would get closer to having turn-key workflows ;)
Yes that is what I was thinking, one needs to be clever on how to ensure that if you want to use the single KKR calculation that does not require scf, you do not end up in a situation in which the process handler is activated, so this is something that one must handle in the parser state. I think the tricky thing is that in other codes, such as QE and VASP, the base workchain is used in all the other workchains (such as dos, band structure, etc) to create a turn key solution, thing which is not the case here. So one needs to be quite careful on how to ensure that those functionalities are not broken, if one wishes to set the main
process
to be theBaseKKR
Yes, but there is a limited number of things that can be seen when parsing the output. In most cases the shape of the energy contour (e.g.
NPOL=0
is always special) or the number of iterations (non-scf runs usually haveNSTEPS=1
) are special for the non-scf runs. Thus we can come up with a certain set of rules when the scf functionality is by default deactivated + we should provide the possibility to overwrite this in the inputs to the base workchain.But on the bright side we would get closer to having turn-key workflows ;)
True ;) I'll be working on the refactoring a bit to see what can be done. I'll keep you posted on this.
Use the
base_restart
functionality ofaiida-core v1.1.0
to overcome common technical problems and common problems with the input that can be fixed easily and automatically.A collection of the necessary 'expert knowledge' can be found here: https://iffmd.fz-juelich.de/dRKTeQwmS8uThlwkUJowyw?view