`output_trajectory` is not complete when launching another ionic minimization iteration

superstar54 commented 1 year ago

In the PwRelaxWorkChain, if the PwCalculation failed with The ionic minimization cycle did not converge after the maximum number of steps. The work chain will launch another iteration. After completing the work chain after two iterations, it only exports the output_trajectory of the last PwCalculation as the final ouput_trajectory. However, the correct trajectory should combine the two output_trajectory and export as the final output_trajectory.

Here is an example:

PwRelaxWorkChain
- PwBaseWorkChain
- PwCalculation, iteration 1, ionic minimization cycle did not converge after the maximum number of steps. output_trajectory 1
- PwCalculation, iteration 2, completed. output_trajectory 2

Now the output_trajectory = output_trajectory 2

The correct one should be: output_trajectory = output_trajectory 1 + output_trajectory 2

mbercx commented 1 year ago

Yeah, fair point. The issue here lies in the PwBaseWorkChain simply taking the outputs of the final calculation. We can definitely do something similar to what @mkotiuga and I set up for the PhBaseWorkChain in https://github.com/aiidateam/aiida-quantumespresso/pull/818 and merge the outputs as you suggest.

I know @sphuber has been working quite a bit at moving all the parsing to the XML file for the PwCalculation though. (see the corresponding branch) Maybe we should finalise that work first and then see which outputs require merging for a more holistic approach.

sphuber commented 1 year ago

I know @sphuber has been working quite a bit at moving all the parsing to the XML file for the PwCalculation though. (see the corresponding branch) Maybe we should finalise that work first and then see which outputs require merging for a more holistic approach.

The fix for this feature request would not touch the PwCalculation though. Since it is the PwBaseWorkChain that would have to merge the trajectories, all the change in code would go there.

I wouldn't call this is a "bug" though. It is simply a design decision. I can understand the intuition that one might expect the complete trajectory in this case, but that is only in this specific case. Probably if the first calculation "hard" failed and it restarted from scratch, you wouldn't expect that trajectory in the final one. So now we have to start deciding which cases we merge and in which we don't. Maybe there is not a clear answer here, and it could actually create cases that would be counter-intuitive for users in another way.

Most importantly though, if we were to change this, it is backwards incompatible and so cannot just add this change in a normal release.

mbercx commented 1 year ago

I wouldn't call this is a "bug" though. It is simply a design decision. I can understand the intuition that one might expect the complete trajectory in this case, but that is only in this specific case. Probably if the first calculation "hard" failed and it restarted from scratch, you wouldn't expect that trajectory in the final one. So now we have to start deciding which cases we merge and in which we don't. Maybe there is not a clear answer here, and it could actually create cases that would be counter-intuitive for users in another way.

Good point, I hadn't thought about this in much detail. With the PhBaseWorkChain, my policy was that the outputs should be as if the PhCalculation had run successfully in one go. I'll have to go over the various scenarios/outputs to see if we can apply this policy here as well.

Most importantly though, if we were to change this, it is backwards incompatible and so cannot just add this change in a normal release.

I guess that depends on whether we define it as a "bug" or not. 😏 But we'll have to start thinking about a major release soonish anyways, because the changes I would like to make for the PwRelaxWorkChain will be unequivocally backwards-incompatible.

superstar54 commented 1 year ago

So now we have to start deciding which cases we merge and in which we don't. Maybe there is not a clear answer here,

Since this is related to ionic minimization only, and we merge when there is a successful (partially) trajectory, I think the decision is not difficult.

sphuber commented 1 year ago

What about the output_trajectory of the PwRelaxWorkChain though? It can run multiple PwBaseWorkChains. Should those also be merged? I am still not convinced that it always makes sense to merge, or at the very least, it is not always going to be clear that the trajectory is going to be merged, and which sub processes it includes. Also, won't it be a bit inconsistent that all outputs will be just that of the last workchain, except for the output_trajectory?

What if instead we add an option to request merging and then add the merged trajectory as a separate output, e.g., complete_trajectory or something.

superstar54 commented 1 year ago

It can run multiple PwBaseWorkChains. Should those also be merged?

Yes.

Also, won't it be a bit inconsistent that all outputs will be just that of the last workchain, except for the output_trajectory?

I didn't see an inconsistency if output_trajectory is a combination of all trajectories from the subprocess. Let's think about the goal of a relax calculation. Users want to get the following:

final structure and other properties related to it. This is from the last work chain.
the trajectory of the relaxation process. This should begin with the input structure, and end with the final structure. It is the output_trajectory of the PwRelaxWorkChain, instead of the PwBaseWorkChain.

mbercx commented 1 year ago

The fix for this feature request would not touch the PwCalculation though. Since it is the PwBaseWorkChain that would have to merge the trajectories, all the change in code would go there.

True, but perhaps some of the outputs that require merging will change? Perhaps not, but I'd avoid having to deal with adapting the merging as well in the XML parsing PR.

It can run multiple PwBaseWorkChains. Should those also be merged?

Merging the output_trajectory of all the PwBaseWorkChains is an appealing approach, but since it's a backwards-incompatible change I'd definitely take some time to think on potential pitfalls. Off the top of my head:

We'd be duplicating (triplicating in some cases, even) quite a bit of information. How much extra storage would you need for a typical PwRelaxWorkChain run?
Since the basis sets are regenerated at the start of each run, the pressures calculated across different runs aren't fully consistent. This may already be the case if the final SCF is also added to the output_trajectory, I'd have to check. In fact, this means it's pretty much impossible to adhere to my desired policy that the outputs should be as if the PhCalculation had run successfully in one go, since a restarted run will always be different.

Not saying this means we shouldn't go forward with this, but let's make sure we aren't missing anything.

sphuber commented 1 year ago

True, but perhaps some of the outputs that require merging will change? Perhaps not, but I'd avoid having to deal with adapting the merging as well in the XML parsing PR.

I don't see how that would even be possible. The PwParser won't have access to the outputs of the previous calculation, so how can it merge. Of course it can start using the provenance graph to go up, but that is clearly not the way to go. So I don't think the PwCalculation or PwParser would be touched in anyway. So fear not! 😄

We'd be duplicating (triplicating in some cases, even) quite a bit of information. How much extra storage would you need for a typical PwRelaxWorkChain run?

Very good point, and trajectory data tends to get large!

mbercx commented 1 year ago

I don't see how that would even be possible. The PwParser won't have access to the outputs of the previous calculation, so how can it merge.

Haha, I'm clearly not being clear enough. I meant that if we first introduce merging of the outputs at the PwBaseWorkChain level, we might have to adapt/rethink how the outputs are merged afterwards if we change the outputs in the XML parsing PR. But I'm talking hypotheticals here, so let me first have a closer look.

mbercx commented 1 year ago

One idea would be to add a tool to the PwRelaxWorkchain in the tools namespace that merges the output trajectory. @sphuber is there any reason why we wouldn't extend this concept to work chains as well?

sphuber commented 1 year ago

One idea would be to add a tool to the PwRelaxWorkchain in the tools namespace that merges the output trajectory. @sphuber is there any reason why we wouldn't extend this concept to work chains as well?

No, I don't see a reason why not. The code in aiida-core is straightforward, so adding it should be simple.

mbercx commented 1 year ago

No, I don't see a reason why not. The code in aiida-core is straightforward, so adding it should be simple.

Great! Ideally a BaseRestartWorkChain would also have a tools namespace that allows direct access to similar tools as those on the wrapped process, but allow for some modifications/extra methods if needed.

An example here is the get_magnetic_configuration tool I'm adding in https://github.com/aiidateam/aiida-quantumespresso/pull/640. I'm running the pw.x calculation in a PwBaseWorkChain there since I want to use the protocols (also see https://github.com/aiidateam/aiida-quantumespresso/issues/947), and then have to go look for the final PwCalculation to get the magnetic configuration instead of just having a tools namespace on the PwBaseWorkChain itself, see:

https://aiida-quantumespresso--640.org.readthedocs.build/en/640/tutorials/magnetism.html#passing-the-magnetic-configuration

In this case the method would be exactly the same, but in other cases (like merging the output_trajectory -> e.g. get_full_trajectory) the tool could depend on the work chain (PwBaseWorkChain vs PwRelaxWorkChain).

aiidateam / aiida-quantumespresso

`output_trajectory` is not complete when launching another ionic minimization iteration #937