aiidateam / aiida-quantumespresso

The official AiiDA plugin for Quantum ESPRESSO
https://aiida-quantumespresso.readthedocs.io
Other
55 stars 80 forks source link

Confusing `pseudo_family` treatment by `get_builder_from_protocol` and as normal input #712

Open ramirezfranciscof opened 3 years ago

ramirezfranciscof commented 3 years ago

The pseudo_family input is described as being deprecated in its help message, but apparently it is still the right override to provide to get_builder_from_protocol in order to get the right cutoffs set up. Apparently the idea is that only its use as "direct input" was deprecated (so, manually setting builder.pseudo_family for example). This is a bit confusing and should perhaps be revisited, or at least carefully explained when documentation for the get_builder_from_protocol is available.

@mbercx please check that the description is correct and feel free to add anything missing.

mbercx commented 3 years ago

Thanks @ramirezfranciscof! In our discussion on this earlier it was clear to me that this might indeed be confusing, because basically the overrides for the PwBaseWorkChain needs the pseudo_family key at the top level to override the default defined in the .yaml file:

https://github.com/aiidateam/aiida-quantumespresso/blob/c49def7a238617439feb75afd0d0cf9fbb681613/aiida_quantumespresso/workflows/protocols/pw/base.yaml#L8

However, this input is not passed to the top-level input of the PwBaseWorkChain with the corresponding name:

https://github.com/aiidateam/aiida-quantumespresso/blob/c49def7a238617439feb75afd0d0cf9fbb681613/aiida_quantumespresso/workflows/pw/base.py#L66-L69

Which is indeed deprecated. Instead, it is used to load the pseudo family and then obtain the pseudos and cutoffs:

https://github.com/aiidateam/aiida-quantumespresso/blob/c49def7a238617439feb75afd0d0cf9fbb681613/aiida_quantumespresso/workflows/pw/base.py#L180-L194

Which in turn are then actually passed to the inputs of the PwCalculation:

https://github.com/aiidateam/aiida-quantumespresso/blob/c49def7a238617439feb75afd0d0cf9fbb681613/aiida_quantumespresso/workflows/pw/base.py#L199-L200

https://github.com/aiidateam/aiida-quantumespresso/blob/c49def7a238617439feb75afd0d0cf9fbb681613/aiida_quantumespresso/workflows/pw/base.py#L216

I don't think anything should really change here. The pseudo_family input should be removed, and hopefully some of the confusion will be cleared up once it is. pseudo_family is also a reasonable name for the key that holds the label of the pseudopotential family group that should be used. But documentation on how to override the pseudo family is definitely a good idea.

ramirezfranciscof commented 3 years ago

I don't think anything should really change here. The pseudo_family input should be removed, and hopefully some of the confusion will be cleared up once it is. pseudo_family is also a reasonable name for the key that holds the label of the pseudopotential family group that should be used. But documentation on how to override the pseudo family is definitely a good idea.

Mmm ok, then I think I misunderstood how the overrides work. Until now I was expecting the structure of the overrides dict to mimic that of the workchain ports (so, for example, if I wanted to use my own builder.pw.metadata.label = my_label, I know that I need to do overrides = {'pw': {'metadata': {'label': my_label}}}). This is why I was also very surprised to see that the type I needed to use to provide the pseudo_family and the clean_workdir did not coincide with their types as inputs (str and bool vs orm.Str and orm.Bool).

Am I now understanding correctly that the overrides then have a separate tree structure than the input ports of the workchain?

sphuber commented 3 years ago

The problem you describe is indeed something that is not fully figured out yet. When thinking about a work chain and an associated builder generator, there are two types of inputs: those that directly go to the work chain (and therefore match the mapping structure of its input spec) and those that are understood by the builder generator. The latter are usually compound inputs (as one could call them). The pseudo_family is a perfect example. It isn't an input to the work chain (well it currently is, but it never should have been and that is why it is deprecated) but it does make sense for the generator. The latter will unwrap the pseudo family in the actual inputs that go to the work chain, i.e., the actual pseudo nodes in the pw.pseudos namespace and the cutoffs that go in the pw.parameters input.

Now the problem is that we haven't really come up with a well-defined separation or design for them in the get_builder_from_protocol interface. Historically, I used the overrides dictionary for both. This kind of worked because pseudo_family was the only real compound input at the time and was accepted by both the work chain and the protocol. To solve this confusion, we need to work out the design further such that one can override both types of inputs in an intuitive way, without them clashing.

ramirezfranciscof commented 3 years ago

Ah, I see, thanks for the explanation @sphuber .

Indeed I would find it confusing to have both this kinds of modifications in the same function parameter (direct input override and passing some keyword that affected how the inputs are set in more general terms). In that sense I would prefer if they were separate as my expectations on the structure of the overrides part would be clearer.

However I now wonder if the option of "overriding" single inputs is necessary at all. In principle the user could do this manually themselves after getting the builder from get_builder_from_protocol. Any sort of multiple-input setup would probably require specific code implementation and therefore merit its separate (and independently documented) more global keyword (such as this pseudo_family example). Am I missing any possible use case for overriding inputs that would be set up by the builder instead of just overwriting them after the fact?

sphuber commented 3 years ago

Am I missing any possible use case for overriding inputs that would be set up by the builder instead of just overwriting them after the fact?

No this is a good point and one that we had already discussed. We might very well conclude that overrides for individual concrete ports is not necessary as it can be directly on the builder afterwards. This would leave the overrides to override the inputs defined by the protocol.