aiidateam / aiida-common-workflows

A repository for the implementation of common workflow interfaces across materials-science codes and plugins
https://aiida-common-workflows.readthedocs.io
MIT License
52 stars 31 forks source link

Possibility to pass plugin-specific inputs to get_builder() #4

Closed bosonie closed 4 years ago

bosonie commented 4 years ago

This is mostly for @sphuber, but it is an interesting point to discuss. I see that in Sebastiaan's implementation, the get_builder method is allowed to receive extra inputs that are plugin specific. For instance, in the case of quantum espresso, the pseudo_family is passed explicitly to get_builder. I don't see any need for it and actually it seems to clash with the idea of having a common interface that is the same for each plugin. I think the agreement on the inputs was based on the minimal requirement a person wants to select in order to perform a relaxation, all the rest of the inputs should be suggest by us. If you are an expert user, you can always change the pseudo_family before submitting the builder. Moreover, how would you deal with these extra inputs when we want to produce a GUI? This is why, in my implementation, the selection of pseudos was done internally, according to the protocol. The actual implementation is a bit problematic and still I don't have the best solution, but I did it having in mind that in the future there might be a GUI that allows anyone to run a relaxation with just few clicks, even for people that don't know what a pseudopotential is.

sphuber commented 4 years ago

I see your point, the problem/reason I did it this way is twofold:

  1. Our workchains don't actually take a pseudo family name, in the form of a Str node, but rather have a pseudos namespaces, that directly takes the UpfData nodes mapped onto the relevant kind names of the structure. So overriding this directly on the returned builder is not easy for the user, because you have to manually change the pseudos, but then of course also the corresponding cutoffs in the parameters.
  2. The protocol generator will not just accept any pseudo family name. It actually needs to correspond to a group. Moreover, the functionality requires that the group has a certain structure and has an associated Dict node that contains the suggested cutoffs. This is a generic problem with pre-requisite resources that any code might need. VASP will also need POTCAR families to be installed.

I could remove the possibility of passing the pseudo family name to the input generator, but then this would require the environment to have installed the required data according to a very specific recipe, i.e. the Group with the UpfData nodes should have the exact names that the protocol knows about. I have built the aiida-sssp plugin that simplifies all of this, but it will mean that if people do not use that, the protocol and common workflow won't work. Maybe this will always be the case and so this restriction is not so bad.

BTW: how aiida-sssp works, is that it provides a new Group plugin, called SsspFamily that provides all the code around managing pseudopotentials. You can install a version with

aiida-sssp install

which will install the current recommended SSSP, which is currently v1.1 PBEsol efficiency. This creates an instance of SsspFamily with the label SSSP/1.1/PBE/efficiency. In the shell you can then do for example you can use

family = load_group('SSSP/1.1/PBE/efficiency')
pseudos = family.get_pseudos(structure=load_node())

which will return dictionary of kind names onto UpfData nodes.

You can also use get_cutoffs to automatically get recommended cutoffs:

family = load_group('SSSP/1.1/PBE/efficiency')
family.get_cutoffs('Si')   # (30., 240.)
family.get_cutoffs(['Si', 'Ge'])  # (50., 300.)
family.get_cutoffs(structure=load_node())   # pass a `StructureData
espenfl commented 4 years ago

So for VASP we somehow need to specify a family. We can of course query and choose one, but we have no idea of knowing the content of that is what we would consider good potentials. So either, if a VASP user should use these common workflows, we would either have to ask them to put "these potentials" into a family with name something and then just set this family name as a default on the wrapper. Or we would have to do like Sebastiaan is doing now.

It would be too complicated to query and check the potentials automatically at this point. It can be done, e.g. have a hash for each, and search each family and each potential in it, compare hash and hopefully come out with a potential family that is to the default requirements according to the hash. But I fear that is going to take a bit of fiddling and would maybe make it more complicated at this point.

We can document this in the wrapper so the former approach is fine and maybe the easiest for now, if we do not want to expose.

bosonie commented 4 years ago

This point seems to be more tricky than expected. I now see problems I couldn't see before. I think it all boils down to the fact that we are in front of two possible scenarios:

From my prospective, at the beginning, to maintain both scenarios was easy enough. In other words, it is feasible with Siesta because the protocol system is very rudimentary and to have installed the required data is not such a big problem. Now with your comments, it seems more tricky. At this point, I'm not sure were to go, I think we should ask the opinion of @giovannipizzi

espenfl commented 4 years ago

I also think this is tricky, but what we have with AiiDA now is at least a framework where we could try to do it. And it will not be perfect the first time.

We are happy to instruct VASP users to follow certain principles if they want to use the common workflows. Due to the licensing, we can unfortunately not bundle it with some AiiDA app that offers all software and potentials in a certain way (say like the Quantum Mobile). What we might be able to do is to provide a recipe, say Ansible, which can be included if the user has license and supplies source code and potentials.

I think we can find a solution to this. @sphuber maybe we can just settle on giving the family say a name like common_workflow_potentials or something like that and then we set this automatically in the wrapper.

greschd commented 4 years ago

Wouldn't it be possible, for the specific case of potentials, to create a shared list of potential families? Of course there are families (e.g. vasp-x.y) that can only be used by one code, but for others that are available in multiple formats, I think it would be useful to agree on common naming.

For the use case of running the workflow in the current AiiDA environment this might not be so useful because it's still the user's responsibility to name pseudo families when importing them, but for the GUI approach it might make more sense.

espenfl commented 4 years ago

Wouldn't it be possible, for the specific case of potentials, to create a shared list of potential families? Of course there are families (e.g. vasp-x.y) that can only be used by one code, but for others that are available in multiple formats, I think it would be useful to agree on common naming.

Not quite sure exactly what you mean here. I think this is what somehow suggested, e.g. common_workflow_potentials and then each plugin developer have to instruct in the documentation that this needs to be fixed before the common workflows can be used. Or did you think of something else?

greschd commented 4 years ago

Well, basically a long list of names (e.g. vasp-5.4-paw, sssp-1.1-efficiency, pslibrary-1.0.0-paw, and many more), to have a consistent way of how the potential families are named.

I'm not entirely sure it makes sense given users (at least currently) need to import and name pseudos manually, but it could enable e.g. implementing a verdi command that fetches and imports the pseudos, and names them consistently. For pseudos that are supported by more than one code you would then only have to switch the code, because the pseudo might be present in multiple formats (say, upf (QE), psp8 (Abinit), psml (Siesta)), but with the same name.

espenfl commented 4 years ago

Yes, the problem for VASP for instance is that it is not so easy to fetch them, unless we actually tell users that this set of potentials need to be there and fetchable by some name. For the codes, where the potentials are available and have a standardized name, this is a bit easier. Of course, what we could do on the VASP side is to standardize these names (i.e. on upload a name is suggested), but that is not so easy as many people tailor these and have families with additional potentials not supplied in the regular software etc. Also, we still have to require this specific potential to be there under this given name etc. What we call this is not important to us at this point, but we need to instruct users to make sure the name we choose in fact exists and also contains the potentials we recommend for running the common workflows.

Maybe as an initial approach, we leave it to the plugin developers to make sure this is set in the wrapper. And then the codes which share potentials can coordinate naming schemes etc?

giovannipizzi commented 4 years ago

What about some common interface also to prepare the data in the AiiDA DB? Say I have the QE implementation. This could have a setup_aiida_profile() method that will either automatically download the needed information (say, the pseudos), or (e.g. in the case of VASP) explain the user what they have to do to setup things (maybe saying "put the pseudos in this folder, and then call this function again", or having additional kwargs in the setup).

And the same function, or a different one (validate_aiida_profile()?) could check if everything is ready, or inform of what is missing.

So this could be the first step a person does before running, and if a person reports a workflow is not working, one could ask to run the validate function and report the output.

espenfl commented 4 years ago

@giovannipizzi Yes, this we could do to catch users that have not read the documentation, or want to get going quickly. Having some validation method is probably convenient anyway.

bosonie commented 4 years ago

Decision after discussion held on the 26th of May.

We allow plugin-specific inputs to get_builder(), BUT they must be “not-required” in an environment where all the necessary data are installed in the database with the correct recipe. This means that an aiida user running in his local environment can pass, for instance, its own custom name of pseudo_family, however, in a future platform where everything will be available, the get_builder should work perfectly without any pseudo_family passed to it. We also identified the QuantumMobile as the environment where we want to run, at least initially, the common workflows. Therefore each plugin developer should start to think about the steps its interface requires in order to set up in the QuantumMobile all the necessary data to run the relaxation workchain without the "not-required" inputs. This set of instructions will become an ansible set for QuantumMobile. With appropriate adjustments (attention to idempotency, check on the already present data, etc..) this set of instructions can become, in the future, the core of a setup_aiida_profile() function (eventually a verdi command) that a user can run in its environment to set all the necessary data with the correct recipe and forget about "not-required" inputs.

bosonie commented 4 years ago

Reported in the wiki, the set of instructions should be reported here