aiidateam / aiida-tutorials

AiiDA tutorials web site
http://aiida-tutorials.readthedocs.org
23 stars 37 forks source link

Why using `get_builder_from_protocol` in AiiDA tutorials? #467

Open bastonero opened 1 year ago

bastonero commented 1 year ago

Hi everyone, I just had a quick look at the tutorials - they are great!

Nevertheless, I find it a bit confusing for first users to get started with the get_builder_from_protocols, as:

  1. The won't really learn how to prepare the inputs, and how to understand what and which are those inputs.
  2. Not every plugin in aiida has such functionality (actually, most of the plugins out there probably won't).

I guess this type of tutorial would be more appropriate for:

  1. aiida-quantumespresso, which doesn't have it, and one would look first into there.
  2. Again aiida-qe, but more for customizing in a section of ~How to get your life easier.

One of the questions I get more often are:

  1. Where are the inputs?
  2. How can I know them?
  3. What is this workflow doing?

So I think it would be better to show in this tutorial how to tackle these questions first, otherwise people get immediately stuck for other workflows, for which btw there is still very little documentation.

Hope this will help to make the docs better for new users (:

Cheers!

mbercx commented 1 year ago

Hi @bastonero! Thanks for the feedback. It's definitely fair that some of the current tutorials are still very aiida-quantumespresso-centric, and we maybe should move parts of it to the documentation of that plugin package. It's been "on the agenda" for some time, in fact. ^^

A couple of comments:

  1. The Running calculations does show how to prepare inputs for a calculation step by step. In a sense doing the same for workflows is just an extension of this, but it does get quite complicated admittedly (which is why we developed the get_builder_from_protocol() method in the first place ^^).
  2. We've already discussed that for aiida-quantumespresso, we want to make the get_builder_from_protocol() method the recommended approach for users to run the workflows, in combination with either overrides or adapting the builder after obtaining it (see the OP and discussion in https://github.com/aiidateam/aiida-quantumespresso/pull/902).

Some thoughts on the common questions:

As in where are they defined? The protocol files, unless an inputs is updated dynamically at runtime (e.g. error handler). Of course, the protocol files are merged throughout the whole stack of processes of a work chain, so I understand it's not always trivial to figure out where an input comes from if you're not very familiar with a work chain. At least according to the principles in https://github.com/aiidateam/aiida-quantumespresso/pull/902 they now aren't specified in the work chain logic as well.

Easiest is to simply return the builder in the Python REPL. That now gives a quite good overview of all the inputs, although some (e.g. pseudo files) can still be improved.

I think adding a better description of the work chains (to the class docstring?) is probably a good idea. But this of course depends heavily on the plugin developer. Or we can think of some other approach that uses the outline and the docstrings of each method in the outline? 🤔

bastonero commented 1 year ago

Thanks @mbercx for the reply!

Easiest is to simply return the builder in the Python REPL. That now gives a quite good overview of all the inputs, although some (e.g. pseudo files) can still be improved.

Indeed I think this would be super great to show, so that one can just load the Workflow/Calculation and print its input structure. Currentlly I have the feeling that a new user wouldn't know how to know extra inputs, especially if it is not either a QE or VASP related plugin. (of course the best way would be to consult the specific plugin docs, but having an idea from the terminal I guess would be helpful)

Do you think this could be an improvement?

I think adding a better description of the work chains (to the class docstring?) is probably a good idea. But this of course depends heavily on the plugin developer.

Absolutely. But returning the builder already prints its docstring?

mbercx commented 1 year ago

Do you think this could be an improvement?

For sure! I'd show that feature in the tutorials. Looking at an empty builder of the PwBandsWorkChain:

In [1]: from aiida_quantumespresso.workflows.pw.bands import PwBandsWorkChain

In [2]: PwBandsWorkChain.get_builder()
Out[2]: 
Process class: PwBandsWorkChain
Inputs:
bands:
  metadata: {}
  pw:
    metadata:
      options:
        stash: {}
    monitors: {}
    pseudos: {}
metadata: {}
relax:
  base:
    metadata: {}
    pw:
      metadata:
        options:
          stash: {}
      monitors: {}
      pseudos: {}
  base_final_scf:
    metadata: {}
    pw:
      metadata:
        options:
          stash: {}
      monitors: {}
      pseudos: {}
  metadata: {}
scf:
  metadata: {}
  pw:
    metadata:
      options:
        stash: {}
    monitors: {}
    pseudos: {}

We can see some structure of the inputs, but e.g. it might not be immediately obvious where the parameters should go. We can probably also develop something that would show the entire possible input stack, but that'd be rather massive and intimidating (so many options! 😬).

Absolutely. But returning the builder already prints its docstring?

Not at the moment, and I also wouldn't add it to the __repr__ of the process classes. Especially if we want to be more descriptive of what a work chain does in the docstring. You can also just see the docstring anyways with ?:

In [3]: PwBandsWorkChain?
Init signature: PwBandsWorkChain(*args: Any, **kwargs: Any) -> 'StateMachine'
Docstring:     
Workchain to compute a band structure for a given structure using Quantum ESPRESSO pw.x.

The logic for the computation of various parameters for the BANDS step is as follows:

Number of bands:
    One can specify the number of bands to be used in the BANDS step either directly through the input parameters
    `bands.pw.parameters.SYSTEM.nbnd` or through `nbands_factor`. Note that specifying both is not allowed. When
    neither is specified nothing will be set by the work chain and the default of Quantum ESPRESSO will end up being
    used. If the `nbands_factor` is specified the maximum value of the following values will be used:

    * `nbnd` of the preceding SCF calculation
    * 0.5 * nelectrons * nbands_factor
    * 0.5 * nelectrons + 4

Kpoints:
    There are three options; specify either an existing `KpointsData` through `bands_kpoints`, or specify the
    `bands_kpoint_distance`, or specify neither. For the former those exact kpoints will be used for the BANDS step.
    In the two other cases, the structure will first be normalized using SeekPath and the path along high-symmetry
    k-points will be generated on that structure. The distance between kpoints for the path will be equal to that
    of `bands_kpoints_distance` or the SeekPath default if not specified.
Init docstring:
Construct a WorkChain instance.

Construct the instance only if it is a sub class of `WorkChain`, otherwise raise `InvalidOperation`.

:param inputs: work chain inputs
:param logger: aiida logger
:param runner: work chain runner
:param enable_persistence: whether to persist this work chain
File:           ~/envs/aiida-super/code/aiida-quantumespresso/src/aiida_quantumespresso/workflows/pw/bands.py
Type:           Protect
Subclasses:

Note that with the nice new AiiDA registry developed by @AhmedBasem20 you can also see the docstring/inputs/outputs etc for every work chain:

https://aiidateam.github.io/aiida-registry/aiida-quantumespresso#aiida.workflows.quantumespresso.pw.bands

Which is of course also information you can obtain from verdi plugin list aiida.workflows quantumespresso.pw.bands.

bastonero commented 1 year ago

Sweet, didn't notice, that's great! I then think I can close this PR (my bad :D).

mbercx commented 1 year ago

@bastonero not at all! I'm going to reopen with the following actionable points:

In the section on submitting the work chain: