plugin design - output parameters

aiidateam / aiida-core

The official repository for the AiiDA code

https://aiida-core.readthedocs.io


plugin design - output parameters #1454

Closed ltalirz closed 6 years ago

ltalirz commented 6 years ago

This is a basic question about plugin design.

I have a binary (zeo++) that produces several small output files, some of which I will parse and some of which I won't. Which output files are produced depends on the input. Currently, the AiiDA graph looks like this:

[Image: AiiDA provenance graph]

There needs to be some way of differentiating the outputs (say, in the provenance browser). Here I am naming the links. Is this the way to go, or should one rather name the resulting nodes (e.g. by creating subclasses for every type of node)?
Looking at the code below, there seems to be the idea that a calculation should have a single dictionary of "parsed results": https://github.com/aiidateam/aiida_core/blob/7086bb8215844b632fae274a81ff7d4b394f1c78/aiida/orm/implementation/general/calculation/job/__init__.py#L1715-L1726 I'm not sure whether it is a good idea to merge the results of parsing the individual output files into a single dictionary/ParameterData... any suggestions?

mentioning @giovannipizzi

DropD commented 6 years ago

Naming the links is the way to go. The link name you choose is also how you access the node, as in:
```
output_node = calc.out.<linkname>
```
As far as I know, the idea behind res is to provide shortcut access to the most often used scalar (or small vector / matrix) results. In QE / VASP, this might for example be the Fermi energy; access to it is then simplified to calc.res.efermi or similar. It is a good idea to parse such quantities into one single ParameterData node. It would be a bad idea to do so for large arrays / matrices, since they would be stored in a very suboptimal way (ArrayData does a better job).

If you have no useful scalar / low-dimensional output, consider putting things like the total runtime as reported by the program, memory used, etc. into output_parameters.
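The split described above can be sketched in plain Python. This is only an illustration and does not use the AiiDA API: the function names, the length cut-off and the example values are all assumptions, and a real parser would wrap the first dictionary in a ParameterData node and the large arrays in ArrayData. As a simplification, only flat sequences count as "small" here.

```python
from numbers import Number

# Hypothetical cut-off: sequences longer than this are treated as "large".
MAX_INLINE_LENGTH = 9


def is_small(value):
    """Return True for scalars, strings and short flat sequences of scalars."""
    if isinstance(value, (Number, str)):
        return True
    if isinstance(value, (list, tuple)):
        return len(value) <= MAX_INLINE_LENGTH and all(
            isinstance(item, (Number, str)) for item in value
        )
    return False


def split_results(parsed):
    """Split parsed quantities into a summary dict and a dict of large arrays."""
    summary, arrays = {}, {}
    for key, value in parsed.items():
        target = summary if is_small(value) else arrays
        target[key] = value
    return summary, arrays


# Made-up example values, for illustration only.
parsed = {
    "efermi": 5.43,
    "walltime_seconds": 12.7,
    "histogram": [0.0] * 1000,  # stands in for a large array
}
summary, arrays = split_results(parsed)
```

In this sketch, summary holds what would go into output_parameters (and so be reachable through calc.res), while arrays holds the quantities that are better stored separately.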

giovannipizzi commented 6 years ago

I agree with @DropD. Some additional comments:

If you just have output info you want to group, use different ParameterData nodes and use link names. If instead the data entities are reusable as inputs by calculations, and you can define some sort of "ontology", it is better to define a datatype (e.g. a crystal structure, a band structure, a trajectory, ...).
The default output_parameters node is needed (actually, it's optional, but handy) just to use the calc.res.<TAB> shortcut. As @DropD said, just put some global summary/warnings/... in there, and keep the rest of the info in independent ParameterData nodes, if this helps for your use case.

Of course, another option (but this depends on the code) is to put all parsed info inside the same output_parameters, explaining in the docs that some parts are optional.
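A minimal sketch of the "one node per output file, plus a summary" layout, again in plain Python rather than the AiiDA API. The function name, the "output_" link-name prefix, the file labels and the values are hypothetical; in a real plugin each inner dictionary would become its own ParameterData node attached under the given link name.

```python
def build_outputs(parsed_files, warnings=()):
    """Map each parsed output file to its own link name, plus a global summary.

    parsed_files maps a file label (hypothetical, e.g. "sa") to the dict
    parsed from that file. Only files that were actually produced appear.
    """
    if "parameters" in parsed_files:
        # This label would collide with the default output_parameters link.
        raise ValueError("the label 'parameters' is reserved")

    outputs = {}
    for label in sorted(parsed_files):
        outputs["output_" + label] = dict(parsed_files[label])

    # The default node only carries a global summary and warnings.
    outputs["output_parameters"] = {
        "parsed_files": sorted(parsed_files),
        "warnings": list(warnings),
    }
    return outputs


# Made-up labels and values, for illustration only.
outputs = build_outputs(
    {"vol": {"accessible_volume": 1.0}, "sa": {"surface_area": 2.0}},
    warnings=["example warning"],
)
```

Because the set of link names follows the set of files that were produced, a calculation that writes fewer files simply ends up with fewer output links, and output_parameters always records which ones were parsed.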

ltalirz commented 6 years ago

Thanks for the info, that answers the question.