C2SM / Sirocco

AiiDA based Weather and climate workflow tool

Support of generic aiida plugins #20

Open agoscinski opened 1 month ago

agoscinski commented 1 month ago

Idea

AiiDA plugins define the inputs and outputs of their CalcJobs and WorkChains with specific names. For example, the arithmetic add CalcJob has the inputs x and y as well as the output sum. We therefore need to specify these ports (as AiiDA calls them) in the YAML file to create the workgraph. For the aiida-shell plugin we did not need to do this, for the reasons given further down. Each plugin defines an entry point, which we can use to load the corresponding CalcJob or WorkChain through the factories:

from aiida.plugins.factories import CalculationFactory
ArithmeticAddCalculation = CalculationFactory("core.arithmetic.add")
# Retrieve input ports
print(ArithmeticAddCalculation.spec().inputs)

So with these two additional pieces of information in the YAML file (the entry point and the port names), we can run almost arbitrary calculations from AiiDA plugins (including aiida-icon). We did not need the port names for aiida-shell because ShellJob dynamically creates its output ports from the outputs that are provided as inputs, so we took advantage of this and use the names specified in the YAML file as output port names. For the input ports we also simplify the actual ports, which would be nodes and arguments (see code). The gist is that we treat aiida-shell differently, and we should continue to do so, because otherwise it becomes cumbersome to use.

YAML syntax

Here you can find [an example to run arithmetic add](https://github.com/C2SM/ETHIOPIA/blob/plugins/tests/files/configs/test_config_small.yml). The following snippet from it shows how the port names are used to define a workflow.

- adder1:
    inputs:
        - a:
            port_name: x
        - b:
            port_name: y
    outputs:
        - sum1: 
            port_name: sum
- adder2:
    inputs:
        - sum1:
            port_name: x
        - c:
            port_name: y
    outputs:
        - sum2:
            port_name: sum

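Since the same data object can be used for different ports (sum1 leaves adder1 through the port sum and enters adder2 through the port x), we need this information in the cycles, where the data is attached to a task.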
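To make the role of port_name concrete, here is a minimal sketch in plain Python. It uses neither AiiDA nor WorkGraph: the nested dictionaries only mirror the YAML snippet above, and `add`, `build_kwargs` and `run_cycle` are hypothetical names introduced for this illustration, not functions from the code base.

```python
# Hypothetical sketch: the cycle mirrors the YAML snippet as plain Python data.
CYCLE = [
    {"adder1": {
        "inputs": [{"a": {"port_name": "x"}}, {"b": {"port_name": "y"}}],
        "outputs": [{"sum1": {"port_name": "sum"}}],
    }},
    {"adder2": {
        "inputs": [{"sum1": {"port_name": "x"}}, {"c": {"port_name": "y"}}],
        "outputs": [{"sum2": {"port_name": "sum"}}],
    }},
]


def add(x, y):
    """Stand-in for the arithmetic add CalcJob: ports x and y in, port sum out."""
    return {"sum": x + y}


def build_kwargs(entries, data):
    """Translate data names into port names: returns {port_name: value}."""
    kwargs = {}
    for entry in entries:
        for data_name, spec in entry.items():
            kwargs[spec["port_name"]] = data[data_name]
    return kwargs


def run_cycle(cycle, data):
    """Run the tasks in order and store each output under its data name."""
    data = dict(data)  # do not modify the caller's dictionary
    for task in cycle:
        for _task_name, spec in task.items():
            results = add(**build_kwargs(spec["inputs"], data))
            for entry in spec["outputs"]:
                for data_name, out in entry.items():
                    data[data_name] = results[out["port_name"]]
    return data


final = run_cycle(CYCLE, {"a": 1, "b": 2, "c": 3})
print(final["sum1"], final["sum2"])
```

The data name (sum1) stays the same across tasks, while the port it is connected to changes from sum to x, which is why the mapping has to live next to each use of the data.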

Definition of computer and code

We follow the AiiDA logic more closely and define the computer and code by specifying only the label that was given when they were created.

tasks:
  - adder1:
      plugin: core.arithmetic.add
      code: bash 
      computer: localhost

This has the strong advantage that we do not have to write our own logic to parse all the computer information, and can rely on verdi, the well-maintained AiiDA CLI, to let the user create the computer and code beforehand. It is in this PR because it was required for testing, but it should be separated out into a different PR.

Current state of the code

Currently the code in workgraph.py uses different functions for plugins that are not ShellJobs, and I am not sure whether this is smart. It is a tradeoff between code duplication and flexibility, and it requires some more thought and a decision on how we proceed.

agoscinski commented 1 month ago

Icon change of namelist

For ICON we need to adapt the behavior, because we would rather change the namelist once and keep it constant over the calculation, so we can move it to the task definition. We can maybe make a CalcJob out of the calcfunction https://github.com/aiida-icon/aiida-icon/blob/a982d8792006bf234fe79c18aa76fd2af7a3463f/src/aiida_icon/iconutils/masternml.py#L43-L51 that adapts the namelist, so that we can offer the user a simpler way to make arbitrary changes. We will then also use this CalcJob to adapt the namelist for the inputs we can infer from the workflow (date, output of the last ICON run): these will not be passed in the AiiDA way; instead we just call this CalcJob to update the namelist with the new file.

Naming of port_name

We might rename port_name to input_key or input_slot.

Specify computer and code

We discussed how to deal with computer and code. For the computer definition we stick with verdi, but for codes it might be useful to just pass the filepath, since we do not want the user to create a new code every time ICON is recompiled. How to create a label in this case is still an open question. We could hash the binary, but for ICON this can be 200 MB, which would need to be sent over the transport plugin. One proposal was to use a hash of the filepath as the label. It remained an open question whether we generate a new UUID so that provenance is also preserved in cases where the code is recompiled.
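The two labeling options can be compared with a small sketch using only the Python standard library. The function names `label_from_path` and `hash_binary`, the `icon` prefix and the 12-character truncation are assumptions made for this illustration; nothing here is an AiiDA API, and `hash_binary` reads a local file, so where the hashing would run for a remote binary is left open, as in the discussion.

```python
import hashlib


def label_from_path(filepath, prefix="icon"):
    """Derive a short, deterministic label from the path string only.

    The file contents are never read, so the label stays the same
    when the binary at that path is recompiled.
    """
    digest = hashlib.sha256(str(filepath).encode("utf-8")).hexdigest()
    return f"{prefix}-{digest[:12]}"


def hash_binary(filepath, chunk_size=1024 * 1024):
    """Hash the file contents in chunks so a large binary is never held in memory at once."""
    hasher = hashlib.sha256()
    with open(filepath, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

The path-based label is cheap and stable but cannot tell two builds at the same location apart, whereas the content hash changes with every recompilation at the cost of reading the whole file.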

leclairm commented 1 month ago

We will then also use this CalcJob to adapt the namelist for the inputs we can infer from the workflow (date, output of the last ICON run): these will not be passed in the AiiDA way; instead we just call this CalcJob to update the namelist with the new file.

Here, for input data, we have two choices: either we adapt the namelist with the valid absolute path to the corresponding data, or we leave a constant relative path in the namelist and symlink the actual data to that relative path in the working directory of the job.
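The second choice could look roughly like the following sketch, which works on the local filesystem with the standard library only. `link_input` and the file names are hypothetical; in an actual AiiDA job the linking would have to happen in the job's working directory on the target computer.

```python
import tempfile
from pathlib import Path


def link_input(data_path, workdir, relative_target):
    """Symlink data_path to workdir/relative_target and return the link path.

    The namelist can then always refer to relative_target, whatever
    file is behind it in the current cycle.
    """
    link = Path(workdir) / relative_target
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a link left over from a previous run
    link.symlink_to(Path(data_path).resolve())
    return link


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical data file produced by a previous task.
    data = Path(tmp) / "store" / "restart_2000-01-01.nc"
    data.parent.mkdir()
    data.write_text("dummy restart data")

    # The namelist would contain the constant path "input/restart.nc".
    link = link_input(data, Path(tmp) / "workdir", "input/restart.nc")
    print(link.read_text())
```

With this approach the namelist stays constant over the whole workflow, at the price of the path in the namelist no longer showing which data file was actually used.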