C2SM / Sirocco

AiiDA-based weather and climate workflow tool

Specifying global inheritance of inputs and outputs in config file to reuse them in multiple cycle-tasks #26

Closed agoscinski closed 1 week ago

agoscinski commented 1 month ago

Objective

The goal is to conveniently use tasks multiple times in the cycles without respecifying all the inputs in each cycle-task. We need some option to inherit inputs from a base definition (similar to the idea of the root task that all tasks inherit from). This will also be important later for the namelist, as we want to specify it only once.

Proposition

Inputs that do not change over the workflow can then be put into the task definition:

tasks:
- icon_task:
    plugin: icon
    code: icon_code
    computer: localhost
    # global inputs that are inherited by each cycle task
    # and can be overwritten in the cycle task
    inputs:
    - grid:
        port_name: icon_grid_simple.nc
    - icon_restart:
        lag: -2PM
        port_name: restart_file   # still needs to be implemented on the aiida-icon side
    outputs:
    - icon_restart:
        port_name: latest_restart_file

The inheritance from tasks to cycle-task would be a union; otherwise one would need to specify all inputs again just to change one. One problem that follows from this is how to overwrite inputs: an existing input is only overwritten when it is respecified in the cycle-task. What is a bit weird is that, to overwrite the port usage in the example above, one has to write

cycles:
- icon_cycle:
    period: 3PM
    tasks:
    - icon_task:
        - custom_grid:
            port_name: icon_grid_simple.nc

This overwrites the usage of grid in this icon_task instance, since custom_grid takes over the same port_name. On the other hand, to overwrite an option, one would need to respecify the data node name:

cycles:
- icon_cycle:
    period: 3PM
    tasks:
    - icon_task:
        - grid:
            lag: -2PM
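
The two overwrite cases above can be sketched as a small merge function. This is only an illustration of the proposed union semantics, not Sirocco's implementation: the helper name `merge_inputs` and the representation of inputs as plain dictionaries keyed by data node name are assumptions made for this example.

```python
from copy import deepcopy


def merge_inputs(base, override):
    """Union of task-level and cycle-task-level inputs (hypothetical helper).

    Both arguments map a data node name to its options dictionary.
    """
    merged = deepcopy(base)
    for name, options in override.items():
        if name in merged:
            # Same data node name: only the respecified options change.
            merged[name].update(options)
            continue
        port = options.get("port_name")
        if port is not None:
            # A new data node on an already used port replaces the old usage.
            taken = [n for n, o in merged.items() if o.get("port_name") == port]
            for other in taken:
                del merged[other]
        merged[name] = dict(options)
    return merged


# Task-level inputs from the proposition above.
base = {
    "grid": {"port_name": "icon_grid_simple.nc"},
    "icon_restart": {"lag": "-2PM", "port_name": "restart_file"},
}

# Case 1: overwrite the port usage with a different data node.
swapped = merge_inputs(base, {"custom_grid": {"port_name": "icon_grid_simple.nc"}})

# Case 2: overwrite an option by respecifying the data node name.
relagged = merge_inputs(base, {"grid": {"lag": "-2PM"}})
```

In this sketch, `swapped` no longer contains `grid`, while `relagged` keeps the inherited `port_name` of `grid` and adds the `lag` option. The asymmetry between the two cases is the "weird" part mentioned above.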

We also had the suggestion to use the port_names as key names, but this does not work for the data section, which results in inconsistencies elsewhere. If we chose this approach, the files would look like this:

cycles:
- icon_cycle:
    period: 3PM
    tasks:
    - icon_task:
        - icon_grid_simple.nc:
            data: grid
...
data:
- grid:
    type: file
    src: $PWD/tests/files/simple_icon_run/inputs/icon_grid_simple.nc
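
For comparison, here is a sketch of how overwriting could look if inputs were keyed by port name as in this last variant. Again, this is an assumption-laden illustration: `merge_by_port` is a hypothetical helper, and the task-level inputs are rewritten by hand into the port-keyed form.

```python
def merge_by_port(base, override):
    """Per-port union of inputs keyed by port name (hypothetical helper)."""
    merged = {port: dict(options) for port, options in base.items()}
    for port, options in override.items():
        # Respecifying a port updates its options; a new port is simply added.
        merged.setdefault(port, {}).update(options)
    return merged


# Task-level inputs from the proposition, rewritten with port names as keys.
base_by_port = {
    "icon_grid_simple.nc": {"data": "grid"},
    "restart_file": {"data": "icon_restart", "lag": "-2PM"},
}

# Pointing the grid port at another data node is an ordinary option update.
resolved = merge_by_port(base_by_port, {"icon_grid_simple.nc": {"data": "custom_grid"}})
```

With this keying, changing the data node and changing an option such as `lag` follow the same rule, at the cost of the inputs and the data section using different kinds of keys.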

I don't have an idea that resolves this nicely. Personally, I am currently leaning towards the last option, but not strongly.

leclairm commented 4 weeks ago

I don't really see the problem here. There are two places where task specifications are given: either in the tasks section or in the cycles. The latter hosts everything that is needed to build the graph, the former all the rest. In any case, each of these pieces of information is specified only once, be it in the tasks section or in the cycles, typically the inputs of a task. I might be missing something, but I don't see where we could use an inheritance mechanism.

agoscinski commented 4 weeks ago

We had a Zoom discussion on this point. Here is the summary:

We decided not to allow inputs and outputs in the task definition for now. This was mainly brought up because of the namelist, and since the namelist is now part of the config option of the tasks, we don't need inputs and outputs as part of the task definition.