C2SM / Sirocco

AiiDA-based weather and climate workflow tool

Specify when input data specs should be taken into account #40

Open leclairm opened 1 week ago

leclairm commented 1 week ago

The problem

There is currently no way to express in the YAML file when an input data spec or a wait-on-task spec should be taken into account. The only workaround is implicit: if the target date of an input falls outside the date range of the specified data, the input is ignored. This is what allows the restart mechanism, where the first cycle doesn't use a restart file. Conversely, the initial cycle can need an input file that no other cycle uses, but we cannot express that so far.

Proposition

Add a when keyword that signifies that this input data (or wait on task) spec only applies when the given condition is met. I would use 3 sub-keywords for that: before, after and at, all of which refer to the current cycle date.

cycles:
  - bimonthly_tasks:
      start_date: *root_start_date
      end_date: *root_end_date
      period: P2M
      tasks:
        - icon:
            inputs:
              - initial_conditions:
                  when:
                    at: *root_start_date
              - icon_restart:
                  lag: -P2M
                  when: 
                    after: *root_start_date
            outputs: [icon_output, icon_restart]
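
To make the intended semantics concrete, here is a minimal Python sketch of how such a when block could be evaluated against the current cycle date. The function name when_applies and the example dates are hypothetical and not part of Sirocco. The strict comparisons for before and after are an assumption inferred from the example above, where after: *root_start_date must exclude the first cycle.

```python
from datetime import datetime
from typing import Mapping, Optional

ALLOWED_KEYS = {"before", "after", "at"}


def when_applies(cycle_date: datetime,
                 when: Optional[Mapping[str, datetime]]) -> bool:
    """Return True if a spec guarded by `when` applies at `cycle_date`.

    Hypothetical helper: all conditions are relative to the cycle date,
    and `before`/`after` are assumed to be strict bounds.
    """
    if not when:
        # No condition given: the spec always applies.
        return True
    unknown = set(when) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown 'when' sub-keywords: {sorted(unknown)}")
    if "at" in when and cycle_date != when["at"]:
        return False
    if "before" in when and not cycle_date < when["before"]:
        return False
    if "after" in when and not cycle_date > when["after"]:
        return False
    return True


# Made-up dates standing in for *root_start_date and the next P2M cycle.
root_start = datetime(2026, 1, 1)
second_cycle = datetime(2026, 3, 1)

# initial_conditions is only used by the first cycle.
print(when_applies(root_start, {"at": root_start}))
print(when_applies(second_cycle, {"at": root_start}))
# icon_restart is skipped by the first cycle and used by later ones.
print(when_applies(root_start, {"after": root_start}))
print(when_applies(second_cycle, {"after": root_start}))
```

With these assumptions the first cycle picks up initial_conditions only, and every later cycle picks up icon_restart only.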

A variation could be to skip the when keyword and put before, after and at directly at the same level as lag and date:

cycles:
  - bimonthly_tasks:
      start_date: *root_start_date
      end_date: *root_end_date
      period: P2M
      tasks:
        - icon:
            inputs:
              - initial_conditions:
                  at: *root_start_date
              - icon_restart:
                  lag: -P2M
                  after: *root_start_date
            outputs: [icon_output, icon_restart]

I find this variant confusing because it is no longer clear whether these conditions are relative to the cycle date or to the data date.

Side benefit

The sirocco.core.TimeSeries class would become obsolete. It was designed to allow for date bounds checking, but this would now be handled explicitly by the user, so we can just store items in the internal dict, which would raise a KeyError if necessary.
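
As a rough illustration of that simplification, the sketch below stores items in a plain dict keyed by name and date, so a lookup for a date that was never registered fails with a KeyError instead of being silently ignored. The DataStore class, its methods and the example values are hypothetical and do not reflect the actual Sirocco internals.

```python
from datetime import datetime
from typing import Any, Dict, Tuple


class DataStore:
    """Hypothetical stand-in for the internal dict, keyed by (name, date)."""

    def __init__(self) -> None:
        self._items: Dict[Tuple[str, datetime], Any] = {}

    def add(self, name: str, date: datetime, item: Any) -> None:
        self._items[(name, date)] = item

    def get(self, name: str, date: datetime) -> Any:
        # No bounds checking: a missing (name, date) pair raises KeyError.
        return self._items[(name, date)]


store = DataStore()
store.add("icon_restart", datetime(2026, 3, 1), "restart file for 2026-03-01")
print(store.get("icon_restart", datetime(2026, 3, 1)))
```

Under this scheme, a spec that points at a non-existent date without a matching when condition would surface as an error rather than being dropped.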