ecmwf / anemoi-training

Apache License 2.0

Date-agnostic way to define training/validation split of dataset #142

Open cathalobrien opened 5 days ago

cathalobrien commented 5 days ago

Is your feature request related to a problem? Please describe.

Currently I define my training/validation split like so:

  training:
    start: null
    end: 2020
  validation:
    start: 2021
    end: 2021

This breaks when I switch to a dataset that covers a different date range. Then I have to remember the syntax and work out a new set of dates that fall within the new dataset, when all I really want is an 80% training / 20% validation split.

Describe the solution you'd like

It would be nice if there were a way to select fractions of the dataset without having to mention dates. This would make the same config portable across datasets covering different date ranges. An example of an 80% training / 20% validation split is below, where I give the fractions as floats (and ideally Anemoi would work out the date ranges itself).

  training:
    start: 0.0
    end: 0.8
  validation:
    start: 0.8
    end: 1.0
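
To illustrate what working out the date ranges could look like, here is a minimal sketch of mapping fractional bounds onto a sorted list of sample dates. It is not Anemoi's implementation: the helper names `resolve_fraction` and `split_by_fraction`, the half-open index convention, and the 10-day toy dataset are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def resolve_fraction(dates, fraction):
    """Map a fraction in [0, 1] to an index boundary in a sorted list of dates."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"fraction must be in [0, 1], got {fraction}")
    return round(fraction * len(dates))


def split_by_fraction(dates, start, end):
    """Return the (first, last) dates of the half-open index range [start, end).

    Hypothetical helper: treats `start` and `end` as fractions of the dataset
    length, so adjacent splits such as [0.0, 0.8) and [0.8, 1.0) never overlap.
    """
    lo = resolve_fraction(dates, start)
    hi = resolve_fraction(dates, end)
    if lo >= hi:
        raise ValueError("split is empty")
    return dates[lo], dates[hi - 1]


# Toy dataset (assumption): 10 daily samples starting 2020-01-01.
dates = [datetime(2020, 1, 1) + timedelta(days=i) for i in range(10)]

training = split_by_fraction(dates, 0.0, 0.8)    # first 8 samples
validation = split_by_fraction(dates, 0.8, 1.0)  # last 2 samples
```

Using half-open index ranges means the same fraction can close one split and open the next without a sample landing in both, which is harder to guarantee with inclusive date bounds.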

Describe alternatives you've considered

No response

Additional context

No response

Organisation

ECMWF