ai2cm / fv3config

Manipulate FV3GFS run directories
Apache License 2.0

Allow per-file specification of initial conditions and forcing files #25

Closed: oliverwm1 closed this issue 4 years ago

oliverwm1 commented 4 years ago

The initial_conditions and forcing options in the config dictionary are currently specified as paths to directories. We would like users to be able to specify the initial conditions and forcing files on a per-file basis.

This could be done by allowing these options to be dicts or lists of tuples that give the source and destination path for each file, along with whether to copy or symlink that particular file.
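A minimal sketch of the list-of-tuples option, assuming each entry is a hypothetical (source, destination, link_type) tuple where link_type is either "copy" or "symlink". The function and type names are illustrative only and are not part of fv3config. The sketch handles local source paths only; remote sources such as gs:// URLs would need a filesystem library (for example gcsfs), which is not shown here.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable, Tuple

# Hypothetical entry format (not an fv3config API):
# (source_path, destination_relative_to_run_dir, link_type)
FileSpec = Tuple[str, str, str]


def write_file_specs(specs: Iterable[FileSpec], run_dir: str) -> None:
    """Place each source file at its destination inside run_dir."""
    for source, destination, link_type in specs:
        target = Path(run_dir) / destination
        # Create intermediate directories such as INPUT/ if needed.
        target.parent.mkdir(parents=True, exist_ok=True)
        if link_type == "copy":
            shutil.copy(source, target)
        elif link_type == "symlink":
            # Use an absolute path so the link works from any directory.
            os.symlink(os.path.abspath(source), target)
        else:
            raise ValueError(f"unknown link_type: {link_type!r}")
```

A dict keyed by destination path would work just as well; a list of tuples keeps the order of entries and allows the same source to be placed at several destinations.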

mcgibbon commented 4 years ago

Sounds great! However we represent "filename", could it be a pattern that matches multiple targets? That could greatly reduce the number of entries we need, since many of them share the same pattern.

We could also use this feature to specify a scheme for renaming the files, based on @nbren12's idea for doing so.

e.g.

{
    'location': 'gs://vcm-ml-data/date-description/',
    'name': 'sfc_data_mycase.tile*.mytimestamp',  # e.g. "*" = "#.####.nc"
    'target_name': 'sfc_data.tile*',  # auto-fill from "name" wildcards
    'target_type': 'copy', # could be 'symlink', 'hardlink'
    # ... whatever other options I'm not thinking of
}

With this scheme, whatever the n-th "*" in name matches would be inserted into the n-th "*" in target_name. You would still have the option of using many more entries to specify files one at a time. You might have ideas for something more sophisticated than "*"; I'm not personally familiar with how bash or gcloud filename wildcards work.
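A minimal sketch of the "n-th * maps to n-th *" renaming rule, using only Python's standard re module. The function names are hypothetical, and only "*" is supported; bash globs and Python's fnmatch also understand "?" and "[...]". When a pattern contains several "*", a filename can sometimes be split in more than one way; here the regex engine's default greedy, left-to-right matching decides.

```python
import re
from typing import Dict, Iterable, Optional


def target_name(name_pattern: str, target_pattern: str, filename: str) -> Optional[str]:
    """Return the renamed file, or None if filename does not match name_pattern."""
    if name_pattern.count("*") != target_pattern.count("*"):
        raise ValueError("name and target_name must contain the same number of '*'")
    # Escape the literal parts and turn each "*" into a capture group.
    regex = "(.*)".join(re.escape(part) for part in name_pattern.split("*"))
    match = re.fullmatch(regex, filename)
    if match is None:
        return None
    # Interleave the captured text with the literal parts of target_pattern.
    pieces = target_pattern.split("*")
    result = pieces[0]
    for captured, piece in zip(match.groups(), pieces[1:]):
        result += captured + piece
    return result


def rename_matches(
    name_pattern: str, target_pattern: str, filenames: Iterable[str]
) -> Dict[str, str]:
    """Map each matching source filename to its target name; skip non-matches."""
    renamed = {}
    for filename in filenames:
        new_name = target_name(name_pattern, target_pattern, filename)
        if new_name is not None:
            renamed[filename] = new_name
    return renamed
```

For example, target_name("sfc_data_mycase.tile*.mytimestamp", "sfc_data.tile*", "sfc_data_mycase.tile1.mytimestamp") returns "sfc_data.tile1", while a filename that does not fit the pattern returns None and is skipped by rename_matches.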