We should move towards a system that no longer has "options" available for data, and instead uses remote URLs that get cached locally. Motivations for this change are listed below.
This involves a few changes:
- Replace `get_default_config()` with `get_config(location)`, which accepts a remote file location. Initially, `get_default_config` should remain in place but emit a `DeprecationWarning`; it would return a dictionary that uses remote paths instead of "options".
- Remove the `initial_conditions` and `forcing` keys from the configuration dictionary.
- Rename `patch_files` to `file_sources`.
- Allow not just asset dictionaries but also strings in `file_sources`. A string represents a path (either a file or a directory) that is copied recursively into the run directory; internally, strings would be replaced by asset dictionaries. To place files in subdirectories, users should either put the files under a subdirectory in the source location or use the more detailed asset dict representation (@oliverwm1 has found for loops that generate asset dicts to be a very smooth workflow).
- Add `orographic_data` and `field_tables` keys to the config dict. These locations should be directories with the same structure as the current cache locations: orographic data should be placed in resolution subfolders, and field tables should be labelled by scheme.
- Remove the `data_table` key and treat the data table as a forcing file.
- For `diag_table`, treat "default" as a filename rather than as an option.

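
A minimal sketch of how string entries in `file_sources` could be expanded into asset dicts. It assumes asset dicts with the keys `source_location`, `source_name`, `target_location`, `target_name` and `copy_method` (mirroring the existing fv3config asset dicts), and it handles only local paths; remote locations would need a filesystem abstraction such as fsspec. The function names are hypothetical.

```python
import os
from typing import Dict, List, Union

# Assumed asset dict shape, mirroring the existing fv3config asset dicts.
AssetDict = Dict[str, str]


def _asset_dict(source_location: str, source_name: str, target_location: str = "") -> AssetDict:
    """Build an asset dict that copies one file without renaming it."""
    return {
        "source_location": source_location,
        "source_name": source_name,
        "target_location": target_location,
        "target_name": source_name,
        "copy_method": "copy",
    }


def expand_file_source(path: str) -> List[AssetDict]:
    """Expand one string entry of file_sources into asset dicts (local paths only).

    A file is copied to the top of the run directory; a directory is walked
    recursively and its internal layout is preserved in the run directory.
    """
    path = os.path.normpath(path)
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    if os.path.isfile(path):
        return [_asset_dict(os.path.dirname(path), os.path.basename(path))]
    assets: List[AssetDict] = []
    for dirpath, dirnames, filenames in os.walk(path):
        dirnames.sort()  # deterministic traversal order
        # Location of this directory relative to the source root, e.g. "INPUT"
        relative = os.path.relpath(dirpath, path)
        target_location = "" if relative == "." else relative
        for filename in sorted(filenames):
            assets.append(_asset_dict(dirpath, filename, target_location))
    return assets


def normalize_file_sources(file_sources: List[Union[str, AssetDict]]) -> List[AssetDict]:
    """Replace string entries with asset dicts; pass asset dicts through unchanged."""
    normalized: List[AssetDict] = []
    for item in file_sources:
        if isinstance(item, str):
            normalized.extend(expand_file_source(item))
        else:
            normalized.append(item)
    return normalized
```

Because a directory source keeps its internal layout, a file stored under `INPUT/` in the source ends up under `INPUT/` in the run directory. Explicit asset dicts, such as those generated in a for loop, pass through unchanged.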
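
The lookup implied by the `orographic_data` and `field_tables` keys can be sketched as plain path joins. This assumes that the resolution label is derived from `fv_core_nml.npx` as `C{npx - 1}` and that field tables are named `field_table_<scheme>`. Both conventions and all function names here are hypothetical.

```python
import posixpath
from typing import Any, Mapping


def resolution_from_namelist(namelist: Mapping[str, Any]) -> str:
    """Resolution label such as "C48", assuming it is derived from fv_core_nml.npx."""
    return f"C{namelist['fv_core_nml']['npx'] - 1}"


def orographic_data_location(config: Mapping[str, Any]) -> str:
    """Resolution subfolder of the orographic_data directory."""
    # posixpath.join also works for remote URLs such as gs://bucket/orographic_data
    return posixpath.join(config["orographic_data"], resolution_from_namelist(config["namelist"]))


def field_table_location(config: Mapping[str, Any], scheme: str) -> str:
    """Field table labelled by scheme (the field_table_<scheme> naming is hypothetical)."""
    return posixpath.join(config["field_tables"], f"field_table_{scheme}")
```

For example, with a placeholder `orographic_data: gs://bucket/orographic_data` and `npx = 49`, the orographic data lookup resolves to `gs://bucket/orographic_data/C48`.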
A sample configuration dictionary (excluding the namelist) might look like the following yaml (note this is a mockup and doesn't represent a valid run directory):

Motivations:

- @oliverwm1 has found it useful to use the `patch_files` feature to specify all of the input data. He'd like to be able to disable the "initial_conditions" option by setting it to "None", but this seems a little hack-ish. This stems from the fact that initial conditions can be provided in two ways (`patch_files` or `initial_conditions`).
- @nbren12 has pointed out that it is not ideal to have strings behave as file paths under some conditions and as lookup keys under others.
- In moving to the new fv3atm repo, we found that changes are necessary in the forcing data structure and in the model configuration. I have opted to use remote paths instead of changing or adding new "options" for data, alongside the automatic caching offered by #45.
- We are also looking at releasing a "for public use" Docker implementation of the model soon. The current set-up with built-in options is not very digestible for the public, because it is unclear what a "default" configuration should be; this "default" configuration is also model-version dependent and likely to break. The "option" method also lacks good ways to version the data.