CUPiD is a “one-stop shop” that enables and integrates timeseries file generation, data standardization, diagnostics, and metrics from all CESM components.
All Submissions:
[x] Have you followed the guidelines in our Contributor's Guide (including the pre-commit check)?
[x] Have you checked to ensure there aren't other open Pull Requests for the same update/change?
New Feature Submissions:
[x] Does your submission pass tests?
[x] Have you linted your code locally prior to submission?
Changes to Core Features:
[x] Have you added an explanation of what your changes do and why you'd like us to include them?
[x] Have you successfully tested your changes locally?
Commentary
GenTS is a modernized post-processing package that specializes in converting history files to timeseries files. All code changes are made to `run.py`; `timeseries.py` is no longer needed because all of its functionality is encapsulated in GenTS. Release versions are available on PyPI, so I added GenTS to the environment dependencies list rather than pulling it in through git-fleximod or externals.
More testing is required to confirm that GenTS post-processes history files correctly, and integrating it into CUPiD will promote further testing. If CUPiD is nearing the production stage, it might make more sense to create a separate branch for this work.
GenTS can run serially or in parallel using Dask. Following the other notebooks, I create a local Dask cluster unless serial execution is specified in `config.yml`.
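A minimal sketch of the serial/parallel switch described above, assuming the parsed `config.yml` is a dict with a hypothetical top-level `serial` flag and a hypothetical `num_procs` worker count (the real key names and nesting in CUPiD's config may differ). `LocalCluster` and `Client` come from `dask.distributed`; the import is deferred so that a serial run does not require Dask at all.

```python
def start_dask_client(config: dict):
    """Return a Dask Client backed by a LocalCluster, or None for serial runs.

    Assumes hypothetical keys `serial` (bool) and `num_procs` (int) in the
    parsed config.yml; adjust to CUPiD's actual config layout.
    """
    if config.get("serial", False):
        # Serial mode: no cluster, GenTS runs in the current process.
        return None

    # Deferred import so serial runs do not depend on dask.distributed.
    from dask.distributed import Client, LocalCluster

    # One thread per worker keeps each history-to-timeseries task in its own process.
    cluster = LocalCluster(n_workers=config.get("num_procs", 4), threads_per_worker=1)
    return Client(cluster)
```

When a client is returned, it can be closed with `client.close()` once post-processing finishes.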
Some timeseries specifications in `config.yml` don't yet exist in GenTS. For example, GenTS lets the user specify a time slice, but the slice applies to all of the history files stored within a `ModelOutputDatabase` and cannot be broken down by model component. To get around this, I create a separate `ModelOutputDatabase` for each model component, which is likely inefficient. Similarly, GenTS does not require a history string to identify history files, but the caveat is that it processes every history file within the output directory, which may not be ideal for some use cases. I am unsure whether to implement these features in CUPiD or in GenTS: I would prefer to keep GenTS as general as possible, but there may be some useful tools I could build into GenTS to enable this sort of configuration.
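One way the per-component history-string and time-slice selection could live on the CUPiD side is to filter the file list before it reaches GenTS. The sketch below assumes the standard CESM history filename layout `<case>.<component>.<hist_str>.<date>.nc` (for example `mycase.cam.h0.0001-01.nc`). The function `select_history_files` and its arguments are hypothetical and do not call any GenTS API; how the filtered list would then be passed to a `ModelOutputDatabase` is left open.

```python
from pathlib import Path


def select_history_files(paths, component, hist_str, start_year=None, end_year=None):
    """Return the sorted subset of `paths` matching one component, one history
    stream, and (optionally) an inclusive year range.

    Hypothetical helper; assumes CESM names like <case>.<comp>.<hist>.<date>.nc.
    """
    selected = []
    for path in paths:
        # Parse from the right: CESM case names often contain dots themselves.
        parts = Path(path).name.split(".")
        if len(parts) < 5 or parts[-1] != "nc":
            continue  # not a recognizable history file
        comp, hstr, date = parts[-4], parts[-3], parts[-2]
        if comp != component or hstr != hist_str:
            continue
        year_str = date.split("-")[0]
        if not year_str.isdigit():
            continue
        year = int(year_str)
        # Per-component time slice, inclusive on both ends.
        if start_year is not None and year < start_year:
            continue
        if end_year is not None and year > end_year:
            continue
        selected.append(path)
    return sorted(selected)
```

Filtering this way keeps GenTS itself general, at the cost of CUPiD depending on the CESM filename convention.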