Ouranosinc / miranda

A modern Python utility library for climate data collection and management
Apache License 2.0

Investigate Xmip for preprocessing CMIP6 data prior to data treatment #138

Open Zeitsperre opened 1 year ago

Zeitsperre commented 1 year ago

Proposal

CMIP6 data sometimes requires additional cleaning or treatment to remove known issues (e.g. extra weeks of data, specific errors in values/metadata, inconsistent naming of coordinates, etc.). Issues in our existing CMIP6 data stores are difficult to track and annoying to correct, and Miranda's existing data cleaning approach is ill-suited to handling these sparse errors.

While other tools should be explored for collecting CMIP6 data (such as esgpull), we shouldn't be trying to reinvent the wheel, especially for a project as large and well-supported as CMIP6.

Approach

Xmip should be leveraged for this step. This could be built into Miranda as another method or submodule specifically for preprocessing (miranda.preprocessing.cmip?).
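
A minimal sketch of what such a wrapper could look like. The function name `preprocess_cmip6` and its `preprocess` argument are hypothetical and do not exist in Miranda today; the only real xmip call assumed here is `xmip.preprocessing.combined_preprocessing`, imported lazily so that xmip could remain an optional dependency.

```python
"""Hypothetical sketch of a `miranda.preprocessing.cmip` helper (not existing Miranda code)."""
from typing import Any, Callable, Optional


def preprocess_cmip6(ds: Any, preprocess: Optional[Callable[[Any], Any]] = None) -> Any:
    """Apply a CMIP6 cleaning function to a single dataset and return the result.

    If no function is given, fall back to xmip's `combined_preprocessing`.
    """
    if preprocess is None:
        # Imported here (not at module level) so this module still loads
        # when xmip is not installed; assumption: xmip is optional.
        from xmip.preprocessing import combined_preprocessing

        preprocess = combined_preprocessing
    return preprocess(ds)
```

Since the wrapper takes one dataset and returns one dataset, it could presumably be passed as the `preprocess` argument of `xarray.open_mfdataset`, and Miranda-specific fixes could be chained in by supplying a custom callable.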

Xmip provides a post-processing module that might be of interest to xscen for building scenarios. To be determined.

juliettelavoie commented 1 year ago

Definitely a lot of interesting features in xmip! I think a lot of the hard-coded issues and fixes in pre-processing are for oceanography, so not variables/experiments that we use often. But it makes sense to contribute to xmip and have miranda wrap it instead of doing it separately, directly in miranda.

For the post-processing, I think we already solve a lot of the combination problems with extract_dataset and .to_dataset. I'm not convinced we should add it to xscen until we really need it. We also already handle grids using xesmf.