GCM lons in weights - Githubissues

orianac commented 2 years ago

In the downscaling workflows we open the GCM zarr stores and shift the lons to a [-180, 180] range using a postprocess call (https://github.com/carbonplan/cmip6-downscaling/blob/4a41f4d3c99b47127665fdeafa957255080a8825/cmip6_downscaling/data/cmip.py#L120). The weights generation flow opens GCMs directly without shifting any lons. This means the weights are built on a different longitude grid than the datasets we apply them to. In other words, an issue arises whenever the dataset you apply a weights file to has lons that differ from the dataset used to create that weights file.
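To make the mismatch concrete, here is a minimal pure-Python sketch (not the repository's code). The `lon_to_180` helper below only mimics the shift-then-sort behaviour visible in the traceback later in this thread, and the four-cell grid, values, and toy weights are made up for illustration.

```python
def lon_to_180(lons):
    """Map longitudes from [0, 360) to [-180, 180) and sort them ascending.

    Returns the shifted longitudes and the permutation that was applied.
    """
    shifted = [lon if lon < 180 else lon - 360 for lon in lons]
    order = sorted(range(len(shifted)), key=lambda i: shifted[i])
    return [shifted[i] for i in order], order


def apply_weights(weights, values):
    """Toy regridding: one target cell as a weighted sum of source cells."""
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical 4-cell source grid, as it would sit in a raw GCM store.
raw_lons = [0.0, 90.0, 180.0, 270.0]
raw_values = [10.0, 20.0, 30.0, 40.0]

# Toy weights generated on the raw grid: the target cell copies the
# source cell located at 90 degrees (position 1 in the raw ordering).
weights = [0.0, 1.0, 0.0, 0.0]

# The workflow shifts and re-sorts the lons, which also reorders the data.
shifted_lons, order = lon_to_180(raw_lons)
shifted_values = [raw_values[i] for i in order]

correct = apply_weights(weights, raw_values)         # weights and data agree
mismatched = apply_weights(weights, shifted_values)  # weights index the wrong cell

print(shifted_lons, correct, mismatched)
```

Because weights address source cells by position, reordering the longitudes after the weights were generated silently pulls values from the wrong cells rather than raising an error.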

I think the solution is to add the postprocess call to the weights generation routine. My guess is that right after L67 would work (https://github.com/carbonplan/cmip6-downscaling/blob/4a41f4d3c99b47127665fdeafa957255080a8825/flows/gcm_obs_weights.py#L67).

@andersy005 can you implement this fix? Also, might this be relevant for the pyramid generation steps, since I believe they also use weights? If these weights (or other weights with a similar mismatch issue) are used in other places, we should check all of them. I checked the ERA5 step and it appears we use the same utility to open ERA5 in the weights generation as in the workflows (which is relevant since we adjust the lat ordering at https://github.com/carbonplan/cmip6-downscaling/blob/4a41f4d3c99b47127665fdeafa957255080a8825/cmip6_downscaling/data/observations.py#L45). But if there is any other place where we open ERA5 to create pyramids and weights are involved, that same lat reordering needs to be applied there as well.
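One way to catch this class of mismatch wherever weights are reused would be to store a fingerprint of the source grid next to each weights file and compare it before applying the weights. The sketch below is only an illustration of that idea: `grid_fingerprint` and `check_weights_match` are hypothetical helpers, not functions in cmip6-downscaling, and the coordinates are made up.

```python
import hashlib
import json


def grid_fingerprint(lats, lons):
    """Hash the coordinate values, including their order, into a short string."""
    payload = json.dumps(
        {
            "lat": [round(float(v), 6) for v in lats],
            "lon": [round(float(v), 6) for v in lons],
        }
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def check_weights_match(weights_fingerprint, lats, lons):
    """Raise if the dataset grid differs from the grid the weights were built on."""
    actual = grid_fingerprint(lats, lons)
    if actual != weights_fingerprint:
        raise ValueError(
            f"grid mismatch: weights were built on {weights_fingerprint}, "
            f"dataset grid is {actual}"
        )


# Hypothetical coordinates: weights generated on the raw store ...
lats = [-45.0, 45.0]
raw_lons = [0.0, 90.0, 180.0, 270.0]
stored = grid_fingerprint(lats, raw_lons)

# ... but applied to a dataset whose lons were shifted to [-180, 180).
shifted_lons = [-180.0, -90.0, 0.0, 90.0]
try:
    check_weights_match(stored, lats, shifted_lons)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(mismatch_detected)
```

Because the hash depends on coordinate order, the same check would also flag a flipped latitude axis, which is the ERA5 case mentioned above.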

andersy005 commented 2 years ago

Good catch, @orianac! yes, this is relevant for the pyramid generation.

> can you implement this fix?

i'm on it

andersy005 commented 2 years ago

> In the event that these weights (or other weights with a similar mismatch issue) are used in other places we should check all of them.

it appears https://github.com/carbonplan/cmip6-downscaling/blob/4a41f4d3c99b47127665fdeafa957255080a8825/cmip6_downscaling/methods/bcsd/flow.py#L144

is using the pre-generated weights: https://github.com/carbonplan/cmip6-downscaling/blob/4a41f4d3c99b47127665fdeafa957255080a8825/cmip6_downscaling/methods/common/tasks.py#L335

i presume we'll need to re-generate the pyramids for the bcsd runs.

Cc @norlandrhagen

andersy005 commented 2 years ago

I just ran into an interesting issue. Our postprocess() function assumes that we are dealing with GCMs on regular lat/lon grids. However, some of the GCMs use unstructured grids, e.g.:

MPI-M/ICON-ESM-LR

```python
In [1]: import xarray as xr

In [2]: path = 'az://cmip6/CMIP/MPI-M/ICON-ESM-LR/historical/r1i1p1f1/day/pr/gn/v20210215/'

In [3]: from cmip6_downscaling.data.cmip import postprocess

In [4]: ds = xr.open_zarr(path)

In [5]: ds
Out[5]:
Dimensions:             (i: 20480, time: 60265, bnds: 2, vertices: 3)
Coordinates:
  * i                   (i) int32 0 1 2 3 4 5 ... 20475 20476 20477 20478 20479
    latitude            (i) float64 dask.array
    longitude           (i) float64 dask.array
  * time                (time) datetime64[ns] 1850-01-01T12:00:00 ... 2014-12...
    time_bnds           (time, bnds) datetime64[ns] dask.array
Dimensions without coordinates: bnds, vertices
Data variables:
    pr                  (time, i) float32 dask.array
    vertices_latitude   (i, vertices) float64 dask.array
    vertices_longitude  (i, vertices) float64 dask.array
Attributes: (12/53)
    CDI_grid_type:         unstructured
    CDO:                   Climate Data Operators version 2.0.0rc5 (https://...
    Conventions:           CF-1.7 CMIP-6.2
    activity_id:           CMIP
    branch_method:         standard
    branch_time_in_child:  0.0
    ...                    ...
    table_info:            Creation Date:(09 May 2019) MD5:5f007c16960eee824...
    title:                 ICON-ESM-LR output prepared for CMIP6
    tracking_id:           hdl:21.14100/12a736ea-2ab6-43aa-8bb5-616c9b191b20
    variable_id:           pr
    variant_label:         r1i1p1f1
    version_id:            v20210215

In [6]: postprocess(ds)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:1394, in Dataset._construct_dataarray(self, name)
   1393 try:
-> 1394     variable = self._variables[name]
   1395 except KeyError:

KeyError: 'lon'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
Input In [6], in ()
----> 1 postprocess(ds)

File ~/devel/carbonplan/cmip6-downscaling/cmip6_downscaling/data/cmip.py:49, in postprocess(ds)
     46 ds = ds.squeeze(drop=True)
     48 # standardize longitude convention
---> 49 ds = lon_to_180(ds)
     51 # Reorders latitudes to [-90, 90]
     52 if ds.lat[0] > ds.lat[-1]:

File ~/devel/carbonplan/cmip6-downscaling/cmip6_downscaling/data/utils.py:71, in lon_to_180(ds)
     52 '''Converts longitude values to (-180, 180)
     53
     54 Parameters
   (...)
     66 cmip6_preprocessing.preprocessing.correct_lon
     67 '''
     69 ds = ds.copy()
---> 71 lon = ds["lon"].where(ds["lon"] < 180, ds["lon"] - 360)
     72 ds = ds.assign_coords(lon=lon)
     74 if not (ds["lon"].diff(dim="lon") > 0).all():

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:1498, in Dataset.__getitem__(self, key)
   1495     return self.isel(**cast(Mapping, key))
   1497 if hashable(key):
-> 1498     return self._construct_dataarray(key)
   1499 else:
   1500     return self._copy_listed(key)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:1396, in Dataset._construct_dataarray(self, name)
   1394     variable = self._variables[name]
   1395 except KeyError:
-> 1396     _, name, variable = _get_virtual_variable(
   1397         self._variables, name, self._level_coords, self.dims
   1398     )
   1400 needed_dims = set(variable.dims)
   1402 coords: dict[Hashable, Variable] = {}

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py:169, in _get_virtual_variable(variables, key, level_vars, dim_sizes)
    167     ref_var = dim_var.to_index_variable().get_level_variable(ref_name)
    168 else:
--> 169     ref_var = variables[ref_name]
    171 if var_name is None:
    172     virtual_var = ref_var

KeyError: 'lon'
```

Are these GCMs with unstructured grids excluded from the list of GCMs we are downscaling?

Cc @jhamman

jhamman commented 2 years ago

> Are these GCMs with unstructured grids excluded from the list of GCMs we are downscaling?

Yes, let's skip this model for now.
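Skipping such models could be automated with a small guard that runs before postprocess(). The following is a hedged sketch, not code from the repository: `is_regular_latlon` is a hypothetical helper, it relies only on the `CDI_grid_type` attribute and the dimension names shown in the session above, and the "regular" entry is a made-up stand-in for a model with `lat`/`lon` dimensions.

```python
def is_regular_latlon(dims, attrs):
    """Return True if a dataset looks like it is on a regular lat/lon grid.

    `dims` is an iterable of dimension names and `attrs` a mapping of global
    attributes, e.g. `ds.dims` and `ds.attrs` for an xarray Dataset.
    """
    if attrs.get("CDI_grid_type") == "unstructured":
        return False
    return {"lat", "lon"}.issubset(set(dims))


# Dimensions and attribute taken from the ICON-ESM-LR output above.
icon_dims = ("i", "time", "bnds", "vertices")
icon_attrs = {"CDI_grid_type": "unstructured"}

# Hypothetical stand-in for a model on a regular grid.
regular_dims = ("time", "lat", "lon")
regular_attrs = {}

candidates = {
    "ICON-ESM-LR": (icon_dims, icon_attrs),
    "some-regular-model": (regular_dims, regular_attrs),
}

# Keep only the models that postprocess() can be expected to handle.
to_process = [
    name for name, (dims, attrs) in candidates.items()
    if is_regular_latlon(dims, attrs)
]
print(to_process)
```

A check like this fails early with a clear decision instead of surfacing as a `KeyError: 'lon'` deep inside the longitude shift.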

andersy005 commented 2 years ago

Okie dokie... i have a prefect flow running for these models ["MIROC6", "AWI-CM-1-1-M", "BCC-CSM2-MR"]

andersy005 commented 2 years ago

The weights for all the models on regular lat/lon grids have been updated.

https://cmip6downscaling.blob.core.windows.net/static/xesmf_weights/cmip6_pyramids/weights.csv

carbonplan / cmip6-downscaling

GCM lons in weights #183