Ouranosinc / xscen

A climate change scenario-building analysis framework.
https://xscen.readthedocs.io/
Apache License 2.0
15 stars 2 forks source link

unstack_fill_nan doesn't work on rotated grid #156

Closed juliettelavoie closed 1 year ago

juliettelavoie commented 1 year ago

Setup Information

Description

I don't know if this is really a bug or a feature request. I have reference data on a rotated grid and I want to use it in my workflow. The dataset set has dimensions rlat and rlon. It also has lat and lon as coordinates. In my workflow, I stack the reference, extract simulation, regrid sim on ref and bias_adjust.

When I try to unstack, I run into an error (see below).

Steps To Reproduce

ds= xr.open_zarr('tasmax_day_eccc_rdrs-v2.1_NAM_1980_2018.zarr').isel(rlat=slice(190,198), rlon=slice(5,15), time=slice(0,10)) # cutting it just to make it faster
mask=ds.tasmax.isel(time=1, drop=True).notnull()
ds_stack = xs.utils.stack_drop_nans(ds,mask,to_file = 'coords_{domain}_{shape}.nc')
display(ds)
display(ds_stack)

Here is the raw ref data and the stack ref data:

<xarray.Dataset>
Dimensions:       (rlat: 8, rlon: 10, time: 10)
Coordinates:
    lat           (rlat, rlon) float32 dask.array<chunksize=(8, 10), meta=np.ndarray>
    lon           (rlat, rlon) float32 dask.array<chunksize=(8, 10), meta=np.ndarray>
  * rlat          (rlat) float32 -29.07 -28.98 -28.89 ... -28.62 -28.53 -28.44
  * rlon          (rlon) float32 325.1 325.1 325.2 325.3 ... 325.7 325.8 325.9
  * time          (time) datetime64[ns] 1980-01-01 1980-01-02 ... 1980-01-10
Data variables:
    rotated_pole  float32 ...
    tasmax        (time, rlat, rlon) float32 dask.array<chunksize=(10, 8, 10), meta=np.ndarray>

<xarray.Dataset>
Dimensions:       (loc: 80, time: 10)
Coordinates:
    lat           (loc) float32 dask.array<chunksize=(80,), meta=np.ndarray>
    lon           (loc) float32 dask.array<chunksize=(80,), meta=np.ndarray>
  * time          (time) datetime64[ns] 1980-01-01 1980-01-02 ... 1980-01-10
    rlat          (loc) float64 -29.07 -29.07 -29.07 ... -28.44 -28.44 -28.44
    rlon          (loc) float64 325.1 325.1 325.2 325.3 ... 325.7 325.8 325.9
Dimensions without coordinates: loc
Data variables:
    rotated_pole  (loc) float64 dask.array<chunksize=(80,), meta=np.ndarray>
    tasmax        (time, loc) float32 dask.array<chunksize=(10, 80), meta=np.ndarray>

The issue is here.

ds_unstack = xs.utils.unstack_fill_nan(ds_stack, coords='coords_{domain}_{shape}.nc')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [9], in <cell line: 14>()
     11 ds_stack = xs.utils.stack_drop_nans(ds,mask,to_file = 'coords_{domain}_{shape}.nc')
     12 print(ds_stack)
---> 14 ds_unstack = xs.utils.unstack_fill_nan(ds_stack, coords='coords_{domain}_{shape}.nc')
     15 display(ds_unstack)

File ~/Projets/xscen/xscen/config.py:214, in parse_config.<locals>._wrapper(*args, **kwargs)
    211 if CONFIG.get("print_it_all"):
    212     logger.debug(f"Modified kwargs : {kwargs}")
--> 214 return func(*args, **kwargs)

File ~/Projets/xscen/xscen/utils.py:223, in unstack_fill_nan(ds, dim, coords)
    221         logger.info(f"Dataset unstacked using {coords}.")
    222         coords = xr.open_dataset(coords)
--> 223     out = out.reindex(**coords.coords)
    225 for dim in dims:
    226     out[dim].attrs.update(ds[dim].attrs)

File /exec/jlavoie/.conda/xscen-dev/lib/python3.10/site-packages/xarray/core/dataset.py:3122, in Dataset.reindex(self, indexers, method, tolerance, copy, fill_value, **indexers_kwargs)
   2924 """Conform this object onto a new set of indexes, filling in
   2925 missing values with ``fill_value``. The default fill value is NaN.
   2926 
   (...)
   3119 
   3120 """
   3121 indexers = utils.either_dict_or_kwargs(indexers, indexers_kwargs, "reindex")
-> 3122 return alignment.reindex(
   3123     self,
   3124     indexers=indexers,
   3125     method=method,
   3126     tolerance=tolerance,
   3127     copy=copy,
   3128     fill_value=fill_value,
   3129 )

File /exec/jlavoie/.conda/xscen-dev/lib/python3.10/site-packages/xarray/core/alignment.py:878, in reindex(obj, indexers, method, tolerance, copy, fill_value, sparse, exclude_vars)
    863 """Re-index either a Dataset or a DataArray.
    864 
    865 Not public API.
    866 
    867 """
    869 # TODO: (benbovy - explicit indexes): uncomment?
    870 # --> from reindex docstrings: "any mis-matched dimension is simply ignored"
    871 # bad_keys = [k for k in indexers if k not in obj._indexes and k not in obj.dims]
   (...)
    875 #         "or unindexed dimension in the object to reindex"
    876 #     )
--> 878 aligner = Aligner(
    879     (obj,),
    880     indexes=indexers,
    881     method=method,
    882     tolerance=tolerance,
    883     copy=copy,
    884     fill_value=fill_value,
    885     sparse=sparse,
    886     exclude_vars=exclude_vars,
    887 )
    888 aligner.align()
    889 return aligner.results[0]

File /exec/jlavoie/.conda/xscen-dev/lib/python3.10/site-packages/xarray/core/alignment.py:176, in Aligner.__init__(self, objects, join, indexes, exclude_dims, exclude_vars, method, tolerance, copy, fill_value, sparse)
    174 if indexes is None:
    175     indexes = {}
--> 176 self.indexes, self.index_vars = self._normalize_indexes(indexes)
    178 self.all_indexes = {}
    179 self.all_index_vars = {}

File /exec/jlavoie/.conda/xscen-dev/lib/python3.10/site-packages/xarray/core/alignment.py:207, in Aligner._normalize_indexes(self, indexes)
    205 if not isinstance(idx, Index):
    206     if getattr(idx, "dims", (k,)) != (k,):
--> 207         raise ValueError(
    208             f"Indexer has dimensions {idx.dims} that are different "
    209             f"from that to be indexed along '{k}'"
    210         )
    211     data = as_compatible_data(idx)
    212     pd_idx = safe_cast_to_index(data)

ValueError: Indexer has dimensions ('rlat', 'rlon') that are different from that to be indexed along 'lat'

Additional context

If I drop lat and lon from ds the unstacking works, but I can't regrid my simulation (in lat and lon) to my ref (in loc with only rlat and rlon).

I am trying to modify stack_fill_nan to make it work, but I haven't figured it out yet. Any help would be welcome !

(I will do a PR if I can figure out a way to fix it in xscen, but I am also looking for a way to just fix it in my workflow.)

Contribution

juliettelavoie commented 1 year ago

I think I have a solution:

I think I can just do ds_ref_unstack = xs.utils.unstack_fill_nan(ds_ref_stack, coords=('rlat', 'rlon')). I think the point of the coords file is for nans and there is no real nans over the domain in the RDRS data.

On the QC domain, when I extract my region, I have nans but only outside of my region. After the unstack, I lose the data for lat and lon outside of my domain, but that doesn't matter.

# search
cat_ref = xs.search_data_catalogs(variables_and_freqs={'tasmax':'D'},
                               allow_resampling= False,
                               data_catalogs=['reconstruction-extra.json'],
            allow_conversion= True,
            periods=['1995','1996'],
            other_search_criteria={'source': 'RDRS'})

# extract
region_dict= {
          'name': 'QC-RSDS',
          'method': 'bbox',
          'bbox':{
            'lon_bnds': [ -83, -55 ],
            'lat_bnds': [ 42, 63 ] }}
dc = cat_ref.popitem()[1]
ds_ref = xs.extract_dataset(catalog=dc,
                         region=region_dict,
                         )['D']

mask=ds_ref.tasmax.isel(time=0, drop=True).notnull()
ds_ref_stack = xs.utils.stack_drop_nans(ds_ref,mask)
ds_ref_unstack = xs.utils.unstack_fill_nan(ds_ref_stack, coords=('rlat', 'rlon'))

diff= ds_ref_unstack.tasmax- ds_ref.tasmax
plt.figure()
diff.isel(time=0).plot()

diff_lat= ds_ref_unstack.lat- ds_ref.lat
plt.figure()
diff_lat.plot()

diff_lon= ds_ref_unstack.lat- ds_ref.lat
plt.figure()
diff_lon.plot()

image image image

I will test it on the complete workflow when the issue with the disk where I store my files is fixed.

juliettelavoie commented 1 year ago

Still no solution for using a file with stack_drop_nan and a rotated grid. But, the info-crue workflow works using coords=('rlat', 'rlon'), so I will close this issue.

juliettelavoie commented 1 year ago

We are adding NaNs to RDRS now, so this is an issue again.