unstack_fill_nan doesn't work on rotated grid #156

Closed juliettelavoie closed 1 year ago

juliettelavoie commented 1 year ago

I don't know if this is really a bug or a feature request. I have reference data on a rotated grid and I want to use it in my workflow. The dataset has dimensions rlat and rlon, and it also has lat and lon as 2D coordinates. In my workflow, I stack the reference, extract the simulation, regrid the simulation onto the reference, and bias-adjust.

When I try to unstack, I run into an error (see below).

Steps To Reproduce

import xarray as xr
import xscen as xs

# Subset the data just to make it faster
ds = xr.open_zarr('tasmax_day_eccc_rdrs-v2.1_NAM_1980_2018.zarr').isel(rlat=slice(190, 198), rlon=slice(5, 15), time=slice(0, 10))
mask = ds.tasmax.isel(time=1, drop=True).notnull()
ds_stack = xs.utils.stack_drop_nans(ds, mask, to_file='coords_{domain}_{shape}.nc')

Here are the raw reference dataset and the stacked reference dataset:

Dimensions:       (rlat: 8, rlon: 10, time: 10)
    lat           (rlat, rlon) float32 dask.array<chunksize=(8, 10), meta=np.ndarray>
    lon           (rlat, rlon) float32 dask.array<chunksize=(8, 10), meta=np.ndarray>
  * rlat          (rlat) float32 -29.07 -28.98 -28.89 ... -28.62 -28.53 -28.44
  * rlon          (rlon) float32 325.1 325.1 325.2 325.3 ... 325.7 325.8 325.9
  * time          (time) datetime64[ns] 1980-01-01 1980-01-02 ... 1980-01-10
Data variables:
    rotated_pole  float32 ...
    tasmax        (time, rlat, rlon) float32 dask.array<chunksize=(10, 8, 10), meta=np.ndarray>

Dimensions:       (loc: 80, time: 10)
    lat           (loc) float32 dask.array<chunksize=(80,), meta=np.ndarray>
    lon           (loc) float32 dask.array<chunksize=(80,), meta=np.ndarray>
  * time          (time) datetime64[ns] 1980-01-01 1980-01-02 ... 1980-01-10
    rlat          (loc) float64 -29.07 -29.07 -29.07 ... -28.44 -28.44 -28.44
    rlon          (loc) float64 325.1 325.1 325.2 325.3 ... 325.7 325.8 325.9
Dimensions without coordinates: loc
Data variables:
    rotated_pole  (loc) float64 dask.array<chunksize=(80,), meta=np.ndarray>
    tasmax        (time, loc) float32 dask.array<chunksize=(10, 80), meta=np.ndarray>

The issue is here.

ds_unstack = xs.utils.unstack_fill_nan(ds_stack, coords='coords_{domain}_{shape}.nc')

Additional context

If I drop lat and lon from ds the unstacking works, but I can't regrid my simulation (in lat and lon) to my ref (in loc with only rlat and rlon).

I am trying to modify unstack_fill_nan to make it work, but I haven't figured it out yet. Any help would be welcome!

(I will do a PR if I can figure out a way to fix it in xscen, but I am also looking for a way to just fix it in my workflow.)


juliettelavoie commented 1 year ago

I think I have a solution:

I can just do ds_ref_unstack = xs.utils.unstack_fill_nan(ds_ref_stack, coords=('rlat', 'rlon')). As far as I understand, the point of the coords file is to handle NaNs, and there are no real NaNs over the domain in the RDRS data.

On the QC domain, when I extract my region, I have NaNs, but only outside of my region. After the unstack, I lose the lat and lon values outside of my domain, but that doesn't matter.
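To illustrate why lat and lon are lost outside the domain, here is a minimal NumPy-only sketch of the stack/drop and unstack/fill round trip. It is not the xscen implementation: the grid values, the two masked cells, and the helper `unstack_fill` are all made up for illustration. It only assumes that each stacked point keeps its 1D rlat and rlon values, as in the stacked dataset shown above.

```python
import numpy as np

# Hypothetical 3x4 rotated grid; all values are made up for illustration.
rlat = np.array([-29.0, -28.9, -28.8])
rlon = np.array([325.1, 325.2, 325.3, 325.4])
lat2d = 40.0 + np.arange(12.0).reshape(3, 4)  # 2D lat(rlat, rlon)
tas = 280.0 + np.arange(12.0).reshape(3, 4)   # 2D tasmax(rlat, rlon)

# Pretend two cells fall outside the region of interest.
tas[0, 0] = np.nan
tas[2, 3] = np.nan
mask = ~np.isnan(tas)

# "Stack and drop": keep only the valid cells as a 1D list of points,
# carrying the rlat/rlon (and lat) value of each point along.
i_idx, j_idx = np.nonzero(mask)
tas_loc = tas[i_idx, j_idx]
lat_loc = lat2d[i_idx, j_idx]
rlat_loc = rlat[i_idx]
rlon_loc = rlon[j_idx]


def unstack_fill(values, rlat_loc, rlon_loc, rlat, rlon):
    """Hypothetical helper: put 1D points back on the grid, NaN elsewhere."""
    out = np.full((rlat.size, rlon.size), np.nan)
    i = np.searchsorted(rlat, rlat_loc)  # rlat and rlon are sorted ascending
    j = np.searchsorted(rlon, rlon_loc)
    out[i, j] = values
    return out


# "Unstack and fill" using only the 1D rlat/rlon carried by each point.
tas_back = unstack_fill(tas_loc, rlat_loc, rlon_loc, rlat, rlon)
lat_back = unstack_fill(lat_loc, rlat_loc, rlon_loc, rlat, rlon)

# tasmax round-trips, but lat is now NaN wherever a cell was dropped,
# because the dropped points took their lat values with them.
print(np.isnan(lat_back).sum(), "lat values lost")
```

In this sketch the only way to get lat and lon back everywhere is to keep a copy of the full 2D coordinates somewhere else, which is presumably what the coords file is for.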

# search
cat_ref = xs.search_data_catalogs(variables_and_freqs={'tasmax': 'D'},
                                  allow_resampling=False,
                                  allow_conversion=True,
                                  other_search_criteria={'source': 'RDRS'})

# extract
region_dict = {
    'name': 'QC-RSDS',
    'method': 'bbox',
    'lon_bnds': [-83, -55],
    'lat_bnds': [42, 63]}
dc = cat_ref.popitem()[1]
ds_ref = xs.extract_dataset(catalog=dc,

mask=ds_ref.tasmax.isel(time=0, drop=True).notnull()
ds_ref_stack = xs.utils.stack_drop_nans(ds_ref,mask)
ds_ref_unstack = xs.utils.unstack_fill_nan(ds_ref_stack, coords=('rlat', 'rlon'))

diff= ds_ref_unstack.tasmax- ds_ref.tasmax

diff_lat= ds_ref_unstack.lat- ds_ref.lat

diff_lon= ds_ref_unstack.lon- ds_ref.lon

I will test it on the complete workflow when the issue with the disk where I store my files is fixed.

juliettelavoie commented 1 year ago

Still no solution for using a coords file with stack_drop_nans and a rotated grid. However, the info-crue workflow works using coords=('rlat', 'rlon'), so I will close this issue.

juliettelavoie commented 1 year ago

We are adding NaNs to RDRS now, so this is an issue again.