corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray
Other
504 stars 80 forks source link

`spatial_ref` coordinate not accessible after saving dataset #732

Closed lesimppa closed 5 months ago

lesimppa commented 5 months ago

MRE

import xarray as xr

_BOUNDS = [-10, -10, 10, 10]
_HEIGHT = 100
_WIDTH = 100
_VARIABLES = ["foo", "bar"]

transform = rasterio.transform.from_bounds(*_BOUNDS, _WIDTH, _HEIGHT)
crs = rasterio.crs.CRS.from_epsg(4326)

dset = xr.Dataset(
    data_vars={var: (["y", "x"], np.ones((_HEIGHT, _WIDTH))) for var in _VARIABLES},
)
dset = dset.rio.write_crs(crs)
dset = dset.rio.write_transform(transform)

assert dset.rio.transform() == transform
assert "spatial_ref" in dset.coords

with TemporaryDirectory() as tmpdir:
    path_to_zarr = Path(tmpdir) / "test.zarr"
    path_to_netcdf = Path(tmpdir) / "test.nc"

    dset.to_zarr(path_to_zarr)
    dset.to_netcdf(path_to_netcdf)

    zarr_dset = xr.open_zarr(path_to_zarr)
    netcdf_dset = xr.open_dataset(path_to_netcdf)

    # `spatial_ref` is not in the coords of the datasets after being written
    print(zarr_dset.coords)
    print(netcdf_dset.coords)

    # this fails as `.rio.transform()` looks into the `spatial_ref` coord
    # which doesn't exist anymore
    assert zarr_dset.rio.transform() == transform
    assert netcdf_dset.rio.transform() == transform

Problem description

Hello everyone. Thanks for the awesome work on rioxarray. I've noticed that after saving a dataset accessed with therio accessor, the transform and other georeferencing metadata can't be accessed anymore.

It seems that the scalar coordinate spatial_ref gets converted into a data variable when saved and thus becomes inaccessible by rioxarray. My hunch is that https://github.com/corteva/rioxarray/blob/166c0c76d9e174387ca6b2368f7eb6a3d07ec13a/rioxarray/rioxarray.py#L562 can't pick up the transform as spatial_ref doesn't exist as coordinates anymore.

Expected Output

It would be great if we could save and reopen datasets and the georeferencing would work. Would it make sense to abstract self._obj.coords[self.grid_mapping].attrs into a property that infers if grid_mapping is actually in coordinates or variables? Something like:

@property
def grid_mapping_attrs(self) -> dict[str, Any,]:
    if self.grid_mapping in self._obj.coords:
        return self._obj.coords[self.grid_mapping].attrs
    elif self.grid_mapping in self._obj.data_vars:
        return self._obj.data_vars[self.grid_mapping].attrs
    else:
        raise ...

Environment Information

rioxarray (0.15.0) deps:
  rasterio: 1.3.8
    xarray: 2023.12.0
      GDAL: 3.7.0
      GEOS: 3.11.2
      PROJ: 9.2.1

Other python deps:
     scipy: 1.11.4
    pyproj: 3.6.1

System:
    python: 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:09:17) [Clang 16.0.6 ]
   machine: macOS-13.3.1-arm64-arm-64bit

Installation method

snowman2 commented 5 months ago

See: https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html#xarray

You need to use decode_coords="all".

lesimppa commented 5 months ago

Thank you so much @snowman2