corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray
Other
504 stars 80 forks source link

rioxarray.write_transform() not changing transform #698

Open nilsleh opened 10 months ago

nilsleh commented 10 months ago

Problem description

I am working with CMIP6 data and looking at utilizing rioxarray for possibly combining different data streams, so I am looking for a general approach through rioxarray. For this am using the merge_arrays() function, with the hope of applying it to not just for these specific CMIP6 files. However, for these files I am encountering rasterio.errors.WindowError: Bounds and transform are inconsistent Errors which seem to be due to a wrong transformation being read from the src files. The output_transform is computed correctly and works, and therefore I was thinking about setting the transform beforehand to the array. However, write_transform() does not change the transformation in this case, so I am wondering what I am missing or if there is another way to go about this. Data for the following example can be found here.

Code Sample

import xarray as xr
from rasterio.transform import Affine
from rioxarray.merge import merge_arrays

output_transform = Affine(1.0, 0.0, -0.5, 0.0, -1.0, 329.5)

arr_one = xr.open_dataset("CanESM5_r1i1p1f1_tos.nc")["tos"].squeeze()
arr_one.rio.set_spatial_dims("i", "j", inplace=True)

print(f"Before: {arr_one.rio.transform()}")
arr_one.rio.write_transform(output_transform, inplace=True)
print(f"After: {arr_one.rio.transform()}")

arr_two = xr.open_dataset("CanESM5_r1i1p1f1_zos.nc")["zos"].squeeze()
arr_two.rio.set_spatial_dims("i", "j", inplace=True)
arr_two.rio.write_transform(output_transform, inplace=True)

output = merge_arrays([arr_one, arr_two])

Environment Information

rioxarray - 0.15.0 rasterio - 1.3.6 xarray - 2023.6.0

adamjstewart commented 8 months ago

@snowman2 can you comment on this issue? We would like to add support for HDF5/NetCDF to TorchGeo and rioxarray seems to be the correct way to simulate a rasterio file but we're stuck on the above example. We're not sure if we're doing something wrong or this is a bug in rioxarray.

snowman2 commented 8 months ago

Inspecting the file: image

It appears that the dimensions i and j don't contain the geospatial coordinate information and instead contain the indexes of the grid. This is problematic as those coordinates are used to re-calculate the resolution/transform when the exist even if the transform has been written as it is usually more reliable (this happens quite a bit in the rioxarray code). For best results, I recommend replacing those values with the correct x & y geospatial coordinate arrays.

snowman2 commented 8 months ago

Side note: rio.write_transform is helpful to add the transform when the coordinate arrays do not exist (to save disk space).

snowman2 commented 8 months ago

image

snowman2 commented 8 months ago

This is how rioxarray adds coordinates based on the transform: https://github.com/corteva/rioxarray/blob/c15b86061feff8c2c7b0964f19922a3154a85f1a/rioxarray/_io.py#L1203-L1206

nilsleh commented 8 months ago

Thanks for the suggestion. For the zip file previously attached, the case seems to be that the defined coordinate grid from longitude(i,j) is not equally spaced for the coordinates. For example, inspecting arr_one.longitude.data[:,0]. Thus I am not sure how replacing the i, j coordinates with an appropriate geospatial coordinate 1d arrays would go and it seems that that is a separate issue.

So apart from that I included some additional data here where the coordinate issue is not of concern, which might help to get more closely to understanding the actual transform error rasterio.errors.WindowError: Bounds and transform are inconsistent.

As background information, for the torchgeo dataset implementation we are trying to achieve the following:

import xarray as xr
from rioxarray.merge import merge_arrays

query = {
    "minx": 0,
    "miny": 0,
    "maxx": 50,
    "maxy": 50
}

_crs = "EPSG:4326"
spatial_x_name = "longitude"
spatial_y_name = "latitude"

data_variables = ["sst", "sla"]

alt_ds= xr.open_dataset("/eo/dt_global_twosat_phy_l4_199311_vDT2021-M01.nc")
alt_ds = alt_ds.assign_coords(longitude=(alt_ds.longitude % 360 - 180)).roll(
        longitude=(alt_ds.dims["longitude"] // 2)
    )
sst_ds = xr.open_dataset("/eo/HadISST1_SST_update.nc")
sst_ds = sst_ds.sortby("latitude", ascending=True)

data_arrays = []
for ds in [sst_ds, alt_ds]:
    ds.rio.set_spatial_dims(spatial_x_name, spatial_y_name, inplace=True)

    for variable in data_variables:
        if hasattr(ds, variable):
            da = ds[variable]
            if not da.rio.crs:
                da.rio.write_crs(_crs, inplace=True)
            elif da.rio.crs != _crs:
                da = da.rio.reproject(_crs)
            # clip box ignores time dimension
            clipped = da.rio.clip_box(
                minx=query["minx"], miny=query["miny"], maxx=query["maxx"], maxy=query["maxy"]
            )
            # rioxarray expects this order
            clipped = clipped.transpose("time", spatial_y_name, spatial_x_name, ...)

            data_arrays.append(clipped.squeeze())

merged_data = merge_arrays(
    data_arrays, bounds=(query["minx"], query["miny"], query["maxx"], query["maxy"])
).data

Running that snippet I get:

ile "/home/nils/miniconda3/envs/torchEnv/lib/python3.9/site-packages/rioxarray/merge.py", line 175, in merge_arrays
    merged_data, merged_transform = _rio_merge(
  File "/home/nils/miniconda3/envs/torchEnv/lib/python3.9/site-packages/rasterio/merge.py", line 341, in merge
    src_window = windows.from_bounds(int_w, int_s, int_e, int_n, src.transform)
  File "/home/nils/miniconda3/envs/torchEnv/lib/python3.9/site-packages/rasterio/windows.py", line 323, in from_bounds
    raise WindowError("Bounds and transform are inconsistent")
rasterio.errors.WindowError: Bounds and transform are inconsistent

So I think there is a mismatch between how the bounds and transforms are defined vs how they are expected in rioxarray and I am not sure where exactly this difference is.

snowman2 commented 8 months ago

209 may be helpful.

adamjstewart commented 7 months ago

@snowman2 thanks for the hint! We'll see if that addresses the coordinate issue in the first paragraph of https://github.com/corteva/rioxarray/issues/698#issuecomment-1808219206. Do you have any thoughts on the other paragraphs of that comment?