Open nilsleh opened 10 months ago
@snowman2 can you comment on this issue? We would like to add support for HDF5/NetCDF to TorchGeo and rioxarray seems to be the correct way to simulate a rasterio file but we're stuck on the above example. We're not sure if we're doing something wrong or this is a bug in rioxarray.
Inspecting the file:
It appears that the dimensions i
and j
don't contain the geospatial coordinate information and instead contain the indexes of the grid. This is problematic as those coordinates are used to re-calculate the resolution/transform when the exist even if the transform has been written as it is usually more reliable (this happens quite a bit in the rioxarray code). For best results, I recommend replacing those values with the correct x & y geospatial coordinate arrays.
Side note: rio.write_transform
is helpful to add the transform when the coordinate arrays do not exist (to save disk space).
This is how rioxarray adds coordinates based on the transform: https://github.com/corteva/rioxarray/blob/c15b86061feff8c2c7b0964f19922a3154a85f1a/rioxarray/_io.py#L1203-L1206
Thanks for the suggestion. For the zip file previously attached, the case seems to be that the defined coordinate grid from longitude(i,j)
is not equally spaced for the coordinates. For example, inspecting arr_one.longitude.data[:,0]
. Thus I am not sure how replacing the i, j coordinates with an appropriate geospatial coordinate 1d arrays would go and it seems that that is a separate issue.
So apart from that I included some additional data here where the coordinate issue is not of concern, which might help to get more closely to understanding the actual transform error rasterio.errors.WindowError: Bounds and transform are inconsistent
.
As background information, for the torchgeo dataset implementation we are trying to achieve the following:
.nc
or .hdf5
file (i.e. CMIP data etc.)import xarray as xr
from rioxarray.merge import merge_arrays
query = {
"minx": 0,
"miny": 0,
"maxx": 50,
"maxy": 50
}
_crs = "EPSG:4326"
spatial_x_name = "longitude"
spatial_y_name = "latitude"
data_variables = ["sst", "sla"]
alt_ds= xr.open_dataset("/eo/dt_global_twosat_phy_l4_199311_vDT2021-M01.nc")
alt_ds = alt_ds.assign_coords(longitude=(alt_ds.longitude % 360 - 180)).roll(
longitude=(alt_ds.dims["longitude"] // 2)
)
sst_ds = xr.open_dataset("/eo/HadISST1_SST_update.nc")
sst_ds = sst_ds.sortby("latitude", ascending=True)
data_arrays = []
for ds in [sst_ds, alt_ds]:
ds.rio.set_spatial_dims(spatial_x_name, spatial_y_name, inplace=True)
for variable in data_variables:
if hasattr(ds, variable):
da = ds[variable]
if not da.rio.crs:
da.rio.write_crs(_crs, inplace=True)
elif da.rio.crs != _crs:
da = da.rio.reproject(_crs)
# clip box ignores time dimension
clipped = da.rio.clip_box(
minx=query["minx"], miny=query["miny"], maxx=query["maxx"], maxy=query["maxy"]
)
# rioxarray expects this order
clipped = clipped.transpose("time", spatial_y_name, spatial_x_name, ...)
data_arrays.append(clipped.squeeze())
merged_data = merge_arrays(
data_arrays, bounds=(query["minx"], query["miny"], query["maxx"], query["maxy"])
).data
Running that snippet I get:
ile "/home/nils/miniconda3/envs/torchEnv/lib/python3.9/site-packages/rioxarray/merge.py", line 175, in merge_arrays
merged_data, merged_transform = _rio_merge(
File "/home/nils/miniconda3/envs/torchEnv/lib/python3.9/site-packages/rasterio/merge.py", line 341, in merge
src_window = windows.from_bounds(int_w, int_s, int_e, int_n, src.transform)
File "/home/nils/miniconda3/envs/torchEnv/lib/python3.9/site-packages/rasterio/windows.py", line 323, in from_bounds
raise WindowError("Bounds and transform are inconsistent")
rasterio.errors.WindowError: Bounds and transform are inconsistent
So I think there is a mismatch between how the bounds and transforms are defined vs how they are expected in rioxarray and I am not sure where exactly this difference is.
@snowman2 thanks for the hint! We'll see if that addresses the coordinate issue in the first paragraph of https://github.com/corteva/rioxarray/issues/698#issuecomment-1808219206. Do you have any thoughts on the other paragraphs of that comment?
Problem description
I am working with CMIP6 data and looking at utilizing
rioxarray
for possibly combining different data streams, so I am looking for a general approach through rioxarray. For this am using themerge_arrays()
function, with the hope of applying it to not just for these specific CMIP6 files. However, for these files I am encounteringrasterio.errors.WindowError: Bounds and transform are inconsistent
Errors which seem to be due to a wrong transformation being read from the src files. Theoutput_transform
is computed correctly and works, and therefore I was thinking about setting the transform beforehand to the array. However,write_transform()
does not change the transformation in this case, so I am wondering what I am missing or if there is another way to go about this. Data for the following example can be found here.Code Sample
Environment Information
rioxarray - 0.15.0 rasterio - 1.3.6 xarray - 2023.6.0