corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray
Other
517 stars 82 forks source link

Inconsistent behaviour of clip_box #617

Closed gcaria closed 1 year ago

gcaria commented 1 year ago
import geopandas as gpd
import rioxarray as rxr

root = 'https://esa-worldcover.s3.eu-central-1.amazonaws.com/v100/2020/map/ESA_WorldCover_10m_2020_v100_{fname}_Map.tif'
da1 = rxr.open_rasterio(root.format(fname='N45E003'))
da2 = rxr.open_rasterio(root.format(fname='N45E000'))

xr.testing.assert_equal(da1.y, da2.y) # all good

gdf = gpd.read_file("https://d9-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/atoms/files/WRS2_descending_0.zip")
gdf_scene = gdf[((gdf.PATH==198) & (gdf.ROW==28))]
bounds = gdf_scene.geometry.values[0].bounds

da1_clipped = da1.rio.clip_box(*bounds)
da2_clipped = da2.rio.clip_box(*bounds)
xr.testing.assert_equal(da1_clipped.y, da2_clipped.y) # AssertionError: Left and right DataArray objects are not equal

Problem description

I am clipping two rasters which are side by side along the x-axis (so they share the same y-axis coordinates) with the same box polygon, so expect the outputs to also share the y-axis coordinates, but they don't.

Expected Output

I expect the outputs to also share the y-axis coordinates.

Environment Information

rioxarray (0.13.2) deps:
  rasterio: 1.3.4
    xarray: 2022.12.0
      GDAL: 3.5.3
      GEOS: 3.11.1
      PROJ: 9.0.1
 PROJ DATA: /home/ubuntu/envs/exp/lib/python3.9/site-packages/rasterio/proj_data
 GDAL DATA: /home/ubuntu/envs/exp/lib/python3.9/site-packages/rasterio/gdal_data
Other python deps:
     scipy: 1.9.3
    pyproj: 3.4.1

System:
    python: 3.9.13 (main, May 27 2022, 10:19:08)  [GCC 11.2.0]
executable: /home/ubuntu/envs/exp/bin/python
   machine: Linux-5.15.0-1026-aws-x86_64-with-glibc2.35

Installation method

pypi

snowman2 commented 1 year ago

The the x-coordinates the same? If not, that is likely the reason.

gcaria commented 1 year ago

If also the x-coordinate were the same, the two input rasters would overlap completely (as the MRE shows, that's not the case).

A simple diagram that I hope complements the MRE above: Screen Shot 2022-12-13 at 6 51 41 pm

snowman2 commented 1 year ago

image

snowman2 commented 1 year ago
len(da1_clipped.y),len(da2_clipped.y)
(22018, 22017)

This is true:

xr.testing.assert_equal(da1_clipped.y[1:], da2_clipped.y)
snowman2 commented 1 year ago

When clipping rasters, you cannot be guaranteed that everything will mach up exactly in your example here. Floating point issues and other things can have an impact on the behavior of the clip. If you want to join them together, I recommend using merge:

from rioxarray.merge import merge_arrays

da_merged = merge_arrays([da1_clipped, da2_clipped])
gcaria commented 1 year ago

I'm very surprised. I was using clip_box as a shorter version of:

da1_clipped = da1.sel(x=slice(bounds[0], bounds[2]), y=slice(bounds[3], bounds[1]))
da2_clipped = da2.sel(x=slice(bounds[0], bounds[2]), y=slice(bounds[3], bounds[1]))
xr.testing.assert_equal(da1_clipped.y, da2_clipped.y) # no exception raised

Out of curiosity, why are the results of clip_box different?

snowman2 commented 1 year ago

The implementation is different. clip_box uses rasterio to determine the window to select: https://github.com/corteva/rioxarray/blob/ca31364542380ca06d38c06cc556af466e135913/rioxarray/raster_array.py#L808

numeric precision differences when calculating the transform can cause there to be a slight offset.

snowman2 commented 1 year ago

One part I forgot to mention is that the example you showed only considers the centroid of the grid cell whereas the rasterio version considers the entire grid cell.

gcaria commented 1 year ago

I can see how that's one difference, but wouldn't that impact the "left" and "right" rasters (da1 and da2) in the same way? My main issue with this result from clip_box is its lack of symmetry.

snowman2 commented 1 year ago

wouldn't that impact the "left" and "right" rasters (da1 and da2) in the same way?

Not necessarily. Depending on how close the boundary is to the edge of the grid cells, floating point precision differences could make it include grid cells on one side versus the other. Also, the boundary may be slightly shifted compared to the raster. If the boundary or raster is shifted, there can be differences in the resulting raster.

snowman2 commented 1 year ago

If you want to be sure everything matches up exactly on both sides, one option is to merge the rasters before clipping.

snowman2 commented 1 year ago

I dug into the code and here are the rasterio Windows used for clipping:

da1:

Window(col_off=-20099.16, row_off=12617.999999999884, width=34670.16, height=22016.40000000014)
# this is converted to:
# row_start=12617.999999999884 row_stop=34634.40000000002 col_start=-20099.16 col_stop=14571.000000000004

da2:

Window(col_off=15900.84, row_off=12618.0, width=34670.16, height=22016.40000000014)
# this is converted to:
# row_start=12618.0 row_stop=34634.40000000014 col_start=15900.84 col_stop=50571.0

Note that da1 has row_start=12617.999999999884. In rio.isel_window it uses math.floor to select the index of the row to ensure any intersecting cells are added. For da1 this converts row_start=12617 and for da2 this is row_start=12618.

And that is why there is an additional row for one and not the other. It all comes down to floating point precision issues.

gcaria commented 1 year ago

Thanks for looking into this, any reason why math.floor has been chosen instead of round?

snowman2 commented 1 year ago

" it uses math.floor to select the index of the row to ensure any intersecting cells are added."