Open hannahker opened 1 month ago
Claude says:
This is a common issue in raster processing that occurs due to how resampling and clipping operations handle pixel alignment and boundaries. Let me explain what's happening and suggest solutions.
import xarray as xr
import rioxarray
def align_and_resample(
da: xr.DataArray,
target_resolution: float,
bounds: tuple,
align_pixels: bool = True
) -> xr.DataArray:
"""
Properly aligns and resamples a DataArray while maintaining pixel alignment.
Parameters
----------
da : xarray.DataArray
Input DataArray with spatial coordinates
target_resolution : float
Desired resolution in units of the input data's CRS
bounds : tuple
(minx, miny, maxx, maxy) in the input data's CRS
align_pixels : bool, default True
If True, ensures pixels are aligned to the target resolution grid
Returns
-------
xarray.DataArray
Resampled and aligned DataArray
"""
# Extract bounds
minx, miny, maxx, maxy = bounds
if align_pixels:
# Align bounds to target resolution grid
minx = round(minx / target_resolution) * target_resolution
maxx = round(maxx / target_resolution) * target_resolution
miny = round(miny / target_resolution) * target_resolution
maxy = round(maxy / target_resolution) * target_resolution
# Calculate number of pixels
nx = int(round((maxx - minx) / target_resolution))
ny = int(round((maxy - miny) / target_resolution))
# Create target coordinates
new_x = np.linspace(minx + target_resolution/2,
maxx - target_resolution/2,
nx)
new_y = np.linspace(maxy - target_resolution/2,
miny + target_resolution/2,
ny)
# Create template DataArray with desired grid
template = xr.DataArray(
np.zeros((ny, nx)),
coords={
'y': new_y,
'x': new_x
},
dims=['y', 'x']
)
# Add CRS if present in original
if hasattr(da, 'rio'):
template.rio.write_crs(da.rio.crs, inplace=True)
# Reproject input data to match template
result = da.rio.reproject_match(template)
return result
The difference you're seeing occurs because:
When you clip first (da_1), the resampling operation works with the exact boundary coordinates you specified, creating a regular grid starting from those points.
When you resample first (da_2), the resampling operation creates a new grid based on the original data's extent, and then the clipping operation has to snap to the nearest pixels in this new grid, which can lead to slightly different alignments.
To solve this, I've provided a function that:
reproject_match
to ensure consistent resultsYou can use it like this:
# Define your target resolution and bounds
target_resolution = 0.05
bounds = (minx, miny, maxx, maxy)
# Process your data
result = align_and_resample(da_seas5, target_resolution, bounds)
This approach will give you consistent results regardless of the order of operations. The key benefits are:
Looks like @zackarno found the solution here! https://github.com/OCHA-DAP/ds-raster-stats/pull/13#issuecomment-2463353673
After a bunch of testing, I've noticed that some pcodes have slightly different raster stats outputs based on whether resampling to a 0.05 degree grid is done before or after clipping to a given bounding box.
So for example:
vs
We can see the impact of this when we take a look at the rasterized admin boundaries from the two separate grids:
https://github.com/user-attachments/assets/5fa31922-64c3-499d-a58b-f0737351db85