OCHA-DAP / ds-raster-stats

Pipelines for computing raster statistics from COG datasets
1 stars 1 forks source link

Grid changes when resampling after clip #12

Open hannahker opened 1 month ago

hannahker commented 1 month ago

After a bunch of testing, I've noticed that some pcodes have slightly different raster stats outputs based on whether resampling to a 0.05 degree grid is done before or after clipping to a given bounding box.

So for example:

da_1 = da_seas5.sel(x=slice(minx, maxx), y=slice(maxy, miny)).persist()
da_1 = upsample_raster(da_1)

print(da_1.rio.transform())
print(da_1.rio.resolution())

# Returns
# | 0.05, 0.00, 33.00|
# | 0.00,-0.05, 15.00|
# | 0.00, 0.00, 1.00|
# (0.050000000000000024, -0.050000000000000024)

vs

da_2 = upsample_raster(da_seas5)
da_2 = da_2.sel(x=slice(minx, maxx), y=slice(maxy, miny)).persist()

print(da_2.rio.transform())
print(da_2.rio.resolution())

# Returns
# | 0.05, 0.00, 32.97|
# | 0.00,-0.05, 14.83|
# | 0.00, 0.00, 1.00|
# (0.05002757249118728, -0.05001598322000091)

We can see the impact of this when we take a look at the rasterized admin boundaries from the two separate grids:

https://github.com/user-attachments/assets/5fa31922-64c3-499d-a58b-f0737351db85

hannahker commented 1 month ago

Claude says:

This is a common issue in raster processing that occurs due to how resampling and clipping operations handle pixel alignment and boundaries. Let me explain what's happening and suggest solutions.

import xarray as xr
import rioxarray

def align_and_resample(
    da: xr.DataArray,
    target_resolution: float,
    bounds: tuple,
    align_pixels: bool = True
) -> xr.DataArray:
    """
    Properly aligns and resamples a DataArray while maintaining pixel alignment.

    Parameters
    ----------
    da : xarray.DataArray
        Input DataArray with spatial coordinates
    target_resolution : float
        Desired resolution in units of the input data's CRS
    bounds : tuple
        (minx, miny, maxx, maxy) in the input data's CRS
    align_pixels : bool, default True
        If True, ensures pixels are aligned to the target resolution grid

    Returns
    -------
    xarray.DataArray
        Resampled and aligned DataArray
    """
    # Extract bounds
    minx, miny, maxx, maxy = bounds

    if align_pixels:
        # Align bounds to target resolution grid
        minx = round(minx / target_resolution) * target_resolution
        maxx = round(maxx / target_resolution) * target_resolution
        miny = round(miny / target_resolution) * target_resolution
        maxy = round(maxy / target_resolution) * target_resolution

    # Calculate number of pixels
    nx = int(round((maxx - minx) / target_resolution))
    ny = int(round((maxy - miny) / target_resolution))

    # Create target coordinates
    new_x = np.linspace(minx + target_resolution/2, 
                       maxx - target_resolution/2, 
                       nx)
    new_y = np.linspace(maxy - target_resolution/2,
                       miny + target_resolution/2,
                       ny)

    # Create template DataArray with desired grid
    template = xr.DataArray(
        np.zeros((ny, nx)),
        coords={
            'y': new_y,
            'x': new_x
        },
        dims=['y', 'x']
    )

    # Add CRS if present in original
    if hasattr(da, 'rio'):
        template.rio.write_crs(da.rio.crs, inplace=True)

    # Reproject input data to match template
    result = da.rio.reproject_match(template)

    return result

The difference you're seeing occurs because:

  1. When you clip first (da_1), the resampling operation works with the exact boundary coordinates you specified, creating a regular grid starting from those points.

  2. When you resample first (da_2), the resampling operation creates a new grid based on the original data's extent, and then the clipping operation has to snap to the nearest pixels in this new grid, which can lead to slightly different alignments.

To solve this, I've provided a function that:

  1. Aligns bounds to the target resolution grid
  2. Creates a template grid with exact pixel centers
  3. Uses reproject_match to ensure consistent results

You can use it like this:

# Define your target resolution and bounds
target_resolution = 0.05
bounds = (minx, miny, maxx, maxy)

# Process your data
result = align_and_resample(da_seas5, target_resolution, bounds)

This approach will give you consistent results regardless of the order of operations. The key benefits are:

  1. Pixel centers are properly aligned to the target resolution grid
  2. Boundaries are adjusted to ensure whole pixels
  3. The output resolution will be exactly what you specified
hannahker commented 3 weeks ago

Looks like @zackarno found the solution here! https://github.com/OCHA-DAP/ds-raster-stats/pull/13#issuecomment-2463353673