GlacioHack / geoutils

Analysis of georeferenced rasters and vectors
https://geoutils.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
82 stars 19 forks source link

stack_rasters performs unwanted interpolation for integer types #599

Closed axeldolce closed 2 weeks ago

axeldolce commented 1 month ago

Describe the bug The stack_rasters function performs an interpolation of integer types even when the Resampling.nearest method is specified.

To Reproduce To reproduce the behaviour one needs two integer Rasters with different bounds. Here is an example:

import numpy as np
from affine import Affine
from geoutils import Raster
from geoutils.raster import stack_rasters
from pyproj import CRS
from rasterio.enums import Resampling

shape = (10, 10)
data_int = np.ones(shape).astype(np.uint16)
data_mask = np.zeros(shape).astype(bool)
data_masked = np.ma.masked_array(data=data_int, mask=data_mask)  # type: ignore[var-annotated]

r1 = Raster.from_array(
    data=data_masked,
    transform=Affine(
        1000.0,
        0.0,
        1_000_000.0,
        0.0,
        -1000.0,
        1_000_000.0,
    ),
    crs=CRS.from_string("EPSG:3857"),
    nodata=0,
)

shape = (10, 10)
data_int = (np.ones(shape) * 5).astype(np.uint16)
data_int[:5, :5] = 1
data_mask = np.zeros(shape).astype(bool)
data_masked = np.ma.masked_array(data=data_int, mask=data_mask)  # type: ignore[var-annotated]

r2 = Raster.from_array(
    data=data_masked,
    transform=Affine(
        1500.0,
        0.0,
        1_000_000.0,
        0.0,
        -1500.0,
        1_000_000.0,
    ),
    crs=CRS.from_string("EPSG:3857"),
    nodata=0,
)

rstacked = stack_rasters(
    rasters=[r1, r2],
    progress=False,
    resampling_method=Resampling.nearest,
)

print(np.unique(r1))
print(np.unique(r2))
print(np.unique(rstacked))

This will print out

[1]
[1 5]
[1 3 4 5 --]

where it is clear that some integer values have been interpolated as the values 3 and 4.

Expected behavior The excpected behaviour should be to apply the Resampling.nearest interpolation as requested. In fact, by looking at the code in the stack_rasters function, it seems that the resampling_method is ignored when performing the reprojection:

# Reproject to reference grid
reprojected_raster = raster.reproject(
    bounds=dst_bounds,
    res=reference_raster.res,
    crs=reference_raster.crs,
    dtype=reference_raster.data.dtype,
    nodata=reference_raster.nodata,
    silent=True,
    resampling=resampling_method,  # THE RESAMPLING METHOD IS MISSING
)

System:

Additional context No additional context to specify. I would just like to thank the maintainers of this repo. It is really useful when working on geospatial data and I hope it will get even more recognition.

rhugonnet commented 1 month ago

Thanks @axeldolce! Indeed, it looks that it was as simple as passing the resampling method to the function. I added new tests in test_multiraster inspired by your example above to ensure this is now working fully with that change :slightly_smiling_face: in #601.

After review from other maintainers, I'll publish a new release with this bug fix.

axeldolce commented 1 month ago

Great, thanks!