GlacioHack / geoutils

Analysis of georeferenced rasters and vectors
https://geoutils.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
82 stars 19 forks source link

Sampling interval is irregular when loading Raster with downsampling option #447

Open adehecq opened 7 months ago

adehecq commented 7 months ago

Describe the bug When loading a raster with geoutils.Raster(filename, downsample=6), one can expect that the pixels are sampled every 6 pixels, without interpolation. This is not the case.

To Reproduce

The best way to test this, is to create a Raster with data that increases linearly along lines and/or columns. Here I create a raster that has a gradient of 1 along rows and 100 along lines. I save this raster and open it with downsampling option. The gradient of the output should be 1*downsample along rows and 100*downsample along columns, but when the shape is not a multiple of downsample, it does not work. For example:

import geoutils as gu
import numpy as np
import matplotlib.pyplot as plt
import rasterio as rio
from tempfile import NamedTemporaryFile

# Create a fake Raster with 
# row1: 0, 1, 2, 3... 99
# row2: 100, 101, ... 199 etc
data = np.arange(5000).reshape((50, 100))
transform = rio.transform.from_bounds(0, 0, 100, 50, 100, 50)
rst = gu.Raster.from_array(data, crs="EPSG:4326", transform=transform)

# Save to file (needed to open with downsample option
tmpfile = NamedTemporaryFile(suffix=".tif")
rst.save(tmpfile.name)

# Load with a downsampling that is not multiple of 50 or 100
test = gu.Raster(tmpfile.name, downsample=6)

# Plot gradients
fig, axes = plt.subplots(1, 2, figsize=(16, 4))
im1 = axes[0].imshow(np.gradient(test.data.data, axis=0))
plt.colorbar(im1)

im2 = axes[1].imshow(np.gradient(test.data.data, axis=1))
plt.colorbar(im2)
plt.tight_layout()
plt.show()

The output figure is Figure_1

Expected behavior

The gradient of the downsampled image should be constant.

System (please complete the following information): Irrelevant

adehecq commented 7 months ago

The issue is caused by the fact that when downsampling, we are rounding up the shape (here). So when reading the data a few lines below in _load_rio, an out_shape is given. Because the output is forced to that shape, that is slightly larger than the actual size, it runs a nearest neighbor resampling. There are two options: