Add guidance on when to apply mask and scale/offset before resampling

maxrjones commented 4 weeks ago

Despite a potential (modest) performance hit, should the mask and scale/offset always be applied before resampling to prevent accidental mistakes?

If not, I think this approach could work:

- For most cases (bucket, non-linear interpolation, conservative), apply both the mask and offset/scale before resampling.
- For linear interpolation, apply the mask before resampling; offset/scale can be applied before or after.
- If using a resampling method that preserves original values (nearest neighbor?), the order doesn't matter, but it may be faster to apply the mask and offset/scale after resampling to lower resolutions.
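A small sketch of the reasoning behind these cases, assuming a CF-style scale/offset (`value * scale + offset`) and plain Python lists standing in for raster rows. `unscale`, `average_2to1` and `nearest_2to1` are hypothetical helpers, not any library's API. Because the decode step is affine and an averaging resampler's weights sum to 1, the two orders agree in float arithmetic; nearest neighbor trivially agrees; the orders diverge once the averaged raw values are forced back into an integer dtype (and, separately, whenever nodata is inside the window).

```python
# Hypothetical sketch: "resample then unscale" vs "unscale then resample"
# for a 2-to-1 downsampling of one row.

SCALE, OFFSET = 0.01, -5.0  # assumed scale_factor / add_offset


def unscale(values):
    """Decode raw values as value * scale + offset."""
    return [v * SCALE + OFFSET for v in values]


def average_2to1(values):
    """Mean of each adjacent pair: a linear resampler whose weights sum to 1."""
    return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values) - 1, 2)]


def nearest_2to1(values):
    """Keep the first sample of each pair, so output values are original values."""
    return values[::2]


raw = [1000, 1001, 2000, 2003]

# Averaging in float: both orders agree up to floating-point rounding.
before = average_2to1(unscale(raw))
after = unscale(average_2to1(raw))

# Nearest neighbor only selects existing samples, so order cannot matter.
nn_before = nearest_2to1(unscale(raw))
nn_after = unscale(nearest_2to1(raw))

# If the averaged raw values are stored in an integer dtype before decoding
# (truncated here with int()), the fractional part is lost and results differ.
after_int = unscale([int(v) for v in average_2to1(raw)])
```

Here `after_int[0]` decodes to 5.0 while `before[0]` is about 5.005, which is the kind of silent error that applying scale/offset first avoids.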

maxrjones commented 4 weeks ago

@vincentsarago @sharkinsspatial would either of you be able to provide guidance about the correct approach for unscaling during tile generation?

vincentsarago commented 4 weeks ago

In GDAL, applying the scale and offset is done after reading the data:

> Note that applying scale and offset is of the responsibility of the user, and is not done by methods such as RasterIO() or ReadBlock().
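A minimal sketch of that caller-side decode step, assuming the band's scale, offset and nodata value have already been read from its metadata (in the GDAL Python bindings, for example, via `GetScale()`, `GetOffset()` and `GetNoDataValue()`). `apply_scale_offset` is a hypothetical helper; it uses GDAL's documented relation `value = raw * scale + offset` and maps the nodata sentinel to NaN rather than scaling it.

```python
import math


def apply_scale_offset(raw, scale=1.0, offset=0.0, nodata=None):
    """Decode raw band values as raw * scale + offset.

    Values equal to `nodata` become NaN instead of being scaled, because the
    sentinel is a raw-space marker with no physical meaning.
    """
    out = []
    for v in raw:
        if nodata is not None and v == nodata:
            out.append(math.nan)
        else:
            out.append(v * scale + offset)
    return out


decoded = apply_scale_offset([0, 100, 65535], scale=0.5, offset=10.0, nodata=65535)
```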

As for the mask, we let GDAL handle everything (https://gdal.org/en/latest/programs/gdalwarp.html#nodata-source-validity-mask-handling), but the mask (nodata, alpha band, or internal mask) has to be set before warping.
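To illustrate why the validity mask has to be known before warping, here is a hypothetical 2-to-1 averaging step (not GDAL's implementation) with an assumed nodata sentinel of -9999. Without the mask, the sentinel is averaged in like a real value; with it, invalid samples are skipped and windows with no valid samples stay nodata. In `gdalwarp` this information comes from the source nodata value (e.g. `-srcnodata`), an alpha band, or a mask band, as described in the linked docs.

```python
NODATA = -9999  # assumed nodata sentinel


def average_2to1(values, nodata=None):
    """2-to-1 mean downsampling that skips `nodata` samples when it is given."""
    out = []
    for i in range(0, len(values) - 1, 2):
        window = values[i:i + 2]
        if nodata is not None:
            window = [v for v in window if v != nodata]
        # A window with no valid samples stays nodata.
        out.append(sum(window) / len(window) if window else nodata)
    return out


row = [10, NODATA, 20, 22, NODATA, NODATA]
unmasked = average_2to1(row)               # first output is -4994.5: nodata leaked in
masked = average_2to1(row, nodata=NODATA)  # [10.0, 21.0, -9999]
```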

I see that with rioxarray you do the opposite (when using mask_and_scale=True): you read the dataset, apply the scale/offset, and then warp (I think).

I'm not quite sure about the optimization, but when unscaling the data you'll end up with float data (e.g. a 2-byte uint16 band becomes 4-byte float32 or 8-byte float64), which might take more memory 🤷‍♂️
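A rough sketch of the memory cost, assuming a single 512×512 tile and using the standard-library `array` module as a stand-in for raster buffers (type code `"H"` is unsigned 16-bit on common platforms, `"f"` is float32, `"d"` is float64):

```python
from array import array

N_PIXELS = 512 * 512  # one assumed 512x512 tile

raw = array("H", [0]) * N_PIXELS       # integer storage before unscaling
as_f32 = array("f", [0.0]) * N_PIXELS  # after unscaling to float32
as_f64 = array("d", [0.0]) * N_PIXELS  # after unscaling to float64

# Bytes held by each buffer.
sizes = {
    name: buf.itemsize * len(buf)
    for name, buf in [("uint16", raw), ("float32", as_f32), ("float64", as_f64)]
}
```

On a typical platform this gives 512 KiB for uint16, 1 MiB for float32 and 2 MiB for float64 per tile, so unscaling before warping at least doubles the working buffer.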

developmentseed / warp-resample-profiling