corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray
Other
528 stars 84 forks source link

Treatment of `AREA_OR_POINT` #805

Closed DahnJ closed 1 month ago

DahnJ commented 1 month ago

Code Sample & Problem description

I have two sample file that only differ on their AREA_OR_POINT specification (sample_tifs.zip)

❯ diff <(gdalinfo pixel-is-area-sub.tif) <(gdalinfo pixel-is-point-sub.tif)
2c2
< Files: pixel-is-area-sub.tif
---
> Files: pixel-is-point-sub.tif
36c36
<   AREA_OR_POINT=Area
---
>   AREA_OR_POINT=Point

when I read them, I would expect the coordinates of one of those to be shifted, as mentioned in the docstring for open_rasterio (link):

The x and y coordinates are generated automatically from the file's geoinformation, shifted to the center of each pixel

However, I don't observe this behaviour

>>> import rioxarray
>>> pt = rioxarray.open_rasterio('pixel-is-point-sub.tif')
>>> ar = rioxarray.open_rasterio('pixel-is-area-sub.tif')
>>> all(pt['x'] == ar['x'])
True
>>> all(pt['y'] == ar['y'])
True
>>> pt.rio.transform() == ar.rio.transform()
True
>>> pt['spatial_ref'].attrs == ar['spatial_ref'].attrs
True

How were the files created: I have taken a sample of SRTM data and have altered the metadata property using gdal_edit

❯ cp pixel-is-point-sub.tif pixel-is-area-sub.tif
❯ gdal_edit -mo "AREA_OR_POINT=Area" pixel-is-area-sub.tif

Am I doing something wrong here, or have I misunderstood the AREA_OR_POINT property?

Expected Output

I expected the coordinates to be shifted for one of the options and thus differ between them.

Perhaps that expectation is wrong and the coordinates are simply always shited to the same location with rioxarray. But then it would suggest to me that for PixelIsPoint (Raster Space docs), the coordinates, as read by rioxarray, are actually half a pixel away from where the point was sampled, which would be surprising to me.

I may have also potentially created the files in the wrong way.

Environment Information

python -c "import rioxarray; rioxarray.show_versions()"
rioxarray (0.17.0) deps:
  rasterio: 1.3.11
    xarray: 2024.9.0
      GDAL: 3.9.2
      GEOS: 3.13.0
      PROJ: 9.5.0
 PROJ DATA: /Users/mysusr/miniconda3/envs/myenv/share/proj
 GDAL DATA: /Users/myusr/miniconda3/envs/myenv/share/gdal

Other python deps:
     scipy: 1.14.1
    pyproj: 3.6.1

System:
    python: 3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 17:55:20) [Clang 17.0.6 ]
executable: /Users/myusr/miniconda3/envs/myenv/bin/python
   machine: macOS-14.5-arm64-arm-64bit

Installation method

Conda

Conda environment information (if you installed with conda):


Environment (conda list):

``` $ conda list | grep -E "rasterio|xarray|gdal" gdal 3.9.2 py312h936c49d_4 conda-forge libgdal 3.9.2 hce30654_4 conda-forge libgdal-core 3.9.2 h3535123_5 conda-forge libgdal-fits 3.9.2 h248c7bc_4 conda-forge libgdal-grib 3.9.2 h6d3d72d_4 conda-forge libgdal-hdf4 3.9.2 h3847bb8_4 conda-forge libgdal-hdf5 3.9.2 h2def128_4 conda-forge libgdal-jp2openjpeg 3.9.2 hd61e619_4 conda-forge libgdal-kea 3.9.2 h7b2de0b_4 conda-forge libgdal-netcdf 3.9.2 h5e0d008_4 conda-forge libgdal-pdf 3.9.2 h587d690_4 conda-forge libgdal-pg 3.9.2 h147afc8_4 conda-forge libgdal-postgisraster 3.9.2 h147afc8_4 conda-forge libgdal-tiledb 3.9.2 h27a95ea_4 conda-forge libgdal-xls 3.9.2 habc1c91_4 conda-forge rasterio 1.3.11 py312h6ce65b9_2 conda-forge rioxarray 0.17.0 pyhd8ed1ab_0 conda-forge xarray 2024.9.0 pyhd8ed1ab_0 conda-forge xarray-spatial 0.4.0 pyhd8ed1ab_0 conda-forge ```

keywords for easy search in the future: Raster Space, GTRasterTypeGeoKey, PIXEL_OR_AREA, PixelIsArea, PixelIsPoint, pixelCenter

snowman2 commented 1 month ago

https://gdal.org/en/latest/user/raster_data_model.html

AREA_OR_POINT: May be either "Area" (the default) or "Point". Indicates whether a pixel value should be assumed to represent a sampling over the region of the pixel or a point sample at the center of the pixel. This is not intended to influence interpretation of georeferencing which remains area oriented.

DahnJ commented 1 month ago

Thanks for the quick answer @snowman2 !

I interpret your answer as saying: coordinates should stay the same in the in-memory xarray dataset both cases. The only difference is how they are then intepreted. It's up to the user to make the right choice. Am I interpreting it correctly?

To me, this makes the docstring for open_rasterio quite confusing, specifically:

The x and y coordinates are generated automatically from the file's geoinformation, shifted to the center of each pixel

I read that as: in-memory representation's coordinates always represent the center of the pixel. That seems to me in direct contradiction with the above, where in the PixelIsArea case, the coordinates (without any shifting) would refer to the top-left corner of the pixel.


More relevant links for future visitors:

Kirill888 commented 1 month ago

@DahnJ

The x and y coordinates are generated automatically from the file's geoinformation, shifted to the center of each pixel

That refers to coordinate values stored in the x,y coordinate arrays. Here "pixel coordinate" refers to the physical location in the world of the center of the pixel. Image coordinate of 0.0,0.0 always refers to the top left corner of the top left pixel, and Affine matrix maps from pixel image coordinates to world coordinates. Point a.x[i], a.y[j] is always world location of the pixel center for pixel a[j, i], regardless of the underlying format of the input file.

But I agree "shifted" is probably not the best word to use to describe it.

DahnJ commented 1 month ago

Thank you @Kirill888 for the answer.

We've seen that, for both Point and Area

You also mention

Then what does change? I'm failing to see any difference between files with Area and Point.

Kirill888 commented 1 month ago

The difference is in what transform is stored in the geotiff tags inside the file. By the time rasterio library sees the transform it has been normalised by gdal to be equivalent to “area mode”. It’s really just a quirk of geotiff spec.

DahnJ commented 1 month ago

I see. And it's something that even gdalinfo doesn't surface, since it already transforms the geotransform during reading the file to produce the info?

If that's the case, then I think that the line

The x and y coordinates are generated automatically from the file's geoinformation, shifted to the center of each pixel (see"PixelIsArea" Raster Space <http://web.archive.org/web/20160326194152/http://remotesensing.org/geotiff/spec/geotiff2.5.html#2.5.2>_ for more information). (source)

is likely to lead people astray rather then help. It has done that for me and a handful of other people I've asked and the area/point distinction seems to even have managed to confuse the author of the GDAL code. If this transformation is truly an implementation detail that GDAL takes care of then I'd say rioxarray does not need to refer to it.

I can see value in saying something like

The x and y coordinates are generated from the file's geoinformation and refer to the center of the pixel.

This clarifies the meaning of the coordinates without sending them into a rabbit hole that ultimately doesn't help them.

What do you think @snowman2? Happy to open a PR.

snowman2 commented 1 month ago

Yes, a PR with clarification is welcome.

DahnJ commented 1 month ago

@snowman2 great, thanks. Made the change here https://github.com/corteva/rioxarray/pull/811