corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray
Other
518 stars 82 forks source link

`rio.reproject` changes dimension lengths and variables #535

Closed DahnJ closed 2 years ago

DahnJ commented 2 years ago

Code Sample

import pandas as pd
import rioxarray
from io import StringIO

raw = """
-627292.22 -1089977.64 539.86
-627292.22 -1089631.10 539.58
-627292.22 -1088046.64 507.76
-627292.06 -1089977.64 539.91
-627292.06 -1089631.10 543.99
-627292.06 -1088046.64 530.90
"""

# Load data to xarray through Pandas
data = pd.read_csv(StringIO(raw), sep=' ', names=['x', 'y', 'z'], index_col=['y', 'x'])
xdata = data.to_xarray()

# Set CRS
xdata = xdata.rio.write_crs(5514)
> xdata
<xarray.Dataset>
Dimensions:      (y: 3, x: 2)
Coordinates:
  * y            (y) float64 -1.09e+06 -1.09e+06 -1.088e+06
  * x            (x) float64 -6.273e+05 -6.273e+05
    spatial_ref  int64 0
Data variables:
    z            (y, x) float64 539.9 539.9 539.6 544.0 507.8 530.9
> # Reproject
> xdata.rio.reproject(4326)
<xarray.Dataset>
Dimensions:      (x: 1, y: 4)
Coordinates:
  * x            (x) float64 16.1
  * y            (y) float64 49.82 49.81 49.8 49.79
    spatial_ref  int64 0
Data variables:
    z            (y, x) float64 1.798e+308 1.798e+308 1.798e+308 1.798e+308

Problem description

Using xdata.rio.reproject() does not seem to correctly reproject the data

Expected Output

I expected a xarray.Dataset with re-projected coordinates and same values as the input dataset.

Environment Information

rioxarray (0.11.1) deps:
  rasterio: 1.2.10
    xarray: 2022.3.0
      GDAL: 3.4.1
      GEOS: None
      PROJ: None
 PROJ DATA: None
 GDAL DATA: None

Other python deps:
     scipy: 1.8.1
    pyproj: 3.3.0

System:
    python: 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:25:18) [GCC 9.4.0]
executable: /home/dahn/miniconda3/bin/python
   machine: Linux-5.4.0-113-generic-x86_64-with-glibc2.27

Installation method

conda

Conda environment information (if you installed with conda):


Environment (conda list):

``` gdal 3.4.1 py310h222ae0d_3 conda-forge libgdal 3.4.1 hc3d8651_3 conda-forge rasterio 1.2.10 py310h5e0f756_4 conda-forge rioxarray 0.11.1 pyhd8ed1ab_0 conda-forge xarray 2022.3.0 pyhd8ed1ab_0 conda-forge xarray-spatial 0.3.5 pyhd8ed1ab_0 conda-forge ```


Details about conda and system ( conda info ):

``` active environment : base active env location : /home/dahn/miniconda3 shell level : 1 user config file : /home/dahn/.condarc populated config files : /home/dahn/.condarc conda version : 4.13.0 conda-build version : 3.21.9 python version : 3.10.0.final.0 virtual packages : __linux=5.4.0=0 __glibc=2.27=0 __unix=0=0 __archspec=1=x86_64 base environment : /home/dahn/miniconda3 (writable) conda av data dir : /home/dahn/miniconda3/etc/conda conda av metadata url : None channel URLs : https://conda.anaconda.org/conda-forge/linux-64 https://conda.anaconda.org/conda-forge/noarch https://repo.anaconda.com/pkgs/main/linux-64 https://repo.anaconda.com/pkgs/main/noarch https://repo.anaconda.com/pkgs/r/linux-64 https://repo.anaconda.com/pkgs/r/noarch package cache : /home/dahn/miniconda3/pkgs /home/dahn/.conda/pkgs envs directories : /home/dahn/miniconda3/envs /home/dahn/.conda/envs platform : linux-64 user-agent : conda/4.13.0 requests/2.27.1 CPython/3.10.0 Linux/5.4.0-113-generic ubuntu/18.04.6 glibc/2.27 UID:GID : 1001:1001 netrc file : None offline mode : False ```
snowman2 commented 2 years ago

I recommend looking here if you are rasterizing points: https://corteva.github.io/geocube/stable/examples/rasterize_point_data.html

DahnJ commented 2 years ago

I recommend looking here if you are rasterizing points: https://corteva.github.io/geocube/stable/examples/rasterize_point_data.html

Thanks for the pointer. This makes sense — a regular grid in one projection won't stay a regular grid in another, and thus it makes more sense to resample/interpolate.

For completeness, here's an example of how geocube could be used here, using the variables from the original example

import geopandas as gpd

from geocube.api.core import make_geocube
from geocube.rasterize import rasterize_points_griddata

gdf = gpd.GeoDataFrame(
    data['z'],
    geometry=gpd.points_from_xy(
        x=data.index.get_level_values('x'),
        y=data.index.get_level_values('y'),
        crs='epsg:5514'
))

output = make_geocube(
    vector_data=gdf,
    measurements=['z'],
    output_crs="epsg:4326",
    resolution=(-0.01, 0.003),
    rasterize_function=rasterize_points_griddata,
)
> output
<xarray.Dataset>
Dimensions:      (y: 3, x: 2)
Coordinates:
  * y            (y) float64 49.81 49.8 49.79
  * x            (x) float64 16.1 16.1
    spatial_ref  int64 0
Data variables:
    z            (y, x) float64 507.8 530.9 539.6 544.0 539.9 539.9

However, I'm still curious about the behaviour of reproject. In what sense is the output "correct", or how am I using it wrong?

snowman2 commented 2 years ago

The results seem to match GDAL:

xdata.rio.to_raster("test.tif")
gdalwarp test.tif test_4326.tif -t_srs "EPSG:4326"
rds = rioxarray.open_rasterio("test_4326.tif")

The main difference is the nodata value is 0 with GDAL:

<xarray.DataArray (band: 1, y: 4, x: 1)>
array([[[0.],
        [0.],
        [0.],
        [0.]]])
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 16.1
  * y            (y) float64 49.82 49.81 49.8 49.79
    spatial_ref  int64 0
Attributes:
    scale_factor:  1.0
    add_offset:    0.0
    long_name:    

I think it the small grid size in the X dimension is likely causing the troubles:

>>> xdata.rio.resolution()
(0.15999999991618097, 965.5)

You may get better information on the mailing list or issue tracker here: https://github.com/OSGeo/gdal

DahnJ commented 2 years ago

I see. Thank you for the help, I'm closing this issue.