corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray

Concat of multiple DataArrays reverses the y dimension #633

Open ZachHoppinen opened 1 year ago

ZachHoppinen commented 1 year ago

What is your issue?

Hey all,

I suspect I am just doing something incorrectly but haven't been able to solve this problem, so hoping someone can point me in the right direction.

I have a list of two data arrays with y dimensions (high -> low) of:

# check y values of the first DataArray
dataArrays[0].y

array([43.099756, 43.099413, 43.09907 , ..., 43.001019, 43.000676, 43.000333])

# check y values of the second DataArray
dataArrays[1].y

array([43.099682, 43.09935 , 43.099019, ..., 43.000869, 43.000537, 43.000206])

When I concatenate this list into a Dataset, the resulting y dimension is reversed from what I expect. Actual y values (low -> high):

# combine the list of DataArrays and check the y values of the result
ds = xarray.concat(dataArrays, dim = 'time')
ds.y

array([43.000206, 43.000333, 43.000537, ..., 43.099413, 43.099682, 43.099756])

Expected y-values (high -> low) would match the ordering of the input DataArrays:

ds = xarray.concat(dataArrays, dim = 'time')
ds.y

array([43.099756, 43.099682, 43.099413, ..., 43.000537, 43.000333, 43.000206])

How can I combine these DataArrays so that the y values keep the same ordering as in the input arrays? When I plot one of the input DataArrays next to the matching time slice of the concatenated dataset, they look fine. When I check the bounds, however, the latitudes of the concatenated dataset are reversed: the maximum latitude is smaller than the minimum latitude.

dataArrays[0].rio.bounds()

(-114.40003329796458, 43.000039804222375, -114.29989382915689, 43.099847685385)

ds.isel(time = 0)

(-114.40003892276228, 43.099584649180755, -114.29988008526641, 43.00037701666514)  # top latitude (43.00037701666514) is smaller than bottom latitude (43.099584649180755)

Because the top latitude is smaller than the bottom latitude, getting the bounds of the dataset fails with the error: raise WindowError("Bounds and transform are inconsistent").

I have also tried merge_datasets, but it does not concatenate along the time dimension, so I end up with only one value on the time dimension.

Thanks in advance, and sorry if this isn't the right format or place for this.


Rioxarray version information:

rioxarray (0.13.3) deps:
  rasterio: 1.3.4
  xarray: 2023.1.0
  GDAL: 3.6.2
  GEOS: 3.11.1
  PROJ: 9.1.0
  PROJ DATA: /Users/zachkeskinen/miniconda3/envs/spicy/share/proj
  GDAL DATA: /Users/zachkeskinen/miniconda3/share/gdal

Other python deps:
  scipy: 1.10.0
  pyproj: 3.4.1

System:
  python: 3.11.0 | packaged by conda-forge | (main, Jan 15 2023, 05:44:48) [Clang 14.0.6 ]
  executable: /Users/zachkeskinen/miniconda3/envs/spicy/bin/python
  machine: macOS-12.6.2-x86_64-i386-64bit

snowman2 commented 1 year ago

Would you mind providing a minimal reproducible example and the output of rioxarray.show_versions()?

ZachHoppinen commented 1 year ago

Hello Snowman2,

Thanks for getting back to me. For rioxarray.show_versions() do you mean something other than the included text at the bottom of my original comment?

I tried to reproduce this problem with two synthetic data arrays, but in that case the bounds came out fine. So I was hoping someone could point me in the right direction: why does my data, which should have the same CRS and setup, show a minimum y bound that is larger than the maximum y bound?

import numpy as np
import xarray as xr
import rioxarray

temperature = np.random.randn(1, 2, 2)
x = [10, 30]
y = [4, 2]
time1 = ['sept-1']

da1 = xr.DataArray(temperature, coords = [time1, x, y], dims = ['time', 'x', 'y'])
da1 = da1.rio.write_crs('EPSG:4326')

temperature = np.random.randn(1, 2, 2)
x = [20, 40]
y = [3, 1]
time2 = ['sept-2']

da2 = xr.DataArray(temperature, coords = [time2, x, y], dims = ['time', 'x', 'y'])
da2 = da2.rio.write_crs('EPSG:4326')

ds = xr.concat([da1, da2], dim = 'time')
print(ds.rio.bounds())

This gives the expected result: the y-minimum value is smaller than the y-maximum value.
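
One possible explanation for the reordering, offered as a sketch rather than a confirmed diagnosis: when the inputs do not share identical y coordinates, xarray.concat with join="outer" aligns them on the union of the coordinate values, and that union comes back sorted in ascending order. The example below uses made-up coordinate values (not the data from this issue) to show the effect, and then restores the descending order with sortby.

```python
import numpy as np
import xarray as xr

# Two arrays on slightly different, descending y grids.
# The coordinate values are hypothetical and chosen only for illustration.
da1 = xr.DataArray(
    np.zeros((1, 3, 2)),
    coords={"time": [0], "y": [4.0, 3.0, 2.0], "x": [10.0, 20.0]},
    dims=("time", "y", "x"),
)
da2 = xr.DataArray(
    np.ones((1, 3, 2)),
    coords={"time": [1], "y": [3.5, 2.5, 1.5], "x": [10.0, 20.0]},
    dims=("time", "y", "x"),
)

# The y grids differ, so the outer join builds the union of both grids.
# Cells that an input does not cover are filled with NaN.
combined = xr.concat([da1, da2], dim="time", join="outer")
print(combined.y.values)  # ascending, even though both inputs were descending

# Put y back into north-up (descending) order.
restored = combined.sortby("y", ascending=False)
print(restored.y.values)
```

Note that sorting only fixes the direction of y. The combined grid still interleaves two different grids and is mostly NaN per time step, so if the inputs are meant to share one grid, resampling them onto a common grid before concatenating (for example with rio.reproject_match) may be the better route.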