Open RY4GIT opened 1 year ago
Did you try the rioxarray
engine instead of rasterio
?
@dcherian Sorry I realized I should have posted this in rasterio repo. I was confused rasterio and rioxarray! I will try them out and let you know soon, but you may close the issue if you think this doesn't fit here.
No! This is the place. Just try engine="rioxarray"
. engine="rasterio"
is bundled with xarray and is about be removed. rioxarray
is a more full-featured version of the same funcitonality
Actually, engine="rasterio"
comes from rioxarray ref. It calls rioxarray.open_rasterio
.
Oh that's confusing! In that case, i guess it's calling rioxarray
since its installed.
So I tried open_rasterio with NSIDC output, but the results are the same. The issue did not happen with data directly downloaded from NASA Earth data (see updates on https://github.com/pydata/xarray/issues/7621), so I am not sure it's an issue with rioxarray or the NSIDC output.
ds_NSIDC_rioxarray = rioxarray.open_rasterio(os.path.join(input_path, fn_NSIDC_output))
ds_NSIDC_rioxarray = ds_NSIDC_rioxarray.set_coords(["cell_lat", "cell_lon"])
ds_NSIDC_rioxarray
ds_NSIDC_rioxarray.precipitation_total_surface_flux[0].plot(y="cell_lat", x="cell_lon", vmin=-0.001)
If you do this:
rds = rioxarray.open_rasterio(os.path.join(input_path, fn_NSIDC_output), group="Geophysical_Data")
or this:
rds = xarray.open_dataset(os.path.join(input_path, fn_NSIDC_output), group="Geophysical_Data", engine="rasterio")
The data plots correctly with rasterio.
EDIT: This does not fix the issue. The plot has the correct orientation. However, the data is missing coordinates.
If you do this, the lat/lon data looks correct:
ds_NSIDC_root = xarray.open_dataset(data_path, variable=["cell_lon", "cell_lat"], engine='rasterio')
ds_NSIDC_root.cell_lat.plot()
This appears to be a rasterio/GDAL issue:
import rasterio
import netCDF4
import numpy
with netCDF4.Dataset(data_path) as nfh:
nc_data = nfh['Geophysical_Data']['precipitation_total_surface_flux'][:]
with rasterio.open('netcdf:SMAP_L4_SM_gph_20150331T013000_Vv7032_001_HEGOUT.nc:/Geophysical_Data/precipitation_total_surface_flux') as rfh:
rio_data = rfh.read(1, masked=True)
numpy.array_equal(nc_data, rio_data)
False
numpy.array_equal(nc_data, numpy.flip(rio_data, 0))
True
This is not something that can be resolved in rioxarray.
Hi,
I'm just found this issue while searching for a related problem. Thank you for documenting it so well.
I'm actually working on a tutorial for NSIDC to demonstrate working with SMAP data. So this is quite timely.
I do not think it is a software/tool problem but how you are trying to read the data. I'm using the original HDF5 file from NSIDC but the result is the same. I'm guessing you used a reprojection service to get the dataset you posted here.
TLDR: I think the problem is that rasterio
does not interpret the coordinate information and treats the data as an image - which is what rasterio
is for. If you want the features offered by rioxarray
you can use:
ds_rioxarray = xr.merge([
xr.open_dataset(filepath, decode_coords="all"),
xr.open_dataset(filepath, group='Geophysical_Data', decode_coords="all")['precipitation_total_surface_flux'],
])
I'm using the following packages
from pathlib import Path
import xarray as xr
import rioxarray
print(xr.__version__)
print(rioxarray.__version__)
2024.2.0
0.15.0
filepath = Path("smap_data/SMAP_L4_SM_gph_20150331T013000_Vv7032_001.h5")
The SMAP datasets you are using are either HDF5 (the original file from NSIDC) or NetCDF4 and follow most of the CF-Conventions, at least for the Level-4 data. So you can use
engine = "netcdf4"
ds = xr.merge([
xr.open_dataset(filepath, engine=engine),
xr.open_dataset(filepath, group='Geophysical_Data', engine=engine)['precipitation_total_surface_flux'],
])
ds
Or
engine="h5netcdf"
ds = xr.merge([
xr.open_dataset(filepath, engine=engine, phony_dims="sort"),
xr.open_dataset(filepath, group='Geophysical_Data', engine=engine,
phony_dims="sort")['precipitation_total_surface_flux'],
])
ds
These both return an xarray dataset like
<xarray.Dataset> Size: 175MB
Dimensions: (phony_dim_2: 1, y: 1624, x: 3856)
Coordinates:
* x (x) float64 31kB -1.736e+07 ... 1.736e+07
* y (y) float64 13kB 7.31e+06 ... -7.31e+06
Dimensions without coordinates: phony_dim_2
Data variables:
EASE2_global_projection (phony_dim_2) |S1 1B ...
cell_column (y, x) float64 50MB ...
cell_lat (y, x) float32 25MB ...
cell_lon (y, x) float32 25MB ...
cell_row (y, x) float64 50MB ...
time (phony_dim_2) datetime64[ns] 8B ...
precipitation_total_surface_flux (y, x) float32 25MB ...
Attributes:
Comment: HDF-5
Contact: http://gmao.gsfc.nasa.gov/
Conventions: CF
Filename: /discover/nobackup/projects/gmao/smap/SMAP_L4/L4_SM/Vv7032/...
History: File written by ldas2daac.x
Institution: NASA Global Modeling and Assimilation Office
References: see SMAP L4_SM Product Specification Documentation
Source: v17.11.1
Title: SMAP L4_SM Geophysical (GPH) Data Granule
Note that the x
and y
coordinates are projected-coordinates for the EASE-Grid v2.0 Global CRS. In your example they are geographic coordinates.
What I think is happening when you use either the rasterio
backend or rioxarray
is that the HDF5 GDAL driver (which is what rasterio
is using under-the-hood) is not able to parse the coordinate information.
engine = "rasterio"
ds_rasterio = xr.open_dataset(filepath, engine=engine)
This returns a bunch of warnings about not being able to determine georeferencing
[/home/apbarret/mambaforge/envs/nsidc-tutorials/lib/python3.9/site-packages/rioxarray/_io.py:1132](http://localhost:8890/home/apbarret/mambaforge/envs/nsidc-tutorials/lib/python3.9/site-packages/rioxarray/_io.py#line=1131): NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
warnings.warn(str(rio_warning.message), type(rio_warning.message)) # type: ignore
Printing x
and y
shows that the data are not georeferenced because the grid cell column and row indices - image coordinates - are returned not the projected coordinates.
print(ds_rasterio.x, rasterio.y)
<xarray.DataArray 'x' (x: 3856)> Size: 31kB
array([5.0000e-01, 1.5000e+00, 2.5000e+00, ..., 3.8535e+03, 3.8545e+03,
3.8555e+03])
Coordinates:
* x (x) float64 31kB 0.5 1.5 ... 3.854e+03 3.856e+03
EASE2_global_projection int64 8B ... <xarray.DataArray 'y' (y: 1624)> Size: 13kB
array([5.0000e-01, 1.5000e+00, 2.5000e+00, ..., 1.6215e+03, 1.6225e+03,
1.6235e+03])
Coordinates:
* y (y) float64 13kB 0.5 1.5 ... 1.622e+03 1.624e+03
EASE2_global_projection int64 8B ...
I suspect that rasterio
is treating the data as an image, which by convention have the origin of image coordinates at the upper-left corner of the upper-left pixel. So pixel[0.5, 0.5]
is the upper-left-most pixel center. However, when you plot the data with the xarray.DataArray.plot
method, the coordinate sytem that pcolormesh
(the default plotting method) uses is a normal cartesian coordinate system with the origin at the lower left, so the image appears upside down.
Plotting cell_lat
shows that this is the case, southern hemisphere grid-cells are plotted at the top of the image.
ds_rasterio.cell_lat.plot()
Using ds_rasterio.cell_lat.squeeze().plot.imshow(origin="upper")
to plot cell lat, which sets the origin to be the upper-left cell, gives the expected result: postive latitudes at the top of the image.
If you want to take advantage of the accessors and methods provided by rioxarray
then you need to import rioxarray
and set decode_coords="all"
. _Note: this throws a warning because there is no coordinate information in the Geophysical_Data group and xarray will only look in the group specified. IMO I don't think nested groups are necessary for these kinds of high-level products. Merging the root and precipitation datasets retains the georeferencing from the first call to open_dataset
._
ds_rioxarray = xr.merge([
xr.open_dataset(filepath, decode_coords="all"),
xr.open_dataset(filepath, group='Geophysical_Data', decode_coords="all")['precipitation_total_surface_flux'],
])
print(ds_rioxarray)
<xarray.Dataset> Size: 175MB
Dimensions: (phony_dim_2: 1, y: 1624, x: 3856)
Coordinates:
EASE2_global_projection (phony_dim_2) |S1 1B ...
* x (x) float64 31kB -1.736e+07 ... 1.736e+07
* y (y) float64 13kB 7.31e+06 ... -7.31e+06
Dimensions without coordinates: phony_dim_2
Data variables:
cell_column (y, x) float64 50MB ...
cell_lat (y, x) float32 25MB ...
cell_lon (y, x) float32 25MB ...
cell_row (y, x) float64 50MB ...
time (phony_dim_2) datetime64[ns] 8B ...
precipitation_total_surface_flux (y, x) float32 25MB ...
Attributes:
Source: v17.11.1
Institution: NASA Global Modeling and Assimilation Office
History: File written by ldas2daac.x
Comment: HDF-5
Filename: /discover/nobackup/projects/gmao/smap/SMAP_L4/L4_SM/Vv7032/...
Title: SMAP L4_SM Geophysical (GPH) Data Granule
Conventions: CF
References: see SMAP L4_SM Product Specification Documentation
Contact: http://gmao.gsfc.nasa.gov/
[/tmp/ipykernel_230171/3642432009.py:3](http://localhost:8890/tmp/ipykernel_230171/3642432009.py#line=2): UserWarning: Variable(s) referenced in grid_mapping not in variables: ['EASE2_global_projection']
xr.open_dataset(filepath, group='Geophysical_Data', decode_coords="all")['precipitation_total_surface_flux'],
ds_rioxarray.precipitation_total_surface_flux.plot(vmax=0.001)
@snowman2 I'm seeing the same behaviour
notebook: https://gist.github.com/vincentsarago/81f1e849ca8cef938a342ac49a2a1d49
I'm wondering if there is something going on with how the transform is calculated here https://github.com/corteva/rioxarray/blob/f3ad955dd59dfdd1a696ac9d4a4b5be1418c5994/rioxarray/rioxarray.py#L702-L709
to me it seems that the transform should be
src_resolution_x, src_resolution_y = da.rio.resolution()
src_left, _, _, src_top = da.rio.bounds(recalc=True)
transform = Affine.translation(src_left, src_top) * Affine.scale(
src_resolution_x, -src_resolution_y
)
at least for the file we are using the _unordered_bounds
will have src_top
be the bottom latitude 🤷 , so basically the origin for the transform will be the bottom left (instead of the top left in rasterio/gdal).
@vincentsarago I believe that the transform in rioxarray should be equivalent. Do you have an example file you can share?
@snowman2 you can get files using
import earthaccess
import xarray
earthaccess.login()
results = earthaccess.search_data(
concept_id="C2036881735-POCLOUD", count=1, temporal=("2024-09-20", "2024-09-20")
)
fp = earthaccess.download(results, "earthaccess_data")[0]
ds = xarray.open_dataset(fp, decode_coords="all")
you can also check this file https://dap.ceda.ac.uk/neodc/esacci/land_surface_temperature/data/MULTISENSOR_IRCDR/L3S/0.01/v2.00/monthly/2020/11/ESACCI-LST-L3S-LST-IRCDR_-0.01deg_1MONTHLY_DAY-20201101000000-fv2.00.nc
(Note this file has also the overflow
bounds issue, with bounds < -90 😓)
import rioxarray
import xarray
import fsspec
src_path = "https://dap.ceda.ac.uk/neodc/esacci/land_surface_temperature/data/MULTISENSOR_IRCDR/L3S/0.01/v2.00/monthly/2020/11/ESACCI-LST-L3S-LST-IRCDR_-0.01deg_1MONTHLY_DAY-20201101000000-fv2.00.nc"
filesystem = fsspec.filesystem("https")
fp = filesystem.open(src_path)
ds = xarray.open_dataset(fp, decode_coords="all", engine="h5netcdf")
da = ds["lst"]
print(da.rio.bounds())
print(da.rio._unordered_bounds())
print(da.rio.transform())
(-179.99999511705187, -90.00000274631076, 179.99999511705187, 89.9999874875217)
(-179.99999511705187, 89.9999874875217, 179.99999511705187, -90.00000274631076)
| 0.01, 0.00,-180.00|
| 0.00, 0.01,-90.00|
| 0.00, 0.00, 1.00|
note we've added a fix
in rio-tiler to flip vertically the output array when encountering this case https://github.com/cogeotiff/rio-tiler/blob/9a39c802044b485a26c29065b539ec44c817d8a6/rio_tiler/io/xarray.py#L329-L332
we've added a fix in rio-tiler to flip vertically the output array when encountering this case
That sounds like a good solution to me :+1:
Update
Update on March 15, 2023 based on discussion here https://github.com/pydata/xarray/issues/7621 First posted on March 13, 2023
Description
When reading data with Xarray using a rasterio engine, the Y axis is flipped. This happens with a specific dataset, and the summary is as follows.
i.e., The issue happens with SMAP L4 data, pre-processed on the NSIDC server, read with rasterio engine. The issue does not happen with NetCDF4 engine. It did not happen with the one directly downloaded from NASA Earth Data.
When reading data with Xarray using both the netcdf4 engine and the rasterio engine, the Y axis is flipped. However, when reading the same data with the netcdf4 package, the issue doesn't seem to be occurring.It's unclear whether the issue is with Xarray or the data itself.
Details
I have created a Jupyter notebook with the details of the issue, including the code used to analyze the data, which is available here
Data
for different versions of the dataset andfiles for any dates or times. I tested it with two different versions of datasets.The data can be downloaded from the links provided by NASA and NSIDC (the links are specified in the notebook), or from my Google Drive.
Environment Information
Installation method
Please let me know if you need any further information or have any suggestions on resolving this issue.
I am cross-posting this Xarray issues.
Please let me know if you need any further information or have any suggestions on resolving this issue. Thank you!