Open hrodmn opened 6 days ago
After digging into the issue some more I think the problem is in the default CRS definition for non-georeferenced datasets in titiler-cmr: https://github.com/developmentseed/titiler-cmr/blob/afe8b6dee5d2202fd8213080f986f3b66125d8b6/titiler/cmr/reader.py#L200-L201
The dataset that works as expected (GAMSSA) contains a georeference while the dataset that is not working correctly (MUR SST) does not contain a georeference and has it manually set to "epsg:4326"
. If I set the CRS using the CRS WKT from the GAMSSA dataset (instead of "epsg:4326"
, the MUR SST dataset behaves correctly.
diff --git a/titiler/cmr/reader.py b/titiler/cmr/reader.py
index 979629b..5e0af7d 100644
--- a/titiler/cmr/reader.py
+++ b/titiler/cmr/reader.py
@@ -197,7 +197,10 @@ def get_variable(
da = da.sortby(da.x)
# Make sure we have a valid CRS
- crs = da.rio.crs or "epsg:4326"
+ crs = (
+ da.rio.crs
+ or 'GEOGCS["undefined",DATUM["undefined",SPHEROID["undefined",6379137,298.257223563]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Longitude",EAST],AXIS["Latitude",NORTH]]'
+ )
da.rio.write_crs(crs, inplace=True)
This does not explain why everything works when a reprojection operation is performed without any changes to the default CRS, but it suggests that defaulting to epsg:4326
is probably not the right thing.
rioxarray will set the Affine transformation via the crs attribute and coordinates if it is not already defined in the dataset, which means the data gets "flipped" properly to match the destination transform in the reprojection step. rio_tiler's XarrayReader skips the reprojection entirely if the source and destination crs are the same - https://github.com/cogeotiff/rio-tiler/blob/abec15162c1d8cf9dd67cb5945d8532e2b6f562c/rio_tiler/io/xarray.py#L249-L263. If the bbox
endpoint should always return an image that's oriented with y coordinates positive upwards, I'd guess this should be implemented at the rio-tiler
level π€
I think the automatic assignment of a CRS is a broader issue. I don't agree with assuming that it's "epsg:4326" if it's not in the metadata. I guess there are a lot of datasets that are missing that metadata, so there should be a way for users to get around that still. Maybe a query parameter in the titiler-cmr endpoints?
After more investigation with @maxrjones and others, the problem is actually pretty simple. Neither of these datasets follow the convention of y axis coordinates positive upwards in the arrays.
The GAMSSA dataset has a georeference attached to it that rioxarray
can find, and when rio-tiler checks to see if the CRS matches the dest_crs
, it sees that they are not equal and performs a reprojection. This reprojection operation yields an array that does follow the conventional y-coordinate direction.
The MUR SST dataset does not have a georeference attached to it and gets the "epsg:4326"
value hard-coded. In the part
method, rio-tiler checks to see if self.input.rio.crs
matches dest_crs
(default: "epsg:4326"
) and they do match so no reprojection operation is performed. This means the array is never transformed to match conventional expectation of y-coordinates.
If the bbox endpoint should always return an image that's oriented with y coordinates positive upwards, I'd guess this should be implemented at the rio-tiler level π€
We should propose a fix in rio-tiler to ensure the array that is passed to ImageData
matches expectation of y axis coordinates positive upwards.
I think the automatic assignment of a CRS is a broader issue. I don't agree with assuming that it's "epsg:4326" if it's not in the metadata. I guess there are a lot of datasets that are missing that metadata, so there should be a way for users to get around that still. Maybe a query parameter in the titiler-cmr endpoints?
Yes I agree. We should add a query parameter for src_crs
that would allow an application to provide a CRS definition for datasets that do not have one attached, then raise an exception if the dataset is missing a CRS and src_crs
is not supplied.
Here's the roughly equivalent rio_tiler
code showing the inconsistent behavior between the Rasterio and Xarray readers:
import earthaccess
import rio_tiler
import xarray as xr
import matplotlib.pyplot as plt
from odc.geo.geom import Geometry
from shapely.geometry import box
# Download dataset
earthaccess.login()
variable = "analysed_sst"
results = earthaccess.search_data(
concept_id="C2036881735-POCLOUD", count=1, temporal=("2024-09-20", "2024-09-20")
)
fp = earthaccess.download(results, "earthaccess_data")[0]
# Set up GeoJSON for reader
bbox = box(-99.524, 19.043, -78.647, 31.126)
gbox = Geometry(bbox, "EPSG:4326")
shape = gbox.geojson()
# Use Xarray reader
ds = xr.open_dataset(fp)
ds = ds.rio.write_crs("EPSG:4326")
r = rio_tiler.io.xarray.XarrayReader(ds[variable])
im = r.feature(shape)
plt.imshow(im.data.squeeze())
plt.colorbar()
plt.show()
# Use rasterio reader
r = rio_tiler.io.rasterio.Reader(f"NETCDF:{str(fp)}:analysed_sst")
im = r.feature(shape)
plt.imshow(im.data.squeeze())
plt.colorbar()
plt.show()
I'm not sure where mask_and_scale and masking gets handled with using the rasterio reader, which is the reason for the difference in values. I could open a PR tomorrow suggesting a change to the Xarray reader to return consistent array structure.
I'm not sure where mask_and_scale and masking gets handled with using the rasterio reader, which is the reason for the difference in values. I could open a PR tomorrow suggesting a change to the Xarray reader to return consistent array structure.
@maxrjones can you try plt.imshow(im.data_as_image())
instead of plt.imshow(im.data.squeeze())
import earthaccess
import rio_tiler
import xarray as xr
import matplotlib.pyplot as plt
from geojson_pydantic import Polygon
# Download dataset
earthaccess.login()
variable = "analysed_sst"
results = earthaccess.search_data(
concept_id="C2036881735-POCLOUD", count=1, temporal=("2024-09-20", "2024-09-20")
)
fp = earthaccess.download(results, "earthaccess_data")[0]
# Set up GeoJSON for reader
bbox = (-99.524, 19.043, -78.647, 31.126)
shape = Polygon.from_bounds(*bbox).model_dump(exclude_none=True)
# Use Xarray reader
print("xarray")
ds = xr.open_dataset(fp)
ds = ds.rio.write_crs("EPSG:4326")
r = rio_tiler.io.xarray.XarrayReader(ds[variable])
im = r.feature(shape)
plt.imshow(im.data_as_image())
plt.colorbar()
plt.show()
print("rasterio")
# Use rasterio reader
r = rio_tiler.io.rasterio.Reader(f"NETCDF:{str(fp)}:analysed_sst")
im = r.feature(shape)
plt.imshow(im.data_as_image())
plt.colorbar()
plt.show()
Thank you both for digging on this issue π
I think the automatic assignment of a CRS is a broader issue. I don't agree with assuming that it's "epsg:4326" if it's not in the metadata. I guess there are a lot of datasets that are missing that metadata, so there should be a way for users to get around that still. Maybe a query parameter in the titiler-cmr endpoints?
I totally agree π
Yes I agree. We should add a query parameter for src_crs that would allow an application to provide a CRS definition for datasets that do not have one attached, then raise an exception if the dataset is missing a CRS and src_crs is not supplied.
I'm a bit worry about passing use WKT crs via query parameter, but I don't see better solution
if we agree that not setting a CRS automatically in the XarrayReader, then might get issue when using titiler-cmr backend because we won't be able to easily forward the Error message up to the users π¬
ok I think I may have a solution
import xarray
ds = xarray.open_dataset("20241012090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc", decode_coords="all")
da = ds["analysed_sst"]
print(da.rio.bounds())
print(da.rio.crs)
print(da.rio.transform())
print(da.rio._unordered_bounds())
(-179.99500549324037, -89.99499786365084, 180.0050000000763, 89.99499786365084)
None
| 0.01, 0.00,-180.00|
| 0.00, 0.01,-89.99|
| 0.00, 0.00, 1.00|
(-179.99500549324037, 89.99499786365084, 180.0050000000763, -89.99499786365084)
if you look at the tranform
the y resolution is positive, while the bounds
(form rio.bounds()) indicate that the transform Y resolution should be negative
rioxarray reorder the bounds, so if you look at the difference between da.rio._unordered_bounds()
and da.rio.bounds()
this might indicate that the transform should have a -Y resolution
This is another case where the vrt://
URI syntax shines. What if we could just use the rasterio
backend for all of these NetCDF datasets? I don't know how to specify a vrt://
URI in a 4-D (lon, lat, time, variable) where you need to select a time slice AND a variable, but it might be possible (I haven't found an example dataset with that format yet).
This /vsicurl
operation takes ~23 seconds on my laptop.
import earthaccess
import matplotlib.pyplot as plt
import rio_tiler
# query MUR SST from CMR
earthaccess.login()
results = earthaccess.search_data(
concept_id="C1996881146-POCLOUD", count=1, temporal=("2024-10-22", "2024-10-22")
)
url = results[0].data_links()[0]
# Use rasterio reader to read a bbox
bbox = (-99.524, 19.043, -78.647, 31.126)
variable = "analysed_sst"
r = rio_tiler.io.rasterio.Reader(f"vrt:///vsicurl/{url}?sd_name={variable}")
im = r.part(bbox)
plt.imshow(im.data_as_image())
plt.show()
Does the XarrayReader
have some performance advantage over the GDAL reader in this case?
I don't know how to specify a
vrt://
URI in a 4-D (lon, lat, time, variable) where you need to select a time slice AND a variable, but it might be possible (I haven't found an example dataset with that format yet).
This example file has four dimensions: https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc
You can pick the variable with sd_name={variable}
and a time slice with bands={i}
.
gdalinfo "vrt:///vsicurl/https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc?sd_name=tos&bands=1"
I don't know how to specify a
vrt://
URI in a 4-D (lon, lat, time, variable) where you need to select a time slice AND a variable, but it might be possible (I haven't found an example dataset with that format yet).This example file has four dimensions: unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc
You can pick the variable with
sd_name={variable}
and a time slice withbands={i}
.gdalinfo "vrt:///vsicurl/https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc?sd_name=tos&bands=1"
How would this work with rio-tiler?
I tried:
import matplotlib.pyplot as plt
import rio_tiler
# Use rasterio reader to read a bbox
bbox = (-99.524, 19.043, -78.647, 31.126)
src = "vrt:///vsicurl/https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc?sd_name=tos&bands=1"
r = rio_tiler.io.rasterio.Reader(src)
im = r.part(bbox)
plt.imshow(im.data_as_image())
plt.show()
You shouldn't need the /vsicurl/ prefix with rasterio
You shouldn't need the /vsicurl/ prefix with rasterio
Good to know, thanks! The bands=
trick via VRT wasn't working for me, but it was possible to use indexes=1
with Reader.part()
. Still, I'm not sure how you would go from an application/user specified time stamp to an index without a tool like xarray / rioxarray.
import matplotlib.pyplot as plt
import rio_tiler
# Use rasterio reader to read a bbox
bbox = (190, 19.043, 240, 31.126)
src = "https://www.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc?sd_name=tos&bands=1"
r = rio_tiler.io.rasterio.Reader(src)
im = r.part(bbox, indexes=1)
plt.imshow(im.data_as_image())
@hrodmn FYI while that dataset has 4 dimensions, the tos variable only has three. concept_ID="G1991303139-POCLOUD"
in an example of a dataset with 4 dimensional data variables (time: 1, Z: 50, latitude: 360, longitude: 720
).
I'm not sure if this is a problem in titiler-cmr or somewhere upstream (rio-tiler) but when reading images from some NetCDF files without doing a reprojection operation, the images are returned upside down!
This request to one NetCDF dataset returns the expected result:
https://9ox7r6pi8c.execute-api.us-west-2.amazonaws.com/bbox/-99.524,19.043,-78.647,31.126.png?concept_id=C2036881735-POCLOUD&datetime=2024-09-20T00:00:00Z/2024-09-21T00:00:00Z&variable=analysed_sst&backend=xarray&rescale=273,325&colormap_name=thermal&return_mask=true
A request to a different NetCDF dataset (MUR SST) with all of the same parameters yields an upside down image!
https://9ox7r6pi8c.execute-api.us-west-2.amazonaws.com/bbox/-99.524,19.043,-78.647,31.126.png?concept_id=C1996881146-POCLOUD&datetime=2024-09-20T00:00:00Z/2024-09-21T00:00:00Z&variable=analysed_sst&backend=xarray&rescale=273,325&colormap_name=thermal&return_mask=true
/tile
requests to the second dataset return the expected result:https://9ox7r6pi8c.execute-api.us-west-2.amazonaws.com/tiles/WebMercatorQuad/5/8/13.png?concept_id=C1996881146-POCLOUD&datetime=2024-09-20T00:00:00Z/2024-09-21T00:00:00Z&variable=analysed_sst&backend=xarray&rescale=273,325&colormap_name=thermal&return_mask=true
When I specify a
dst_crs
that is not the dataset's native CRS,/bbox
produces a correct image:https://9ox7r6pi8c.execute-api.us-west-2.amazonaws.com/bbox/-99.524,19.043,-78.647,31.126.png?concept_id=C1996881146-POCLOUD&datetime=2024-09-20T00:00:00Z/2024-09-21T00:00:00Z&variable=analysed_sst&backend=xarray&rescale=273,325&colormap_name=thermal&return_mask=true&dst_crs=epsg:3857