corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray
Other
507 stars 80 forks source link

Allow user assigned CRS in open_rasterio #646

Closed peterm790 closed 1 year ago

peterm790 commented 1 year ago

I am looking to open MODIS level 2 hdf data, sourced from here. I've also uploaded a sample file to s3://peterm790/MOD11_L2.A2023060.2040.061.2023061090549.hdf for convenience.

My understanding is after https://github.com/corteva/rioxarray/pull/62 it should be possible to open these files by simply calling:

rds = rioxarray.open_rasterio('/path/to/MOD11_L2.A2023060.2040.061.2023061090549.hdf')

This however returns the following error:

MissingCRS: CRS not found. Please set the CRS with 'rio.write_crs()'.

a gist with the full error message: https://gist.github.com/peterm790/6a3dd266def15b2a5bfa04a3c63825a6

My understanding is the present API does not allow for manually setting the CRS when opening a file. Is there a way around this? if not does it make sense to allow for this? Or should it be possible to open the file without setting the CRS?

I have set up a branch: https://github.com/corteva/rioxarray/compare/master...peterm790:rioxarray:assign_crs which allows the CRS to be set manually with an assign_crs argument in open_rasteriosuch that:

rds = rioxarray.open_rasterio('/path/to/MOD11_L2.A2023060.2040.061.2023061090549.hdf', assign_crs = "+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs")

successfully opens the files. here's a gist of this https://gist.github.com/peterm790/f06d846e518c2ced7741afcda5365aed

Please do let me know if there is another method for opening these MODIS L2 files that I have missed? Or if a PR allowing the above change would make sense.

snowman2 commented 1 year ago

What is the output of rioxarray.show_versions()?

snowman2 commented 1 year ago

Also, mind providing the full error traceback?

peterm790 commented 1 year ago

rioxarray.show_versions():

rioxarray (0.13.3) deps:
  rasterio: 1.3.4
    xarray: 2023.1.0
      GDAL: 3.6.2
      GEOS: 3.11.1
      PROJ: 9.1.0
 PROJ DATA: /share/apps/anaconda3-2022.05/envs/pangeo/share/proj
 GDAL DATA: None

Other python deps:
     scipy: 1.10.0
    pyproj: 3.4.1

System:
    python: 3.10.9 | packaged by conda-forge | (main, Feb  2 2023, 20:20:04) [GCC 11.3.0]
executable: /share/apps/anaconda3-2022.05/envs/pangeo/bin/python
   machine: Linux-3.10.0-1160.80.1.el7.x86_64-x86_64-with-glibc2.17
peterm790 commented 1 year ago
MissingCRS                                Traceback (most recent call last)
Cell In[5], line 2
      1 import rioxarray
----> 2 rds = rioxarray.open_rasterio(files[10], masked = True)

File /share/apps/anaconda3-2022.05/envs/pangeo/lib/python3.10/site-packages/rioxarray/_io.py:1115, in open_rasterio(filename, parse_coordinates, chunks, cache, lock, masked, mask_and_scale, variable, group, default_name, decode_times, decode_timedelta, band_as_variable, **open_kwargs)
   1113 # open the subdatasets if they exist
   1114 if riods.subdatasets:
-> 1115     subdataset_result = _load_subdatasets(
   1116         riods=riods,
   1117         group=group,
   1118         variable=variable,
   1119         parse_coordinates=parse_coordinates,
   1120         chunks=chunks,
   1121         cache=cache,
   1122         lock=lock,
   1123         masked=masked,
   1124         mask_and_scale=mask_and_scale,
   1125         decode_times=decode_times,
   1126         decode_timedelta=decode_timedelta,
   1127         **open_kwargs,
   1128     )
   1129     manager.close()
   1130     return subdataset_result

File /share/apps/anaconda3-2022.05/envs/pangeo/lib/python3.10/site-packages/rioxarray/_io.py:791, in _load_subdatasets(riods, group, variable, parse_coordinates, chunks, cache, lock, masked, mask_and_scale, decode_times, decode_timedelta, **open_kwargs)
    789     with rasterio.open(subdataset) as rds:
    790         shape = rds.shape
--> 791     rioda: DataArray = open_rasterio(  # type: ignore
    792         subdataset,
    793         parse_coordinates=shape not in dim_groups and parse_coordinates,
    794         chunks=chunks,
    795         cache=cache,
    796         lock=lock,
    797         masked=masked,
    798         mask_and_scale=mask_and_scale,
    799         default_name=subdataset.split(":")[-1].lstrip("/").replace("/", "_"),
    800         decode_times=decode_times,
    801         decode_timedelta=decode_timedelta,
    802         **open_kwargs,
    803     )
    804     dim_groups[shape][rioda.name] = rioda
    805 return _subdataset_groups_to_dataset(
    806     dim_groups=dim_groups, global_tags=_parse_tags(riods.tags())
    807 )

File /share/apps/anaconda3-2022.05/envs/pangeo/lib/python3.10/site-packages/rioxarray/_io.py:1224, in open_rasterio(filename, parse_coordinates, chunks, cache, lock, masked, mask_and_scale, variable, group, default_name, decode_times, decode_timedelta, band_as_variable, **open_kwargs)
   1222     result.rio.write_crs(rio_crs, inplace=True)
   1223 if has_gcps:
-> 1224     result.rio.write_gcps(*riods.gcps, inplace=True)
   1226 if chunks is not None:
   1227     result = _prepare_dask(result, riods, filename, chunks)

File /share/apps/anaconda3-2022.05/envs/pangeo/lib/python3.10/site-packages/rioxarray/rioxarray.py:1230, in XRasterBase.write_gcps(self, gcps, gcp_crs, grid_mapping_name, inplace)
   1225 grid_mapping_name = (
   1226     self.grid_mapping if grid_mapping_name is None else grid_mapping_name
   1227 )
   1228 data_obj = self._get_obj(inplace=True)
-> 1230 data_obj = data_obj.rio.write_crs(
   1231     gcp_crs, grid_mapping_name=grid_mapping_name, inplace=inplace
   1232 )
   1233 geojson_gcps = _convert_gcps_to_geojson(gcps)
   1234 data_obj.coords[grid_mapping_name].attrs["gcps"] = geojson_gcps

File /share/apps/anaconda3-2022.05/envs/pangeo/lib/python3.10/site-packages/rioxarray/rioxarray.py:487, in XRasterBase.write_crs(self, input_crs, grid_mapping_name, inplace)
    484     pass
    486 if data_obj.rio.crs is None:
--> 487     raise MissingCRS(
    488         "CRS not found. Please set the CRS with 'rio.write_crs()'."
    489     )
    490 # add grid mapping coordinate
    491 data_obj.coords[grid_mapping_name] = xarray.Variable((), 0)

MissingCRS: CRS not found. Please set the CRS with 'rio.write_crs()'.
snowman2 commented 1 year ago

This is interesting - it seems to think there are GCPS, but no CRS is found:

if has_gcps:
    result.rio.write_gcps(*riods.gcps, inplace=True)
snowman2 commented 1 year ago
with rasterio.open("MOD11_L2.A2023060.2335.061.2023061090119.hdf") as rds:
    for subdataset in rds.subdatasets:
        with rasterio.open(subdataset) as subrds:
            print(subdataset)
            print(subrds.gcps[0][0], subrds.gcps[1])
HDF4_EOS:EOS_SWATH:MOD11_L2.A2023060.2335.061.2023061090119.hdf:MOD_Swath_LST:LST
GroundControlPoint(row=2.5, col=2.5, x=115.89305877685547, y=-62.6209831237793, z=0.0, id='', info='') None
HDF4_EOS:EOS_SWATH:MOD11_L2.A2023060.2335.061.2023061090119.hdf:MOD_Swath_LST:QC
GroundControlPoint(row=2.5, col=2.5, x=115.89305877685547, y=-62.6209831237793, z=0.0, id='', info='') None
HDF4_EOS:EOS_SWATH:MOD11_L2.A2023060.2335.061.2023061090119.hdf:MOD_Swath_LST:Error_LST
GroundControlPoint(row=2.5, col=2.5, x=115.89305877685547, y=-62.6209831237793, z=0.0, id='', info='') None
HDF4_EOS:EOS_SWATH:MOD11_L2.A2023060.2335.061.2023061090119.hdf:MOD_Swath_LST:Emis_31
GroundControlPoint(row=2.5, col=2.5, x=115.89305877685547, y=-62.6209831237793, z=0.0, id='', info='') None
HDF4_EOS:EOS_SWATH:MOD11_L2.A2023060.2335.061.2023061090119.hdf:MOD_Swath_LST:Emis_32
GroundControlPoint(row=2.5, col=2.5, x=115.89305877685547, y=-62.6209831237793, z=0.0, id='', info='') None
HDF4_EOS:EOS_SWATH:MOD11_L2.A2023060.2335.061.2023061090119.hdf:MOD_Swath_LST:View_angle
GroundControlPoint(row=2.5, col=2.5, x=115.89305877685547, y=-62.6209831237793, z=0.0, id='', info='') None
HDF4_EOS:EOS_SWATH:MOD11_L2.A2023060.2335.061.2023061090119.hdf:MOD_Swath_LST:View_time
GroundControlPoint(row=2.5, col=2.5, x=115.89305877685547, y=-62.6209831237793, z=0.0, id='', info='') None
snowman2 commented 1 year ago

Related: https://github.com/corteva/rioxarray/issues/339#issuecomment-853877629

snowman2 commented 1 year ago

This returns nothing for the GCPS:

from osgeo import gdal

ds = gdal.Open("HDF4_EOS:EOS_SWATH:MOD11_L2.A2023060.2335.061.2023061090119.hdf:MOD_Swath_LST:LST")

ds.GetGCPSpatialRef()

I am wondering if GDAL needs to be updated to handle that (https://github.com/OSGeo/gdal)? Or if it is an issue with how the files are generated?

snowman2 commented 1 year ago

@peterm790, with the gist you added, there are no coordinates/transform. So, there isn't much you can do with the data and it can be misleading to have a CRS without the correct coordinates/transform/gcps. I am thinking it would likely be better to either address the issue with the data provider or with GDAL.

However, to allow reading the file, a PR with a change that skips writing the CRS if gcp_crs is None here would be welcome: https://github.com/corteva/rioxarray/blob/33a1be6f1bc3e30cb6d8c1a002d04f439dfb510f/rioxarray/rioxarray.py#L1233-L1235

peterm790 commented 1 year ago

Thank you very much @snowman2! I think I understand the problem better now.

This MOD11_L2 data is version 6.1 which has only recently been released so could well be a source data problem. I will see if the problem persists with older versions. And follow on from there.

I will set up a new branch with suggested PR shortly. Thanks

peterm790 commented 1 year ago

So just to update this issue for anybody that may come here in the future. My understanding of the MOD11_L2 data was a bit flawed, while it does contain ground control points it is not a geo-referenced product and so attempting to re project does not make sense. Instead the level 3 MOD11A1 product should be used. This document was useful to me: https://landweb.modaps.eosdis.nasa.gov/QA_WWW/forPage/user_guide/061/MOD11_C61_UsersGuide_revSudipta_revPete_Final.pdf

Despite this I do think https://github.com/corteva/rioxarray/pull/648 which allows the data and gcps to be opened and viewed makes sense.