corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray

can only read 32 layers from .hdf files before returning a `FileNotFound` error #544

Closed jamie-sgro closed 1 year ago

jamie-sgro commented 2 years ago

Code Sample, a copy-pastable example if possible

I've created a small repo with the code needed to reproduce the error below: https://github.com/jamie-sgro/xarray-recreate-bug

Problem description

In Docker environments only, reading .hdf files throws the error below. It only occurs when the cumulative number of layers read exceeds 32. It always fails on the 33rd layer being read into memory, regardless of the order of the files and their contents. Note that we use a fresh copy of the file for each iteration, and it still fails.

rasterio.errors.RasterioIOError: HDF4_EOS:EOS_GRID:/tmp/pytest-of-root/pytest-5/test_can_open_hdf4_closer_to_e0/file3:MODIS_Grid_16DAY_1km_VI:1 km 16 days blue reflectance: No such file or directory
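
The pattern described above (the first 32 layers succeed, the 33rd fails, regardless of which file it belongs to) is what one would expect if some layer of the stack held a fixed number of handles that were never released. The sketch below is a toy model of that hypothesis only; it is not GDAL, rasterio, or rioxarray code. `HandlePool`, `read_layers`, the limit of 32, and the 12 layers per file are all assumptions chosen for illustration.

```python
class HandlePool:
    """Toy stand-in for a library that allows a fixed number of open handles."""

    def __init__(self, limit=32):
        self.limit = limit  # assumed limit, mirroring the 32 layers in the report
        self.open_handles = set()

    def open(self, name):
        # Once the pool is full, every further open fails with a misleading error.
        if len(self.open_handles) >= self.limit:
            raise FileNotFoundError(f"{name}: No such file or directory")
        self.open_handles.add(name)
        return name

    def close(self, name):
        self.open_handles.discard(name)


def read_layers(pool, names, close_after_read):
    """Open each layer in turn and return how many opened before a failure."""
    opened = 0
    for name in names:
        try:
            handle = pool.open(name)
        except FileNotFoundError:
            break
        opened += 1
        if close_after_read:
            pool.close(handle)
    return opened


# Four hypothetical files with 12 layers each: 48 layers in total.
layers = [f"file{i // 12}:layer{i % 12}" for i in range(48)]
leaky = read_layers(HandlePool(), layers, close_after_read=False)
tidy = read_layers(HandlePool(), layers, close_after_read=True)
print(leaky, tidy)  # prints: 32 48
```

In this model the failure lands in the third file (`file2`) whichever file is copied, and it disappears as soon as each handle is closed after use.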

I believe this is an error in the intersection between xarray, rioxarray, and rasterio. See these two other issues for more details:

Full Error

```
Last login: Tue Jul 5 12:28:09 on ttys003
docker exec -it 9763aa865198baad81e9e25fd70580f20cb3d4fb0b83ef64edc2f3fba60c9e92 /bin/sh
(base) jamiesgro@Jamies-MacBook-Pro ~ % docker exec -it 9763aa865198baad81e9e25fd70580f20cb3d4fb0b83ef64edc2f3fba60c9e92 /bin/sh
# pytest
============================= test session starts ==============================
platform linux -- Python 3.9.2, pytest-7.1.2, pluggy-1.0.0
rootdir: /app
collected 3 items

tests/test_rasterio_open.py . [ 33%]
tests/test_xarray_open_hdf4.py .F [100%]

=================================== FAILURES ===================================
_______________________ test_using_xarray_via_rioxarray ________________________

>   ???

rasterio/_base.pyx:261:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

rasterio/_shim.pyx:78:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   rasterio._err.CPLE_OpenFailedError: HDF4_EOS:EOS_GRID:/tmp/pytest-of-root/pytest-0/test_using_xarray_via_rioxarra0/file2:MODIS_Grid_16DAY_1km_VI:1 km 16 days blue reflectance: No such file or directory

rasterio/_err.pyx:216: CPLE_OpenFailedError

During handling of the above exception, another exception occurred:

tmp_path = PosixPath('/tmp/pytest-of-root/pytest-0/test_using_xarray_via_rioxarra0')

    def test_using_xarray_via_rioxarray(tmp_path: Path):
        """Same as above but using the rioxaray library to open via rasterio
        """
        num_files = 4
        filepaths = [tmp_path / f"file{i}" for i in range(num_files)]
        for i in range(num_files):
            shutil.copyfile(FILEPATH, filepaths[i])

        for filepath in filepaths:
>           with xr.open_dataset(filepath, engine="rasterio") as _:

tests/test_xarray_open_hdf4.py:57:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.9/site-packages/xarray/backends/api.py:496: in open_dataset
    backend_ds = backend.open_dataset(
/usr/local/lib/python3.9/site-packages/rioxarray/xarray_plugin.py:55: in open_dataset
    rds = _io.open_rasterio(
/usr/local/lib/python3.9/site-packages/rioxarray/_io.py:855: in open_rasterio
    return _load_subdatasets(
/usr/local/lib/python3.9/site-packages/rioxarray/_io.py:619: in _load_subdatasets
    with rasterio.open(subdataset) as rds:
/usr/local/lib/python3.9/site-packages/rasterio/env.py:437: in wrapper
    return f(*args, **kwds)
/usr/local/lib/python3.9/site-packages/rasterio/__init__.py:220: in open
    s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   rasterio.errors.RasterioIOError: HDF4_EOS:EOS_GRID:/tmp/pytest-of-root/pytest-0/test_using_xarray_via_rioxarra0/file2:MODIS_Grid_16DAY_1km_VI:1 km 16 days blue reflectance: No such file or directory

rasterio/_base.pyx:263: RasterioIOError
=========================== short test summary info ============================
FAILED tests/test_xarray_open_hdf4.py::test_using_xarray_via_rioxarray - rasterio.errors.RasterioIOError: HDF4_EOS:EOS_GRID:/tmp/pytest-of-root/pytest-0/test_using_xarray_via_rioxarra0/file2:MODIS_Grid_16DAY_1km_VI:1 km 16 days blue reflectance: No such file or directory
========================= 1 failed, 2 passed in 9.07s ==========================
```

Expected Output

The expected output is that all layers are read into memory (in this case, as an xr.Dataset) without errors.

Environment Information

Installation method

snowman2 commented 2 years ago

If you have time to find the most recent version of rasterio/xarray/rioxarray where this wasn't an issue, that would be very helpful.
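
When comparing environments to narrow down the last working combination, it helps to record the installed versions in each one. A minimal sketch using only the standard library follows; `installed_versions` is a hypothetical helper name, and the package list is just the three libraries mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["rioxarray", "rasterio", "xarray"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

GDAL is often bundled inside the rasterio wheel, so it will not appear in this listing; `rasterio.__gdal_version__` reports the GDAL version rasterio was built against, and `rioxarray.show_versions()` prints a fuller environment report.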

J-Levitt commented 2 years ago

A quick note, as referenced in https://github.com/rasterio/rasterio/issues/2490: the issue persists with the newer GDAL 3.5.1 and rasterio 1.3.0.

ShengpeiWang commented 2 years ago

Inspired by @snowman2's comment here: https://github.com/rasterio/rasterio/issues/2490#issuecomment-1164700425. I found that the target files are kept open when the data is read in at `rioxarray/_io.py:619`, in `_load_subdatasets`.

When the method was updated to load the data into memory and close the file afterwards, the test passed:

```python
        # Excerpt from the loop body of _load_subdatasets (rioxarray/_io.py).
        if subdataset_filter is not None and not subdataset_filter.match(subdataset):
            continue
        # Open briefly to read the shape; the context manager closes the file.
        with rasterio.open(subdataset) as rds:
            shape = rds.shape
        rioda: DataArray
        with open_rasterio(  # type: ignore
            subdataset,
            parse_coordinates=shape not in dim_groups and parse_coordinates,
            chunks=chunks,
            cache=cache,
            lock=lock,
            masked=masked,
            mask_and_scale=mask_and_scale,
            default_name=subdataset.split(":")[-1].lstrip("/").replace("/", "_"),
            decode_times=decode_times,
            decode_timedelta=decode_timedelta,
            **open_kwargs,
        ) as rioda:
            # Read everything into memory before the file is closed on exit.
            rioda.load()
        if shape not in dim_groups:
            dim_groups[shape] = {rioda.name: rioda}
        else:
            dim_groups[shape][rioda.name] = rioda
```

I'm happy to open a PR to address the issue.

snowman2 commented 2 years ago

We don't always want all of the data loaded into memory as there are scenarios with larger files when you only want to load in a subset of the data. If you wanted to add a rioda.close() after open_rasterio without loading in the data, it should work fine. xarray should re-open the file and load in the data when requested.
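
The suggestion above (close each array right after opening it, without loading) can be sketched in isolation as follows. This is an illustration under stated assumptions, not the change that was merged: `open_and_release` and `FakeArray` are hypothetical names, and `open_fn` stands for whatever opens a single subdataset (in rioxarray that would presumably be `open_rasterio`). Whether the data is transparently re-read later depends on xarray's lazy backend, which this sketch does not exercise.

```python
def open_and_release(subdatasets, open_fn):
    """Open each subdataset lazily and release its file handle right away.

    `open_fn` must return an object with a `name` attribute and a `close()`
    method; nothing is loaded into memory here.
    """
    arrays = {}
    for subdataset in subdatasets:
        rioda = open_fn(subdataset)
        # Release the handle without calling .load(); the data stays on disk.
        rioda.close()
        arrays[rioda.name] = rioda
    return arrays


class FakeArray:
    """Hypothetical stand-in for a lazily loaded array, for demonstration only."""

    open_handles = 0  # number of handles currently held across all instances

    def __init__(self, name):
        self.name = name
        FakeArray.open_handles += 1

    def close(self):
        FakeArray.open_handles -= 1


arrays = open_and_release([f"layer{i}" for i in range(40)], FakeArray)
print(len(arrays), FakeArray.open_handles)  # prints: 40 0
```

The point of the design is that the number of simultaneously open handles stays at one no matter how many subdatasets are visited, while memory use stays low because nothing is loaded eagerly.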

snowman2 commented 1 year ago

Running into this in #606. Seems it was fine with GDAL 3.4 and the problem was introduced in GDAL 3.5.

Investigation here: https://github.com/OSGeo/gdal/issues/6665

snowman2 commented 1 year ago

Fix identified in GDAL.

snowman2 commented 1 year ago

#607 should help as well.