Closed jamie-sgro closed 1 year ago
If you have time to find the most recent version of rasterio/xarray/rioxarray where this wasn't an issue, that would be very helpful.
A quick note, as referenced in https://github.com/rasterio/rasterio/issues/2490: the issue persists with GDAL 3.5.1 and rasterio 1.3.0.
Inspired by @snowman2's comment here https://github.com/rasterio/rasterio/issues/2490#issuecomment-1164700425, I found that the target files are kept open when reading in the data at `rioxarray/_io.py:619: in _load_subdatasets`.
When the method was updated to load the data into memory and close the file after, the test passed:
```python
if subdataset_filter is not None and not subdataset_filter.match(subdataset):
    continue
with rasterio.open(subdataset) as rds:
    shape = rds.shape
rioda: DataArray
with open_rasterio(  # type: ignore
    subdataset,
    parse_coordinates=shape not in dim_groups and parse_coordinates,
    chunks=chunks,
    cache=cache,
    lock=lock,
    masked=masked,
    mask_and_scale=mask_and_scale,
    default_name=subdataset.split(":")[-1].lstrip("/").replace("/", "_"),
    decode_times=decode_times,
    decode_timedelta=decode_timedelta,
    **open_kwargs,
) as rioda:
    rioda.load()
if shape not in dim_groups:
    dim_groups[shape] = {rioda.name: rioda}
else:
    dim_groups[shape][rioda.name] = rioda
```
I'm happy to open a PR to address the issue.
We don't always want all of the data loaded into memory, as there are scenarios with larger files where you only want to load a subset of the data. If you wanted to add a `rioda.close()` after `open_rasterio` without loading in the data, it should work fine. xarray should re-open the file and load in the data when requested.
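For illustration, the close-then-reopen pattern described above can be sketched with the standard library alone. This is not rioxarray or xarray code: `LazyArray` and `open_many` are hypothetical names, and plain files stand in for HDF subdatasets. The only point is that closing right after opening leaves no handle open between iterations, while the data can still be read later on demand.

```python
import os
import tempfile


class LazyArray:
    """Hypothetical stand-in for a lazily loaded array (not a rioxarray/xarray class)."""

    def __init__(self, path):
        self.path = path
        # Opened eagerly, mirroring the initial open call.
        self._handle = open(path, "rb")
        self._data = None

    @property
    def is_open(self):
        return self._handle is not None

    def close(self):
        # Release the file handle without reading any data.
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def load(self):
        # Re-open on demand, read once, and release the handle again.
        if self._data is None:
            with open(self.path, "rb") as fh:
                self._data = fh.read()
        return self._data


def open_many(paths):
    """Open every path, but keep no handle alive between iterations."""
    arrays = {}
    for path in paths:
        arr = LazyArray(path)
        arr.close()  # close right after opening; nothing is loaded yet
        arrays[path] = arr
    return arrays


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(40):
            path = os.path.join(tmp, f"file{i}")
            with open(path, "wb") as fh:
                fh.write(b"layer %d" % i)
            paths.append(path)
        arrays = open_many(paths)
        print(sum(a.is_open for a in arrays.values()), "handles left open")
        print(arrays[paths[32]].load())
```

Because each handle is released before the next one is acquired, the total number of files visited is not bounded by a limit on simultaneously open files, and only the entries that are actually requested get read into memory.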
Running into this in #606. Seems it was fine with GDAL 3.4 and the problem was introduced in GDAL 3.5.
Investigation here: https://github.com/OSGeo/gdal/issues/6665
Fix identified in GDAL.
Code Sample, a copy-pastable example if possible
I've created a small repo with the necessary code to recreate the below error: https://github.com/jamie-sgro/xarray-recreate-bug
Problem description
In Docker environments only, reading the files throws the error below. This only occurs when trying to read .hdf files with a cumulative total of >32 layers. It always fails on the 33rd layer being read into memory, regardless of the order of the files and the contents of the files themselves. Note that we use a copy of a file for each iteration and it still fails.
I believe this is an error in the intersection between xarray, rioxarray, and rasterio. See these two other issues for more details:
Full Error
```
Last login: Tue Jul 5 12:28:09 on ttys003
docker exec -it 9763aa865198baad81e9e25fd70580f20cb3d4fb0b83ef64edc2f3fba60c9e92 /bin/sh
(base) jamiesgro@Jamies-MacBook-Pro ~ % docker exec -it 9763aa865198baad81e9e25fd70580f20cb3d4fb0b83ef64edc2f3fba60c9e92 /bin/sh
# pytest
============================= test session starts ==============================
platform linux -- Python 3.9.2, pytest-7.1.2, pluggy-1.0.0
rootdir: /app
collected 3 items

tests/test_rasterio_open.py . [ 33%]
tests/test_xarray_open_hdf4.py .F [100%]

=================================== FAILURES ===================================
_______________________ test_using_xarray_via_rioxarray ________________________

> ???

rasterio/_base.pyx:261:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

> ???

rasterio/_shim.pyx:78:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

> ???
E rasterio._err.CPLE_OpenFailedError: HDF4_EOS:EOS_GRID:/tmp/pytest-of-root/pytest-0/test_using_xarray_via_rioxarra0/file2:MODIS_Grid_16DAY_1km_VI:1 km 16 days blue reflectance: No such file or directory

rasterio/_err.pyx:216: CPLE_OpenFailedError

During handling of the above exception, another exception occurred:

tmp_path = PosixPath('/tmp/pytest-of-root/pytest-0/test_using_xarray_via_rioxarra0')

    def test_using_xarray_via_rioxarray(tmp_path: Path):
        """Same as above but using the rioxaray library to open via rasterio
        """
        num_files = 4
        filepaths = [tmp_path / f"file{i}" for i in range(num_files)]
        for i in range(num_files):
            shutil.copyfile(FILEPATH, filepaths[i])
        for filepath in filepaths:
>           with xr.open_dataset(filepath, engine="rasterio") as _:

tests/test_xarray_open_hdf4.py:57:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.9/site-packages/xarray/backends/api.py:496: in open_dataset
    backend_ds = backend.open_dataset(
/usr/local/lib/python3.9/site-packages/rioxarray/xarray_plugin.py:55: in open_dataset
    rds = _io.open_rasterio(
/usr/local/lib/python3.9/site-packages/rioxarray/_io.py:855: in open_rasterio
    return _load_subdatasets(
/usr/local/lib/python3.9/site-packages/rioxarray/_io.py:619: in _load_subdatasets
    with rasterio.open(subdataset) as rds:
/usr/local/lib/python3.9/site-packages/rasterio/env.py:437: in wrapper
    return f(*args, **kwds)
/usr/local/lib/python3.9/site-packages/rasterio/__init__.py:220: in open
    s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

> ???
E rasterio.errors.RasterioIOError: HDF4_EOS:EOS_GRID:/tmp/pytest-of-root/pytest-0/test_using_xarray_via_rioxarra0/file2:MODIS_Grid_16DAY_1km_VI:1 km 16 days blue reflectance: No such file or directory

rasterio/_base.pyx:263: RasterioIOError
=========================== short test summary info ============================
FAILED tests/test_xarray_open_hdf4.py::test_using_xarray_via_rioxarray - rasterio.errors.RasterioIOError: HDF4_EOS:EOS_GRID:/tmp/pytest-of-root/pytest-0/test_using_xarray_via_rioxarra0/file2:MODIS_Grid_16DAY_1km_VI:1 km 16 days blue reflectance: No such file or directory
========================= 1 failed, 2 passed in 9.07s ==========================
```
Expected Output
The expected output is that all layers are read into memory (in this case, as an `xr.Dataset`) without errors.
Environment Information
`python -c "import rioxarray; rioxarray.show_versions()"`
rioxarray version (`python -c "import rioxarray; print(rioxarray.__version__)"`)
rasterio version (`rio --version`)
GDAL version (`rio --gdal-version`)
Python version (`python -c "import sys; print(sys.version.replace('\n', ' '))"`)
Operating system (`python -c "import platform; print(platform.platform())"`)
Installation method