GenericMappingTools / pygmt

A Python interface for the Generic Mapping Tools.
https://www.pygmt.org
BSD 3-Clause "New" or "Revised" License
747 stars 216 forks source link

Use rioxarray.open_rasterio in a context manager or not? #3315

Closed seisman closed 2 months ago

seisman commented 2 months ago

We use rioxarray.open_rasterio in a context manager in many places, for example: https://github.com/GenericMappingTools/pygmt/blob/f9a7aca4a5421b36a84204b608c0c8ed9cb468cf/pygmt/tests/test_grdimage_image.py#L18-L22

But it seems the upstream documentation doesn't, for example: https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html#rioxarray

xds = rioxarray.open_rasterio("my.tif")

Why are we using it in a context manager?

seisman commented 2 months ago

If using it in a context manager, mypy will complain:

Item "list[Dataset]" of "Dataset | DataArray | list[Dataset]" has no attribute "__enter__"  [union-attr]

If using it without a context manager, rioxarray keeps the file open, which causes failures on Windows (e.g., https://github.com/GenericMappingTools/pygmt/actions/runs/9826903464/job/27128942703?pr=3115).

weiji14 commented 2 months ago

We're using rioxarray.open_rasterio in a context manager so that the file handler will get closed automatically, at least that's the idea. The auto-closing behaviour is actually a bit more complicated if you look at the code in https://github.com/corteva/rioxarray/blame/0.15.7/rioxarray/_io.py (search for .close), as it depends on the lock parameter which is None by default and set to xarray.backends.locks.SerializableLock, and also the cache parameter which seems to cache most of the data in memory, but there might be weird interactions when multiple threads/locks are in place.

If using it without a context manager, rioxarray keeps the file open, which causes failures on Windows

Commented at https://github.com/GenericMappingTools/pygmt/pull/3115/files#r1667776842. We'll need to be extra careful with file handlers and locks when relying on rioxarray/rasterio/GDAL, I always feel like there's a bug somewhere in that stack when doing multi-threading, but can never point a finger on where it is :frowning:

seisman commented 2 months ago

https://github.com/search?q=repo%3Acorteva%2Frioxarray+%22with+rioxarray.open_rasterio%22&type=code

I think with rioxarray.open_rasterio() as rio is correct, since it's extensively used in the rioxarray tests.

Closing the issue.