corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray

TypeError: cannot pickle '_io.BufferedReader' object when trying to modify an xarray.DataArray opened with fsspec's filecache #711

Open abarciauskas-bgse opened 8 months ago

abarciauskas-bgse commented 8 months ago

👋🏽 Hoping someone on the team can help us figure out how to use fsspec's filecache with NetCDF data when we need to modify the xarray.DataArray with rioxarray. Right now this is not possible: we get TypeError: cannot pickle '_io.BufferedReader' object, and the traceback led us to believe it comes from the deep copy operations at https://github.com/corteva/rioxarray/blob/master/rioxarray/rioxarray.py#L1102 and https://github.com/corteva/rioxarray/blob/c15b86061feff8c2c7b0964f19922a3154a85f1a/rioxarray/rioxarray.py#L335
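
For readers unfamiliar with this failure mode, here is a minimal standard-library sketch of it. It is not rioxarray's code: LazyArray and try_deepcopy are hypothetical names, and the sketch only assumes that the object being deep-copied holds a reference to an open binary file somewhere in its attributes. The exact wording of the error message can vary between Python versions.

```python
import copy
import os
import tempfile


class LazyArray:
    """Hypothetical stand-in for a lazily loaded array that keeps its file open."""

    def __init__(self, handle):
        # An open binary file; in CPython this is an _io.BufferedReader.
        self.handle = handle


def try_deepcopy(obj):
    """Return None if copy.deepcopy succeeds, otherwise the TypeError it raised."""
    try:
        copy.deepcopy(obj)
    except TypeError as exc:
        return exc
    return None


# Create a small throwaway file so the example needs no network access.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"not a real netCDF file")
    path = tmp.name

with open(path, "rb") as handle:
    # deepcopy walks the instance attributes and reaches the file handle,
    # which does not support pickling or copying.
    error = try_deepcopy(LazyArray(handle))

os.remove(path)
print(type(error).__name__, error)
```

The point of the sketch is that the copy fails because of what the object references, not because of anything specific to clipping or nodata handling.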

Code Sample

import fsspec
from morecantile import Tile
from rio_tiler.constants import WEB_MERCATOR_TMS
import numpy as np
import xarray as xr
import shutil
import pandas as pd

tms = WEB_MERCATOR_TMS
tile_bounds = tms.xy_bounds(Tile(x=0, y=0, z=0))
dst_crs = tms.rasterio_crs

protocol = 's3'
file_url = 's3://chunk-tests/3B42_Daily.19980101.7.nc4'
cache_storage_dir = 'fsspec-cache'

cache_options = ['filecache', 'blockcache']
inplace_options = [True, False]
# We can add `True` to this list, but `True` always returns AttributeError: __enter__ 
lock_options = [False]

xr_args = {
    'engine': 'h5netcdf'
}

def rio_clip_box(da):
    try:
        crs = da.rio.crs or "epsg:4326"
        da.rio.write_crs(crs, inplace=True)   
        # also with no data     
        da = da.rio.clip_box(*tile_bounds, crs=dst_crs)
    except Exception as e:
        return f"❌ {type(e).__name__}: {e}".replace('\n', ' ')
    return '✅'

def rio_write_nodata(da, inplace: bool = True):
    try:
        da.rio.write_nodata(np.nan, inplace=inplace)
    except Exception as e:
        return f"❌ {type(e).__name__}: {e}".replace('\n', ' ')
    return '✅'

columns = ('cache_option', 'inplace_option', 'lock_option', 'clip_box', 'write_nodata')
results = []
for cache_option in cache_options:
    for inplace_option in inplace_options:
        for lock_option in lock_options:
            params = (cache_option, inplace_option, lock_option)
            filecache_fs = fsspec.filesystem(cache_option, target_protocol=protocol, cache_storage=cache_storage_dir)
            file_opener = filecache_fs.open(file_url, mode='rb')
            xr_args['lock'] = lock_option
            try:
                ds = xr.open_dataset(file_opener, **xr_args)
            except Exception as e:
                results.append(params + (f"❌ {type(e).__name__}: {e}".replace('\n', ' '), f"❌ {type(e).__name__}: {e}".replace('\n', ' ')))
                continue
            da = ds['precipitation']
            da = da.rename({'lon': 'x', 'lat': 'y'})
            da = da.transpose("time", "y", "x", missing_dims="ignore")
            rio_write_nodata_result = rio_write_nodata(da, inplace=inplace_option)
            clip_box_result = rio_clip_box(da)
            results.append(params + (clip_box_result, rio_write_nodata_result))
            shutil.rmtree(cache_storage_dir)

df = pd.DataFrame(data=results, columns=columns)
df.to_markdown("results.md", index=False, tablefmt="github")

| cache_option | inplace_option | lock_option | clip_box | write_nodata |
|--------------|----------------|-------------|----------|--------------|
| filecache | True | False | ❌ TypeError: cannot pickle '_io.BufferedReader' object | ✅ |
| filecache | False | False | ❌ TypeError: cannot pickle '_io.BufferedReader' object | ❌ TypeError: cannot pickle '_io.BufferedReader' object |
| blockcache | True | False | ✅ | ✅ |
| blockcache | False | False | ✅ | ✅ |

Problem description

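rioxarray operations that copy the data (clip_box, and write_nodata with inplace=False) fail with TypeError: cannot pickle '_io.BufferedReader' object when the xarray.DataArray was opened through fsspec's filecache.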
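
As a rough illustration of why the kind of file object matters here, the following standard-library sketch contrasts an open file handle with an in-memory io.BytesIO buffer holding the same bytes. It does not involve xarray, rioxarray, or fsspec, and whether passing an in-memory buffer to xr.open_dataset would avoid the error in this issue is an untested assumption, not a confirmed workaround.

```python
import copy
import io
import os
import tempfile

# Create a small throwaway file so the example needs no network access.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example bytes")
    path = tmp.name

with open(path, "rb") as handle:
    # An open file handle (a BufferedReader) cannot be deep-copied.
    try:
        copy.deepcopy(handle)
        reader_copied = True
    except TypeError:
        reader_copied = False

    # Read the whole file into memory instead.
    handle.seek(0)
    buffer = io.BytesIO(handle.read())

os.remove(path)

# An in-memory buffer supports copying, so deepcopy returns an independent object.
buffer_copy = copy.deepcopy(buffer)
print(reader_copied, buffer_copy.getvalue() == buffer.getvalue())
```

Reading a whole file into memory trades the benefit of lazy, chunked access for copyability, so this would only be reasonable for files that comfortably fit in RAM.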

Expected Output

Modified xarray.DataArray

Environment Information

python -c "import rioxarray; rioxarray.show_versions()"

returns

rioxarray (0.15.0) deps:
  rasterio: 1.3.8
    xarray: 2023.10.0
      GDAL: 3.6.4
      GEOS: 0.0.0
      PROJ: 9.0.1
 PROJ DATA: /Users/aimeebarciauskas/mambaforge/share/proj
 GDAL DATA: /Users/aimeebarciauskas/mambaforge/share/gdal

Other python deps:
     scipy: None
    pyproj: 3.6.0

System:
    python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:39:40) [Clang 15.0.7 ]
executable: /Users/aimeebarciauskas/mambaforge/bin/python
   machine: macOS-10.15.7-x86_64-i386-64bit

python -c "import fsspec; print(fsspec.__version__)"

returns

2023.9.0

Installation method

pip

abarciauskas-bgse commented 8 months ago

It might be worth noting that if you don't remove the cache after each run of the two functions, every case fails with ❌ TypeError: cannot pickle '_io.BufferedReader' object for clip_box, and for write_nodata when inplace=False. So rioxarray is not able to work with fsspec's blockcache either.

snowman2 commented 7 months ago

Related #614. Possible duplicate.