fsspec / kerchunk

Cloud-friendly access to archival data
https://fsspec.github.io/kerchunk/
MIT License

No module named 'eccodes' when trying to read kerchunked grib dataset with remote Dask cluster #480

Closed: rsignell closed this issue 2 months ago

rsignell commented 2 months ago

We can read data from a kerchunked GRIB dataset locally, but when we try the same thing with a remote Dask cluster, we get:

ModuleNotFoundError: No module named 'eccodes'

I know we've hit this in the past, but can't remember how we solved it.

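The code we are running is:

import datatree
import fsspec

# Combined kerchunk references for the HRRR GRIB files
hrrr_combined_refs = 's3://esip-qhub-public/rsignell/hrrr_zstore.json'

# Open the references as a Zarr store; chunks={} keeps the arrays lazy (Dask-backed)
dtree = datatree.open_datatree(
    fsspec.filesystem('reference', fo=hrrr_combined_refs).get_mapper(''),
    engine='zarr',
    consolidated=False,
    chunks={})

hrrr_t2m = dtree['t2m']['instant']['heightAboveGround']['t2m']

# .load() is where GRIB chunks are actually decoded; with a distributed
# client active, that decoding runs on the Dask workers
hrrr_ts = hrrr_t2m[0,:1000,500,500].load()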

If we run this on a Coiled cluster, it fails with:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File <timed exec>:1

File /home/conda/global/16102bfe-1721228674-152-pangeo/lib/python3.11/site-packages/xarray/core/dataarray.py:1147, in DataArray.load(self, **kwargs)
   1127 def load(self, **kwargs) -> Self:
   1128     """Manually trigger loading of this array's data from disk or a
   1129     remote source into memory and return this array.
   1130 
   (...)
   1145     dask.compute
   1146     """
-> 1147     ds = self._to_temp_dataset().load(**kwargs)
   1148     new = self._from_temp_dataset(ds)
   1149     self._variable = new._variable

File /home/conda/global/16102bfe-1721228674-152-pangeo/lib/python3.11/site-packages/xarray/core/dataset.py:863, in Dataset.load(self, **kwargs)
    860 chunkmanager = get_chunked_array_type(*lazy_data.values())
    862 # evaluate all the chunked arrays simultaneously
--> 863 evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
    864     *lazy_data.values(), **kwargs
    865 )
    867 for k, data in zip(lazy_data, evaluated_data):
    868     self.variables[k].data = data

File /home/conda/global/16102bfe-1721228674-152-pangeo/lib/python3.11/site-packages/xarray/namedarray/daskmanager.py:86, in DaskManager.compute(self, *data, **kwargs)
     81 def compute(
     82     self, *data: Any, **kwargs: Any
     83 ) -> tuple[np.ndarray[Any, _DType_co], ...]:
     84     from dask.array import compute
---> 86     return compute(*data, **kwargs)

File /home/conda/global/16102bfe-1721228674-152-pangeo/lib/python3.11/site-packages/dask/base.py:662, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    659     postcomputes.append(x.__dask_postcompute__())
    661 with shorten_traceback():
--> 662     results = schedule(dsk, keys, **kwargs)
    664 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /opt/coiled/env/lib/python3.11/site-packages/xarray/core/indexing.py:573, in __array__()

File /opt/coiled/env/lib/python3.11/site-packages/xarray/core/indexing.py:576, in get_duck_array()

File /opt/coiled/env/lib/python3.11/site-packages/xarray/core/indexing.py:787, in get_duck_array()

File /opt/coiled/env/lib/python3.11/site-packages/xarray/core/indexing.py:650, in get_duck_array()

File /opt/coiled/env/lib/python3.11/site-packages/xarray/backends/zarr.py:104, in __getitem__()

File /opt/coiled/env/lib/python3.11/site-packages/xarray/core/indexing.py:1014, in explicit_indexing_adapter()

File /opt/coiled/env/lib/python3.11/site-packages/xarray/backends/zarr.py:94, in _getitem()

File /opt/coiled/env/lib/python3.11/site-packages/zarr/core.py:798, in __getitem__()

File /opt/coiled/env/lib/python3.11/site-packages/zarr/core.py:1080, in get_orthogonal_selection()

File /opt/coiled/env/lib/python3.11/site-packages/zarr/core.py:1343, in _get_selection()

File /opt/coiled/env/lib/python3.11/site-packages/zarr/core.py:2183, in _chunk_getitems()

File /opt/coiled/env/lib/python3.11/site-packages/zarr/core.py:2096, in _process_chunk()

File /opt/coiled/env/lib/python3.11/site-packages/zarr/core.py:2359, in _decode_chunk()

File /opt/coiled/env/lib/python3.11/site-packages/kerchunk/codecs.py:90, in decode()

ModuleNotFoundError: No module named 'eccodes'
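The client-side frames above live under /home/conda/global/..., while the frames that end in kerchunk/codecs.py decode() live under /opt/coiled/env/..., so the failing import appears to happen on the workers, not in the notebook process. A minimal sketch for checking which modules a process can actually import, using only the standard library (the helper name missing_modules is hypothetical, not part of kerchunk or Dask):

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be imported in this process.

    importlib.util.find_spec returns None for a top-level module that is not
    installed, without actually importing it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Local check; on a cluster, send the same function to every worker.
    print(missing_modules(["json", "eccodes"]))
```

With a dask.distributed Client connected to the cluster, client.run(missing_modules, ["eccodes"]) runs the function on each worker and returns a dict keyed by worker address, which shows directly whether the workers' environment differs from the local one.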

The conda environment on the Coiled cluster was created with this environment.yaml:

channels:
  - conda-forge
dependencies:
  - python=3.11
  - adlfs
  - cf_xarray
  - cfunits
  - coiled
  - curl
  - dask>=2024.7.0
  - eccodes
  - fastparquet
  - fsspec>=2024.2.0
  - gdal
  - gcsfs
  - h5netcdf
  - h5py
  - intake>=2.0
  - intake-stac
  - intake-xarray
  - intake-parquet
  - ipywidgets
  - ipykernel
  - kerchunk
  - metpy
  - netcdf4
  - numba
  - numcodecs
  - numpy<2
  - pandas
  - planetary-computer
  - pyepsg
  - pystac
  - pystac-client
  - python-snappy
  - rechunker
  - s3fs
  - ujson
  - vim
  - zstandard
  - xarray-datatree
  - zarr
martindurant commented 2 months ago

I think you probably need python-eccodes. Locally you probably have cfgrib, which depends on the right set of packages.
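The environment.yaml above lists eccodes, but on conda-forge that package provides the ecCodes C library and tools; the Python module imported as eccodes comes from the separate python-eccodes package. A minimal sketch that checks a dependency list for this kind of gap, assuming a hypothetical IMPORT_TO_CONDA mapping that only contains the one entry relevant here:

```python
import re

# Hypothetical mapping from Python import name to the conda-forge package
# that provides it. Only the entry relevant to this issue is included.
IMPORT_TO_CONDA = {"eccodes": "python-eccodes"}


def conda_packages_for(imports, dependencies):
    """Return conda packages that must be added so that `imports` work.

    `dependencies` is the list under `dependencies:` in environment.yaml,
    e.g. ["python=3.11", "eccodes", "kerchunk"]. Version specs such as
    "=3.11", ">=2024.7.0" or "<2" are stripped before comparing names.
    """
    names = {re.split(r"[<>=!\s]", dep, maxsplit=1)[0] for dep in dependencies}
    needed = []
    for module in imports:
        # Fall back to assuming the conda package has the same name as the module.
        package = IMPORT_TO_CONDA.get(module, module)
        if package not in names:
            needed.append(package)
    return needed
```

Applied to the environment.yaml above, conda_packages_for(["eccodes"], deps) flags python-eccodes as missing, which matches the fix that resolved this issue.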

rsignell commented 2 months ago

Yep, that was it! Thanks, @martindurant!