Closed · rsignell closed this issue 2 months ago
We can read data from a kerchunked GRIB dataset locally, but when we try with a remote Dask cluster we get:
ModuleNotFoundError: No module named 'eccodes'
I know we've hit this in the past, but I can't remember how we solved it.
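A quick way to confirm the diagnosis is to check whether the module is importable in a given interpreter. A minimal stdlib sketch (the `client.run` usage in the comment is a hypothetical example, assuming a `distributed.Client` connected to the cluster):

```python
import importlib.util

def has_module(name: str) -> bool:
    """Return True if `name` is importable in the current interpreter."""
    return importlib.util.find_spec(name) is not None

# Run locally this reflects the local environment; to check every Dask
# worker you could do something like:
#   client.run(has_module, "eccodes")
print(has_module("json"))     # stdlib module, always present
print(has_module("eccodes"))  # False in the failing cluster environment
```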
The code we are running is:
```python
import datatree
import fsspec  # needed for fsspec.filesystem below

hrrr_combined_refs = 's3://esip-qhub-public/rsignell/hrrr_zstore.json'

dtree = datatree.open_datatree(
    fsspec.filesystem('reference', fo=hrrr_combined_refs).get_mapper(''),
    engine='zarr', consolidated=False, chunks={})

hrrr_t2m = dtree['t2m']['instant']['heightAboveGround']['t2m']
hrrr_ts = hrrr_t2m[0, :1000, 500, 500].load()
```
If we run this on a Coiled cluster, it fails with:
```
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File <timed exec>:1

File /home/conda/global/16102bfe-1721228674-152-pangeo/lib/python3.11/site-packages/xarray/core/dataarray.py:1147, in DataArray.load(self, **kwargs)
   1127 def load(self, **kwargs) -> Self:
   1128     """Manually trigger loading of this array's data from disk or a
   1129     remote source into memory and return this array.
   1130     (...)
   1145     dask.compute
   1146     """
-> 1147 ds = self._to_temp_dataset().load(**kwargs)
   1148 new = self._from_temp_dataset(ds)
   1149 self._variable = new._variable

File /home/conda/global/16102bfe-1721228674-152-pangeo/lib/python3.11/site-packages/xarray/core/dataset.py:863, in Dataset.load(self, **kwargs)
    860 chunkmanager = get_chunked_array_type(*lazy_data.values())
    862 # evaluate all the chunked arrays simultaneously
--> 863 evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
    864     *lazy_data.values(), **kwargs
    865 )
    867 for k, data in zip(lazy_data, evaluated_data):
    868     self.variables[k].data = data

File /home/conda/global/16102bfe-1721228674-152-pangeo/lib/python3.11/site-packages/xarray/namedarray/daskmanager.py:86, in DaskManager.compute(self, *data, **kwargs)
     81 def compute(
     82     self, *data: Any, **kwargs: Any
     83 ) -> tuple[np.ndarray[Any, _DType_co], ...]:
     84     from dask.array import compute
---> 86 return compute(*data, **kwargs)

File /home/conda/global/16102bfe-1721228674-152-pangeo/lib/python3.11/site-packages/dask/base.py:662, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    659     postcomputes.append(x.__dask_postcompute__())
    661 with shorten_traceback():
--> 662     results = schedule(dsk, keys, **kwargs)
    664 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /opt/coiled/env/lib/python3.11/site-packages/xarray/core/indexing.py:573, in __array__()
File /opt/coiled/env/lib/python3.11/site-packages/xarray/core/indexing.py:576, in get_duck_array()
File /opt/coiled/env/lib/python3.11/site-packages/xarray/core/indexing.py:787, in get_duck_array()
File /opt/coiled/env/lib/python3.11/site-packages/xarray/core/indexing.py:650, in get_duck_array()
File /opt/coiled/env/lib/python3.11/site-packages/xarray/backends/zarr.py:104, in __getitem__()
File /opt/coiled/env/lib/python3.11/site-packages/xarray/core/indexing.py:1014, in explicit_indexing_adapter()
File /opt/coiled/env/lib/python3.11/site-packages/xarray/backends/zarr.py:94, in _getitem()
File /opt/coiled/env/lib/python3.11/site-packages/zarr/core.py:798, in __getitem__()
File /opt/coiled/env/lib/python3.11/site-packages/zarr/core.py:1080, in get_orthogonal_selection()
File /opt/coiled/env/lib/python3.11/site-packages/zarr/core.py:1343, in _get_selection()
File /opt/coiled/env/lib/python3.11/site-packages/zarr/core.py:2183, in _chunk_getitems()
File /opt/coiled/env/lib/python3.11/site-packages/zarr/core.py:2096, in _process_chunk()
File /opt/coiled/env/lib/python3.11/site-packages/zarr/core.py:2359, in _decode_chunk()
File /opt/coiled/env/lib/python3.11/site-packages/kerchunk/codecs.py:90, in decode()

ModuleNotFoundError: No module named 'eccodes'
```
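The traceback shows why the failure surfaces only at `.load()` time: opening the datatree just builds a lazy dask graph, and the chunk decoder in `kerchunk/codecs.py` imports `eccodes` only when a chunk is actually decoded, which happens on the workers. A minimal stdlib sketch of that deferred-import pattern (`definitely_missing_module` is a placeholder name, not a real package):

```python
def make_decode_task():
    # Building the task succeeds even though its dependency is missing,
    # because the import is inside the function body and so is deferred.
    def decode_chunk():
        import definitely_missing_module  # fails only when actually called
        return None
    return decode_chunk

task = make_decode_task()  # succeeds: nothing imported yet

try:
    task()  # the deferred import runs now and raises, as .load() did
except ModuleNotFoundError as err:
    print(err)  # No module named 'definitely_missing_module'
```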
The conda environment on the Coiled cluster was created with this environment.yaml:
```yaml
channels:
  - conda-forge
dependencies:
  - python=3.11
  - adlfs
  - cf_xarray
  - cfunits
  - coiled
  - curl
  - dask>=2024.7.0
  - eccodes
  - fastparquet
  - fsspec>=2024.2.0
  - gdal
  - gcsfs
  - h5netcdf
  - h5py
  - intake>=2.0
  - intake-stac
  - intake-xarray
  - intake-parquet
  - ipywidgets
  - ipykernel
  - kerchunk
  - metpy
  - netcdf4
  - numba
  - numcodecs
  - numpy<2
  - pandas
  - planetary-computer
  - pyepsg
  - pystac
  - pystac-client
  - python-snappy
  - rechunker
  - s3fs
  - ujson
  - vim
  - zstandard
  - xarray-datatree
  - zarr
```
I think you probably need `python-eccodes`. Locally you probably have `cfgrib`, which depends on the right set of things.
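Concretely, the fix amounts to one extra line in the environment.yaml above: on conda-forge, `eccodes` ships the C library, while `python-eccodes` provides the Python bindings that kerchunk's GRIB codec imports. A minimal sketch of just the relevant dependency lines, assuming the rest of the file stays the same:

```yaml
dependencies:
  - eccodes          # ecCodes C library only
  - python-eccodes   # Python bindings: provides the importable `eccodes` module
```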
Yep, that was it! Thanks @martindurant !