google-research / weatherbench2

A benchmark for the next generation of data-driven global weather models.
https://weatherbench2.readthedocs.io
Apache License 2.0

`xbeam.open_zarr()` fails to open stores that `xarray.open_zarr()` handles fine with chunking #135

Open jdwillard19 opened 3 months ago

jdwillard19 commented 3 months ago

Issue: xbeam.open_zarr() fails to open stores that xarray.open_zarr() handles fine

Description

xbeam.open_zarr() fails to open chunked Zarr stores that xarray.open_zarr() opens without any problem.

Dataset view

[Image: Screenshot from 2024-04-01 11-41-42]

Steps to Reproduce

  1. Create a chunked Zarr store that xarray.open_zarr() can open without issues.
  2. Attempt to open the same Zarr store using xbeam.open_zarr().

Code Sample

import xarray as xr
import xarray_beam as xbeam

# Path to the Zarr store
zarr721_path = "path/to/zarr/store"

# Opening with xarray
ds_xarray = xr.open_zarr(zarr721_path)

# Trying to open the same store with xarray_beam
try:
    # xbeam.open_zarr returns a (dataset, chunks) tuple
    ds_xbeam, chunks = xbeam.open_zarr(zarr721_path)
except Exception as e:
    print(f"Failed to open with xarray_beam: {e}")

Output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[11], line 8
      1 #format metadata?
      2 # ds721_zarr
      3 # # print(ds721_zarr)
   (...)
      6 # # import xarray_beam as xbeam
      7 # chunks = {'time': 1, 'prediction_timedelta': 1}
----> 8 xb_zarr = xbeam.open_zarr(zarr721_path)
      9 xb_zarr

File /global/cfs/cdirs/m4416/jared/fcn_mip-env_e2mip_update/lib/python3.10/site-packages/xarray_beam/_src/zarr.py:92, in open_zarr(store, **kwargs)
     88   raise TypeError(
     89       'xarray_beam.open_zarr does not support the `chunks` argument'
     90   )
     91 dataset = xarray.open_zarr(store, **kwargs, chunks=None)
---> 92 chunks = _infer_chunks(dataset)
     93 return dataset, chunks

File /global/cfs/cdirs/m4416/jared/fcn_mip-env_e2mip_update/lib/python3.10/site-packages/xarray_beam/_src/zarr.py:61, in _infer_chunks(dataset)
     59 for dim, sizes in chunks_sets.items():
     60   if len(sizes) > 1:
---> 61     raise ValueError(
     62         f'inconsistent chunk sizes on Zarr dataset for dimension {dim!r}: '
     63         f'{sizes}'
     64     )
     65   (chunks[dim],) = sizes
     66 return chunks

ValueError: inconsistent chunk sizes on Zarr dataset for dimension 'prediction_timedelta': {1, 42}
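
One way to see which variables disagree is to group variable names by the chunk size they use along each dimension. The sketch below does this for plain dictionaries. The function names and the example variables (`temperature`, `geopotential`) are hypothetical, chosen only to reproduce the `{1, 42}` pattern from the error message; they are not taken from the actual dataset.

```python
from collections import defaultdict
from typing import Dict, Mapping, Set


def chunk_sizes_by_dim(
    var_chunks: Mapping[str, Mapping[str, int]],
) -> Dict[str, Dict[int, Set[str]]]:
    """Group variable names by the chunk size they use along each dimension."""
    report = defaultdict(lambda: defaultdict(set))
    for name, chunks in var_chunks.items():
        for dim, size in chunks.items():
            report[dim][size].add(name)
    return {dim: dict(sizes) for dim, sizes in report.items()}


def inconsistent_dims(
    var_chunks: Mapping[str, Mapping[str, int]],
) -> Dict[str, Dict[int, Set[str]]]:
    """Keep only the dimensions on which more than one chunk size occurs."""
    return {
        dim: sizes
        for dim, sizes in chunk_sizes_by_dim(var_chunks).items()
        if len(sizes) > 1
    }


# Hypothetical per-variable chunking, made up for illustration only.
example = {
    "temperature": {"time": 1, "prediction_timedelta": 1},
    "geopotential": {"time": 1, "prediction_timedelta": 42},
}
bad = inconsistent_dims(example)
for dim, sizes in bad.items():
    print(dim, sizes)
```

For a real store, the per-variable mapping could presumably be built from `dict(zip(var.dims, var.encoding["chunks"]))` for each data variable returned by `xarray.open_zarr(path, chunks=None)`. That relies on the Zarr backend recording chunk sizes in `encoding`, which is treated as an assumption here.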

Environment Info

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 2_gnu conda-forge absl-py 2.1.0 pypi_0 pypi aiobotocore 2.12.0 pypi_0 pypi aiohttp 3.8.5 pypi_0 pypi aioitertools 0.11.0 pypi_0 pypi aiosignal 1.3.1 pypi_0 pypi alabaster 0.7.13 pypi_0 pypi alembic 1.11.2 pypi_0 pypi altair 5.2.0 pypi_0 pypi annotated-types 0.5.0 pypi_0 pypi antlr4-python3-runtime 4.9.3 pypi_0 pypi apache-beam 2.54.0 pypi_0 pypi apex 0.1 pypi_0 pypi appdirs 1.4.4 pypi_0 pypi asciitree 0.3.3 pypi_0 pypi asttokens 2.2.1 pypi_0 pypi astunparse 1.6.3 pypi_0 pypi async-timeout 4.0.3 pypi_0 pypi attrs 23.1.0 pypi_0 pypi autopep8 2.0.2 pypi_0 pypi babel 2.12.1 pypi_0 pypi backcall 0.2.0 pypi_0 pypi beautifulsoup4 4.12.3 pypi_0 pypi benchy 0.1 pypi_0 pypi bleach 6.1.0 pypi_0 pypi blinker 1.6.2 pypi_0 pypi bokeh 3.3.4 pypi_0 pypi boto3 1.34.51 pypi_0 pypi botocore 1.34.51 pypi_0 pypi bottleneck 1.3.7 pypi_0 pypi bzip2 1.0.8 h7f98852_4 conda-forge c-ares 1.25.0 hd590300_0 conda-forge ca-certificates 2023.11.17 hbcca054_0 conda-forge cached-property 1.5.2 hd8ed1ab_1 conda-forge cached_property 1.5.2 pyha770c72_1 conda-forge cachetools 5.3.3 pypi_0 pypi cartopy 0.22.0 pypi_0 pypi cdsapi 0.6.1 pypi_0 pypi certifi 2023.7.22 pypi_0 pypi cffi 1.15.1 pypi_0 pypi cfgrib 0.9.10.4 pypi_0 pypi cfgv 3.4.0 pypi_0 pypi cftime 1.6.2 pypi_0 pypi charset-normalizer 3.2.0 pypi_0 pypi click 8.1.6 pypi_0 pypi cloudpickle 2.2.1 pypi_0 pypi cmake 3.27.2 pypi_0 pypi coloredlogs 15.0.1 pypi_0 pypi comm 0.1.4 pypi_0 pypi contourpy 1.1.0 pypi_0 pypi coverage 7.3.0 pypi_0 pypi crcmod 1.7 pypi_0 pypi cuda-cudart 12.2.140 hd3aeb46_0 conda-forge cuda-cudart_linux-64 12.2.140 h59595ed_0 conda-forge cuda-nvrtc 12.2.140 hd3aeb46_0 conda-forge cuda-nvtx 12.2.140 h59595ed_0 conda-forge cuda-version 12.2 he2b69de_2 conda-forge cupy 12.3.0 py310hfc31588_2 conda-forge cycler 0.11.0 pypi_0 pypi dask 2023.8.0 pypi_0 pypi databricks-cli 0.17.7 pypi_0 pypi debugpy 1.6.7.post1 pypi_0 pypi decorator 4.4.2 pypi_0 pypi defusedxml 0.7.1 pypi_0 
pypi dgl 1.1.1+cu117 pypi_0 pypi dglgo 0.0.2 pypi_0 pypi dill 0.3.1.1 pypi_0 pypi distlib 0.3.7 pypi_0 pypi distributed 2023.8.0 pypi_0 pypi dm-tree 0.1.8 pypi_0 pypi dnspython 2.6.1 pypi_0 pypi docker 6.1.3 pypi_0 pypi docker-pycreds 0.4.0 pypi_0 pypi docopt 0.6.2 pypi_0 pypi docutils 0.20.1 pypi_0 pypi earth2mip 0.2.0a0 pypi_0 pypi eccodes 1.6.0 pypi_0 pypi ecmwf-opendata 0.2.0 pypi_0 pypi ecmwflibs 0.5.3 pypi_0 pypi einops 0.6.1 pypi_0 pypi entrypoints 0.4 pypi_0 pypi exceptiongroup 1.1.2 pypi_0 pypi executing 1.2.0 pypi_0 pypi facets-overview 1.1.1 pypi_0 pypi fastavro 1.9.4 pypi_0 pypi fasteners 0.18 pypi_0 pypi fastjsonschema 2.19.1 pypi_0 pypi fastrlock 0.8.2 py310hc6cd4ac_2 conda-forge filelock 3.12.2 pypi_0 pypi findlibs 0.0.5 pypi_0 pypi flask 2.3.2 pypi_0 pypi flatbuffers 23.5.26 pypi_0 pypi flox 0.7.2 pypi_0 pypi fonttools 4.42.0 pypi_0 pypi frozenlist 1.4.0 pypi_0 pypi fsspec 2024.2.0 pypi_0 pypi gast 0.5.4 pypi_0 pypi gcsfs 2024.2.0 pypi_0 pypi gitdb 4.0.10 pypi_0 pypi gitpython 3.1.32 pypi_0 pypi google-api-core 2.17.1 pypi_0 pypi google-auth 2.28.2 pypi_0 pypi google-auth-oauthlib 1.2.0 pypi_0 pypi google-cloud-core 2.4.1 pypi_0 pypi google-cloud-dataproc 5.9.3 pypi_0 pypi google-cloud-storage 2.15.0 pypi_0 pypi google-crc32c 1.5.0 pypi_0 pypi google-resumable-media 2.7.0 pypi_0 pypi googleapis-common-protos 1.63.0 pypi_0 pypi greenlet 2.0.2 pypi_0 pypi grpc-google-iam-v1 0.13.0 pypi_0 pypi grpcio 1.62.1 pypi_0 pypi grpcio-status 1.62.1 pypi_0 pypi gunicorn 20.1.0 pypi_0 pypi h5netcdf 1.3.0 pyhd8ed1ab_0 conda-forge h5py 3.9.0 pypi_0 pypi hdf5 1.14.3 nompi_h4f84152_100 conda-forge hdfs 2.7.3 pypi_0 pypi httplib2 0.22.0 pypi_0 pypi huggingface-hub 0.16.4 pypi_0 pypi humanfriendly 10.0 pypi_0 pypi hydra-core 1.3.2 pypi_0 pypi identify 2.5.26 pypi_0 pypi idna 3.4 pypi_0 pypi imageio 2.31.1 pypi_0 pypi imageio-ffmpeg 0.4.8 pypi_0 pypi imagesize 1.4.1 pypi_0 pypi immutabledict 4.2.0 pypi_0 pypi importlib-metadata 6.8.0 pypi_0 pypi iniconfig 2.0.0 pypi_0 
pypi ipykernel 6.25.1 pypi_0 pypi ipython 8.14.0 pypi_0 pypi ipywidgets 8.1.2 pypi_0 pypi isort 5.12.0 pypi_0 pypi itsdangerous 2.1.2 pypi_0 pypi jax 0.4.25 pypi_0 pypi jaxlib 0.4.25 pypi_0 pypi jedi 0.19.0 pypi_0 pypi jinja2 3.1.2 pypi_0 pypi jmespath 1.0.1 pypi_0 pypi joblib 1.3.2 pypi_0 pypi js2py 0.74 pypi_0 pypi jsonpickle 3.0.3 pypi_0 pypi jsonschema 4.21.1 pypi_0 pypi jsonschema-specifications 2023.12.1 pypi_0 pypi jupyter-client 8.2.0 pypi_0 pypi jupyter-core 5.3.1 pypi_0 pypi jupyterlab-pygments 0.3.0 pypi_0 pypi jupyterlab-widgets 3.0.10 pypi_0 pypi keyutils 1.6.1 h166bdaf_0 conda-forge kiwisolver 1.4.4 pypi_0 pypi krb5 1.21.2 h659d440_0 conda-forge ld_impl_linux-64 2.40 h41732ed_0 conda-forge libaec 1.1.2 h59595ed_1 conda-forge libblas 3.9.0 20_linux64_openblas conda-forge libcblas 3.9.0 20_linux64_openblas conda-forge libcublas 12.2.5.6 hd3aeb46_0 conda-forge libcufft 11.0.8.103 hd3aeb46_0 conda-forge libcurand 10.3.3.141 hd3aeb46_0 conda-forge libcurl 8.5.0 hca28451_0 conda-forge libcusolver 11.5.2.141 hd3aeb46_0 conda-forge libcusparse 12.1.2.141 hd3aeb46_0 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libev 4.33 hd590300_2 conda-forge libffi 3.4.2 h7f98852_5 conda-forge libgcc-ng 13.1.0 he5830b7_0 conda-forge libgfortran-ng 13.2.0 h69a702a_0 conda-forge libgfortran5 13.2.0 ha4646dd_0 conda-forge libgomp 13.1.0 he5830b7_0 conda-forge liblapack 3.9.0 20_linux64_openblas conda-forge libnghttp2 1.58.0 h47da74e_1 conda-forge libnsl 2.0.0 h7f98852_0 conda-forge libnvjitlink 12.2.140 hd3aeb46_0 conda-forge libopenblas 0.3.25 pthreads_h413a1c8_0 conda-forge libsqlite 3.42.0 h2797004_0 conda-forge libssh2 1.11.0 h0841786_0 conda-forge libstdcxx-ng 13.2.0 h7e041cc_3 conda-forge libuuid 2.38.1 h0b41bf4_0 conda-forge libzlib 1.2.13 hd590300_5 conda-forge lightning-utilities 0.9.0 pypi_0 pypi lit 16.0.6 pypi_0 pypi littleutils 0.2.2 pypi_0 pypi llvmlite 0.40.1 pypi_0 pypi locket 1.0.0 pypi_0 pypi loguru 0.7.2 pypi_0 pypi mako 1.2.4 pypi_0 pypi markdown 
3.4.4 pypi_0 pypi markupsafe 2.1.3 pypi_0 pypi matplotlib 3.7.2 pypi_0 pypi matplotlib-inline 0.1.6 pypi_0 pypi metpy 1.5.1 pypi_0 pypi mistune 3.0.2 pypi_0 pypi ml-dtypes 0.3.2 pypi_0 pypi mlflow 2.5.0 pypi_0 pypi moviepy 1.0.3 pypi_0 pypi mpi4py 3.1.4 pypi_0 pypi mpmath 1.3.0 pypi_0 pypi msgpack 1.0.5 pypi_0 pypi multidict 6.0.4 pypi_0 pypi multiurl 0.2.3.2 pypi_0 pypi mypy-extensions 1.0.0 pypi_0 pypi nbclient 0.10.0 pypi_0 pypi nbconvert 7.16.2 pypi_0 pypi nbformat 5.10.2 pypi_0 pypi ncurses 6.4 hcb278e6_0 conda-forge nest-asyncio 1.5.7 pypi_0 pypi netcdf4 1.6.4 pypi_0 pypi networkx 3.1 pypi_0 pypi nodeenv 1.8.0 pypi_0 pypi numba 0.57.1 pypi_0 pypi numcodecs 0.11.0 pypi_0 pypi numpy 1.24.4 pypi_0 pypi numpy-groupies 0.9.22 pypi_0 pypi numpydoc 1.5.0 pypi_0 pypi nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi nvidia-cuda-cupti-cu11 11.7.101 pypi_0 pypi nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi nvidia-cufft-cu11 10.9.0.58 pypi_0 pypi nvidia-curand-cu11 10.2.10.91 pypi_0 pypi nvidia-cusolver-cu11 11.4.0.1 pypi_0 pypi nvidia-cusparse-cu11 11.7.4.91 pypi_0 pypi nvidia-dali-cuda110 1.28.0 pypi_0 pypi nvidia-modulus 0.5.0 pypi_0 pypi nvidia-nccl-cu11 2.14.3 pypi_0 pypi nvidia-nvtx-cu11 11.7.91 pypi_0 pypi nvtx 0.2.10 pypi_0 pypi oauthlib 3.2.2 pypi_0 pypi objsize 0.7.0 pypi_0 pypi ogb 1.3.6 pypi_0 pypi omegaconf 2.3.0 pypi_0 pypi onnxruntime-gpu 1.16.3 pypi_0 pypi openssl 3.2.0 hd590300_1 conda-forge opt-einsum 3.3.0 pypi_0 pypi orjson 3.9.15 pypi_0 pypi outdated 0.2.2 pypi_0 pypi packaging 23.1 pypi_0 pypi pandas 2.0.3 pypi_0 pypi pandocfilters 1.5.1 pypi_0 pypi parso 0.8.3 pypi_0 pypi partd 1.4.0 pypi_0 pypi pathtools 0.1.2 pypi_0 pypi pexpect 4.8.0 pypi_0 pypi pickleshare 0.7.5 pypi_0 pypi pillow 10.0.0 pypi_0 pypi pint 0.22 pypi_0 pypi pip 23.2.1 pyhd8ed1ab_0 conda-forge platformdirs 3.10.0 pypi_0 pypi pluggy 1.2.0 pypi_0 pypi pooch 1.7.0 pypi_0 pypi pre-commit 3.3.3 pypi_0 pypi proglog 
0.1.10 pypi_0 pypi prompt-toolkit 3.0.39 pypi_0 pypi properscoring 0.1 pypi_0 pypi proto-plus 1.23.0 pypi_0 pypi protobuf 4.25.3 pypi_0 pypi psutil 5.9.5 pypi_0 pypi ptyprocess 0.7.0 pypi_0 pypi pure-eval 0.2.2 pypi_0 pypi py 1.11.0 pypi_0 pypi pyarrow 12.0.1 pypi_0 pypi pyarrow-hotfix 0.6 pypi_0 pypi pyasn1 0.5.1 pypi_0 pypi pyasn1-modules 0.3.0 pypi_0 pypi pycodestyle 2.11.0 pypi_0 pypi pycparser 2.21 pypi_0 pypi pydantic 1.10.12 pypi_0 pypi pydantic-core 2.4.0 pypi_0 pypi pydot 1.4.2 pypi_0 pypi pygments 2.16.1 pypi_0 pypi pyjsparser 2.7.1 pypi_0 pypi pyjwt 2.8.0 pypi_0 pypi pymongo 4.6.2 pypi_0 pypi pyparsing 3.0.9 pypi_0 pypi pyproj 3.6.0 pypi_0 pypi pyshp 2.3.1 pypi_0 pypi pytest 7.4.0 pypi_0 pypi pytest-asyncio 0.23.3 pypi_0 pypi pytest-regtest 2.1.0 pypi_0 pypi pytest-timeout 2.2.0 pypi_0 pypi python 3.10.6 ha86cf86_0_cpython conda-forge python-dateutil 2.8.2 pypi_0 pypi python-dotenv 1.0.1 pypi_0 pypi python_abi 3.10 4_cp310 conda-forge pytz 2023.3 pypi_0 pypi pyyaml 6.0.1 pypi_0 pypi pyzmq 25.1.1 pypi_0 pypi querystring-parser 1.2.4 pypi_0 pypi rdkit-pypi 2022.9.5 pypi_0 pypi readline 8.2 h8228510_1 conda-forge rechunker 0.5.2 pypi_0 pypi referencing 0.32.1 pypi_0 pypi regex 2023.12.25 pypi_0 pypi requests 2.31.0 pypi_0 pypi requests-oauthlib 1.4.0 pypi_0 pypi rpds-py 0.17.1 pypi_0 pypi rsa 4.9 pypi_0 pypi ruamel-yaml 0.17.32 pypi_0 pypi ruamel-yaml-clib 0.2.7 pypi_0 pypi s3fs 2024.2.0 pypi_0 pypi s3transfer 0.10.0 pypi_0 pypi safetensors 0.3.3 pypi_0 pypi scikit-learn 1.3.0 pypi_0 pypi scipy 1.11.1 pypi_0 pypi sentry-sdk 1.29.2 pypi_0 pypi setproctitle 1.3.2 pypi_0 pypi setuptools 68.0.0 pyhd8ed1ab_0 conda-forge shapely 2.0.1 pypi_0 pypi six 1.16.0 pypi_0 pypi smmap 5.0.0 pypi_0 pypi snowballstemmer 2.2.0 pypi_0 pypi sortedcontainers 2.4.0 pypi_0 pypi soupsieve 2.5 pypi_0 pypi sphinx 7.1.2 pypi_0 pypi sphinxcontrib-applehelp 1.0.6 pypi_0 pypi sphinxcontrib-devhelp 1.0.4 pypi_0 pypi sphinxcontrib-htmlhelp 2.0.3 pypi_0 pypi sphinxcontrib-jsmath 1.0.1 
pypi_0 pypi sphinxcontrib-qthelp 1.0.5 pypi_0 pypi sphinxcontrib-serializinghtml 1.1.7 pypi_0 pypi sqlalchemy 2.0.19 pypi_0 pypi sqlparse 0.4.4 pypi_0 pypi stack-data 0.6.2 pypi_0 pypi sympy 1.12 pypi_0 pypi tabulate 0.9.0 pypi_0 pypi tblib 2.0.0 pypi_0 pypi termcolor 2.3.0 pypi_0 pypi threadpoolctl 3.2.0 pypi_0 pypi timeloop 1.0.2 pypi_0 pypi timm 0.9.7 pypi_0 pypi tinycss2 1.2.1 pypi_0 pypi tk 8.6.12 h27826a3_0 conda-forge tomli 2.0.1 pypi_0 pypi toolz 0.12.0 pypi_0 pypi torch 2.0.1 pypi_0 pypi torch-harmonics 0.6.3 pypi_0 pypi torchmetrics 1.1.1 pypi_0 pypi torchvision 0.15.2 pypi_0 pypi tornado 6.3.3 pypi_0 pypi tqdm 4.66.1 pypi_0 pypi traitlets 5.9.0 pypi_0 pypi treelib 1.7.0 pypi_0 pypi triton 2.0.0 pypi_0 pypi typer 0.9.0 pypi_0 pypi typing-extensions 4.7.1 pypi_0 pypi tzdata 2023.3 pypi_0 pypi tzlocal 5.2 pypi_0 pypi urllib3 1.26.16 pypi_0 pypi virtualenv 20.24.3 pypi_0 pypi wandb 0.15.8 pypi_0 pypi wcwidth 0.2.6 pypi_0 pypi weatherbench2 0.2.0 pypi_0 pypi webencodings 0.5.1 pypi_0 pypi websocket-client 1.6.1 pypi_0 pypi werkzeug 2.3.6 pypi_0 pypi wheel 0.41.1 pyhd8ed1ab_0 conda-forge widgetsnbextension 4.0.10 pypi_0 pypi wrapt 1.15.0 pypi_0 pypi xarray 2023.7.0 pypi_0 pypi xarray-beam 0.6.3 pypi_0 pypi xhistogram 0.3.2 pypi_0 pypi xskillscore 0.0.24 pypi_0 pypi xyzservices 2023.10.1 pypi_0 pypi xz 5.2.6 h166bdaf_0 conda-forge yarl 1.9.2 pypi_0 pypi zarr 2.16.0 pypi_0 pypi zict 3.0.0 pypi_0 pypi zipp 3.16.2 pypi_0 pypi zstandard 0.22.0 pypi_0 pypi zstd 1.5.5 hfc55251_0 conda-forge

raspstephan commented 3 months ago

I will tag @shoyer here for xbeam-related questions.

shoyer commented 3 months ago

The error indicates that your dataset has different chunking schemes on different variables, so xbeam.open_zarr() won't work. But xbeam.open_zarr is a very thin wrapper, so you can also just call xarray.open_zarr() directly and supply your own chunks.
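
A minimal sketch of that workaround, under stated assumptions: `resolve_chunks` and `open_zarr_with_chunks` are hypothetical helper names, not part of xarray or xarray_beam, and the code assumes each Zarr-backed variable exposes its on-disk chunk sizes as `encoding["chunks"]`, as the `_infer_chunks` traceback suggests. Instead of raising when variables disagree, it picks one chunk size per dimension with a caller-supplied rule.

```python
from typing import Callable, Dict, Iterable, Mapping, Set


def resolve_chunks(
    var_chunks: Iterable[Mapping[str, int]],
    pick: Callable[[Iterable[int]], int] = max,
) -> Dict[str, int]:
    """Collapse per-variable chunk sizes into a single size per dimension."""
    sizes: Dict[str, Set[int]] = {}
    for chunks in var_chunks:
        for dim, size in chunks.items():
            sizes.setdefault(dim, set()).add(size)
    return {dim: pick(candidates) for dim, candidates in sizes.items()}


def open_zarr_with_chunks(store, pick=max, **kwargs):
    """Return (dataset, chunks), tolerating mixed chunking across variables.

    Hypothetical helper: it mirrors the (dataset, chunks) return shape seen in
    the traceback but is not the library's implementation.
    """
    # Imported lazily so resolve_chunks can be used without xarray installed.
    import xarray

    # chunks=None opens the store lazily without wrapping arrays in dask.
    dataset = xarray.open_zarr(store, **kwargs, chunks=None)
    per_variable = [
        # Assumption: the Zarr backend stores on-disk chunks in encoding.
        dict(zip(var.dims, var.encoding.get("chunks", ())))
        for var in dataset.values()
    ]
    return dataset, resolve_chunks(per_variable, pick)


# Hypothetical chunk layouts matching the {1, 42} sizes in the error message.
layouts = [
    {"time": 1, "prediction_timedelta": 1},
    {"time": 1, "prediction_timedelta": 42},
]
largest = resolve_chunks(layouts)
smallest = resolve_chunks(layouts, pick=min)
```

The resulting `(dataset, chunks)` pair can then be used wherever the output of `xbeam.open_zarr()` was expected. Whether `max` or `min` is the better rule depends on memory limits and on how often the same Zarr chunk would be re-read, so the default here is only a starting point.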

Is this error being triggered somewhere from the WeatherBench code? If so, I can advise on how to make it more flexible.

jdwillard19 commented 3 months ago

@shoyer Thanks for the response. Yes, this is triggered by the following line in scripts/compute_zonal_energy_spectrum.py. The issue does not appear when I run the WB2 evaluation code on the same data, which I have also been running with Beam using the DirectRunner.

https://github.com/google-research/weatherbench2/blob/b2cc3d70ab352019a5cb40fc83b9783e3be726ce/scripts/compute_zonal_energy_spectrum.py#L182C1-L183C1