google-research / arco-era5

Recipes for reproducing Analysis-Ready & Cloud Optimized (ARCO) ERA5 datasets.
https://cloud.google.com/storage/docs/public-datasets/era5
Apache License 2.0

xarray.open_zarr is reading only NANs #83

Closed edur409 closed 3 weeks ago

edur409 commented 3 weeks ago

Hi there,

I tried these simple lines and my plot is empty. On closer inspection, the array is full of NaN values. What could be the issue? Best regards.

import xarray

ar_full_37_1h = xarray.open_zarr(
    'gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3',
    chunks=None,
    storage_options=dict(token='anon'),
)

ar_full_37_1h.geopotential[1,0,:,:].plot()

yingkaisha commented 3 weeks ago

Adding to this issue: it seems that full_37-1h-0p25deg-chunk-1.zarr-v3 cannot be accessed properly. I didn't test all of the datasets, but 1959-2022-6h-1440x721.zarr can be opened without NaNs.

dabhicusp commented 3 weeks ago

Hello @edur409, the dataset (gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3) contains data from 1940-01-01 to 2024-05-31 (three months behind the current month). The command ar_full_37_1h.geopotential[1,0,:,:].plot() selects by position, so it reads the second entry on the time axis (1900-01-01T01:00:00), which falls before the start of the available data. That is why you see only NaN values.

To resolve this issue, please ensure that you only access data within the specified range of 1940-01-01 to 2024-05-31 (three months behind the current month).
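As a rough sketch of the advice above, selecting by time label instead of by position keeps the request inside the populated range. The helper names (`open_arco_era5`, `select_valid`, `demo_dataset`) are hypothetical, the range constants are copied from this comment and may change as the dataset is extended, and the tiny synthetic dataset only imitates the padded time axis so the example runs without network access.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Valid range quoted in this thread (an assumption: it moves as data is added).
VALID_START = "1940-01-01"
VALID_STOP = "2024-05-31"


def open_arco_era5() -> xr.Dataset:
    """Open the public store exactly as in the original post (needs network access)."""
    return xr.open_zarr(
        "gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3",
        chunks=None,
        storage_options=dict(token="anon"),
    )


def select_valid(ds: xr.Dataset, start: str = VALID_START, stop: str = VALID_STOP) -> xr.Dataset:
    """Keep only the time steps between start and stop (label-based, inclusive)."""
    return ds.sel(time=slice(start, stop))


def demo_dataset() -> xr.Dataset:
    """Synthetic stand-in: two padded 1900 time steps (NaN) and two 1940 steps (data)."""
    time = pd.to_datetime([
        "1900-01-01T00:00:00", "1900-01-01T01:00:00",
        "1940-01-01T00:00:00", "1940-01-01T01:00:00",
    ])
    values = np.full((4, 1, 2, 2), np.nan)
    values[2:] = 1.0  # only the 1940 time steps hold data
    return xr.Dataset(
        {"geopotential": (("time", "level", "latitude", "longitude"), values)},
        coords={"time": time, "level": [1], "latitude": [0.0, 0.25], "longitude": [0.0, 0.25]},
    )


ds = demo_dataset()
# Positional index 1 lands on 1900-01-01T01:00, which is padding.
print(bool(ds.geopotential[1, 0].isnull().all()))
# Label-based selection returns only the populated time steps.
valid = select_valid(ds)
print(valid.time.values)
```

On the real store, the equivalent call would be `select_valid(open_arco_era5()).geopotential.isel(time=0, level=0).plot()`, which plots the first time step that falls inside the stated range.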

The readme.md file will be updated shortly to document this date range.

Please let me know if you have any questions.

P.S. Please close this issue if your query is resolved 😄.

yingkaisha commented 3 weeks ago

Resolved for me. Thank you.

edur409 commented 3 weeks ago

Silly me! I should have checked at least the latest index of the array too. Thank you!