I've reorganized the README to separately introduce "Analysis Ready" and "Cloud Optimized" datasets, with the expectation that users will be most interested in the former.
I've also updated all datasets with size and chunking information, generated with the following snippet:
```python
import math
import xarray_beam


def get_size(x):
  """Convert a size in bytes into a human-readable (value, units) pair."""
  for threshold, units in [
      (1e6, 'MB'),
      (1e9, 'GB'),
      (1e12, 'TB'),
      (1e15, 'PB'),
  ]:
    if x < threshold * 1000:
      return x / threshold, units
  raise RuntimeError('unhandled size')


for path in [
    'gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3',
    'gs://gcp-public-data-arco-era5/ar/model-level-1h-0p25deg.zarr-v1',
    'gs://gcp-public-data-arco-era5/co/model-level-wind.zarr-v2',
    'gs://gcp-public-data-arco-era5/co/model-level-moisture.zarr-v2',
    'gs://gcp-public-data-arco-era5/co/single-level-surface.zarr-v2',
    'gs://gcp-public-data-arco-era5/co/single-level-reanalysis.zarr-v2',
    'gs://gcp-public-data-arco-era5/co/single-level-forecast.zarr-v2',
]:
  ds, chunks = xarray_beam.open_zarr(
      path, storage_options=dict(token='anon')
  )
  print()
  print(path)
  size, units = get_size(ds.sel(time=slice("1940", None)).nbytes)
  print(f'Total size (1940-present): {size:.3g} {units}')
  print('Chunks:', chunks)
  # 4 bytes per value, assuming float32 data.
  size, units = get_size(4 * math.prod(chunks.values()))
  print(f'Chunk size: {size:.3g} {units}')
  print(f'Last time: {ds.indexes["time"][-1]}')
```
This currently outputs:
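Since the README now leads with the "Analysis Ready" datasets, here's a minimal sketch of opening one directly with plain xarray, assuming `gcsfs` is installed. The variable name `2m_temperature` is assumed for illustration only; check `ds.data_vars` for what each dataset actually contains:

```python
import xarray

# Open an "Analysis Ready" dataset lazily, without dask, by passing
# chunks=None. Anonymous access works because the bucket is public.
ds = xarray.open_zarr(
    'gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3',
    chunks=None,
    storage_options=dict(token='anon'),
)
# '2m_temperature' is an assumed variable name for illustration; inspect
# ds.data_vars to see the variables actually present.
print(ds['2m_temperature'].sel(time='2020-01-01T00'))
```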