google-research / arco-era5

Recipes for reproducing Analysis-Ready & Cloud Optimized (ARCO) ERA5 datasets.
https://cloud.google.com/storage/docs/public-datasets/era5
Apache License 2.0

Getting a time slice for a single field and gridpoint seems to take too long #90

Open RubendeBruin opened 1 week ago

RubendeBruin commented 1 week ago

I want to obtain the wind speeds for a single location over a period of time. When testing with a relatively short period of one year, fetching the data takes far too long:

import datetime

import xarray

# Period and grid point of interest.
time_slice = slice('2022-01-01', '2023-01-01')
target_lon = 13.5
target_lat = 38.25

# Open the public ARCO-ERA5 store anonymously.
# chunks=None skips dask; data is read lazily when values are requested.
ds = xarray.open_zarr(
    'gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3',
    chunks=None,
    storage_options=dict(token='anon')
)

# Time the download of the point time series.
print(datetime.datetime.now())
values = ds['10m_u_component_of_wind'].sel(longitude=target_lon, latitude=target_lat, time=time_slice).values
print(len(values))
print(datetime.datetime.now())

My internet connection is very decent, but after 20 minutes I killed the process.

Is there something that I'm doing wrong, or is getting a time slice for a single lat/lon point just very inefficient?

I tried again with two days of data (48 hourly values) and that took about 3 minutes:

<xarray.DataArray '10m_u_component_of_wind' (time: 48)> Size: 192B
[48 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 384B 2022-01-01 ... 2022-01-02T23:00:00
Attributes:
    long_name:   10 metre U wind component
    short_name:  u10
    units:       m s**-1
2024-11-12 22:04:47.986297
2024-11-12 22:07:59.549778
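
A rough back-of-the-envelope sketch of why a point query can be this slow. It assumes, based only on the "chunk-1" and "0p25deg" parts of the store name, that each chunk of a surface variable holds one hourly step of the full 721 x 1440 global grid as float32. That layout is an assumption and has not been checked against the store's metadata; `point_series_cost` and `ASSUMED_CHUNK_SHAPE` are hypothetical names used only for illustration. Under that assumption, every requested hour costs one whole-grid chunk, even though only a single value is kept.

```python
import math


def point_series_cost(n_steps, chunk_shape, itemsize=4):
    """Estimate (chunks_read, uncompressed_bytes) for one grid point over n_steps.

    Assumes the time slice starts on a chunk boundary; an unaligned slice
    could touch one extra chunk.
    """
    t_chunk, lat_chunk, lon_chunk = chunk_shape
    chunks_read = math.ceil(n_steps / t_chunk)
    bytes_per_chunk = t_chunk * lat_chunk * lon_chunk * itemsize
    return chunks_read, chunks_read * bytes_per_chunk


# ASSUMPTION: one time step per chunk, full 0.25 degree grid, float32 values.
ASSUMED_CHUNK_SHAPE = (1, 721, 1440)

# Observed above: 48 hourly values between 22:04:47.986297 and 22:07:59.549778.
observed_seconds = 191.563481
seconds_per_chunk = observed_seconds / 48

# The slice '2022-01-01'..'2023-01-01' is inclusive: 366 days of hourly data.
for label, n_steps in [("2 days", 48), ("1 year", 366 * 24)]:
    chunks, nbytes = point_series_cost(n_steps, ASSUMED_CHUNK_SHAPE)
    est_hours = chunks * seconds_per_chunk / 3600
    print(f"{label}: {chunks} chunks, {nbytes / 2**30:.2f} GiB uncompressed, "
          f"about {est_hours:.1f} h at the observed rate")
```

If the assumed layout is right, the one-year request touches thousands of whole-grid chunks, which would be consistent with the run not finishing within 20 minutes. The actual layout can be inspected with `ds['10m_u_component_of_wind'].encoding` after opening the store.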

RubendeBruin commented 1 week ago

Same issue as #69.