Open SimonDanisch opened 6 months ago
One thought comes to mind reading the example (I might be wrong, though): depending on how the Zarr store on Google Cloud is chunked, the requested slice may still need to download the whole dataset between 2018 and 2050, plus a little more at the chunk edges around 2018 and 2050. That whole range is 3.21 GB; is that closer to what you measured?
```julia
c = g["tas"]
ct = c[Ti=At(Date("2018-08-01"):Date("2050-08-01"))]

384×192×11689 YAXArray{Float32,3} with dimensions:
  Dim{:lon} Sampled{Float64} 0.0:0.9375:359.0625 ForwardOrdered Regular Points,
  Dim{:lat} Sampled{Float64} Float64[-89.28422753251364, -88.35700351866494, …, 88.35700351866494, 89.28422753251364] ForwardOrdered Irregular Points,
  Ti Sampled{DateTime} DateTime[2018-08-01T00:00:00, …, 2050-08-01T00:00:00] ForwardOrdered Irregular Points
units: K
name: tas
Total size: 3.21 GB
```
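As a sanity check, the printed total is exactly what the slice shape implies at 4 bytes per `Float32` value (the shape is taken from the printout above; Python is used here just for the arithmetic):

```python
# Slice shape from the YAXArray printout: 384 lon x 192 lat x 11689 time steps
nlon, nlat, ntime = 384, 192, 11689
bytes_per_value = 4  # Float32

slice_bytes = nlon * nlat * ntime * bytes_per_value
print(f"{slice_bytes / 2**30:.2f} GiB")  # -> 3.21 GiB, matching "Total size: 3.21 GB"
```

So 3.21 GB is the size of the slice itself, before any chunk-alignment overhead is added.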
Note that I tried the same approach in Python and it seems to behave similarly (in Python I selected the whole time series between 2018 and 2050, for simplicity):
```python
import xarray as xr
import zarr

file = 'gs://cmip6/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp585/r1i1p1f1/3hr/tas/gn/v20190710/'
ds = xr.open_dataset(file, engine='zarr')
c = ds.tas
ct = c.sel(time=slice("2018-08-01", "2050-08-01"))
%time ct.values
```

```
CPU times: user 3min 19s, sys: 1min 29s, total: 4min 49s
Wall time: 21min 58s
Out[12]:
array([[[216.41226, 216.48257, 216.44742, ..., 216.32828, 216.38297,
         216.40054],
```
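Whether a selection can avoid downloading the full range comes down to chunk alignment: a read touches every chunk it overlaps, and partially overlapped chunks are still fetched whole. A small helper to estimate that overhead (the chunk length of 600 time steps and the start offset are made-up illustrations, not the store's actual chunking):

```python
def chunks_touched(start, stop, chunk_len):
    """Number of chunks a half-open index range [start, stop) overlaps,
    assuming equally sized chunks along one axis."""
    return (stop - 1) // chunk_len - start // chunk_len + 1

# Hypothetical: chunks of 600 time steps, a slice of 11689 steps starting mid-chunk
n = chunks_touched(250, 250 + 11689, 600)
print(n)                # chunks that must be fetched
print(n * 600 / 11689)  # download amplification vs. the exact slice
```

For a slice this long the edge overhead is only a few percent, which is why "a little bit more for the edges" still lands near 3.21 GB when the whole 2018–2050 range is requested.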
I'm trying the example from the docs:

This takes really long and fills up all my RAM (32 GB). A few details:

- The selected slice:
- Download speed of the Julia process:

I was expecting it to download only the 328 MB, but judging from the download speed and RAM usage I suspect it's downloading much more data, which makes it almost impossible to download this part of the dataset... Am I missing something, is this a bug, or just a limitation of the package?
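One way an expected 328 MB can balloon like this: if the selection is strided in time (for example, picking one step per day out of 3-hourly data), a chunked store still has to fetch every chunk that contains a selected step. A rough illustration with made-up numbers (the stride of 8 is hypothetical; the excerpt doesn't show what the original selection was):

```python
# If the selection is strided in time, every chunk containing a selected
# step must still be fetched whole, even though most of it is thrown away.
plane_bytes = 384 * 192 * 4    # one Float32 lon/lat plane
stride = 8                     # hypothetical: daily picks from 3-hourly data
nsel = 11689 // stride         # steps actually wanted
wanted = nsel * plane_bytes
fetched = 11689 * plane_bytes  # the chunks cover the strided steps contiguously
print(f"wanted {wanted / 2**20:.0f} MiB, fetched {fetched / 2**30:.2f} GiB")
```

Checking the store's actual chunk shape against the selection would confirm whether this kind of amplification is what's happening here.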