regDaniel opened this issue 1 year ago
I think we have gained some more experience with this during the development of icon_timeseries. Can we close this one @clairemerker, or do you think it is still relevant for iconarray? If yes, I should probably update the timings.
In a sense the issue is still relevant: @victoria-cherkas and I will write a new version of open_dataset() for iconarray based on what we learned in icon-timeseries. No need to update the timings in my opinion, but maybe keep the issue open; we can close it after the new implementation.
This issue serves mainly as documentation for us. We are trying to optimize the read-in with @clairemerker.
Some timings:

- `cfgrib.open_datasets(filelist[0], backend_kwargs={'indexpath': '', 'errors': 'ignore'}, encode_cf=("time", "geography", "vertical"))`: ~270 s
- `cfgrib.open_datasets(filelist[0], backend_kwargs={'indexpath': '', 'errors': 'ignore', "filter_by_keys": {"typeOfLevel": "generalVerticalLayer"}}, encode_cf=("time", "geography", "vertical"))`: ~40 s
- `xr.open_dataset(filelist[0], engine="cfgrib", backend_kwargs={'indexpath': '', 'errors': 'ignore', "filter_by_keys": {"typeOfLevel": "generalVerticalLayer"}}, encode_cf=("time", "geography", "vertical"))`: ~4-5 s (lazy loading; a subsequent `da.load()` takes ~80 s)
- `xr.open_dataset(filelist[0], engine="cfgrib", backend_kwargs={'indexpath': '', 'errors': 'ignore', "filter_by_keys": {"typeOfLevel": "generalVerticalLayer", "short_name": "T"}}, encode_cf=("time", "geography", "vertical"))`: ~8 s
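For reference, a minimal runnable sketch of the fastest (filtered, lazy) variant above; this is not copied from the timings, and `filelist`, the glob path, and the variable name `t` are placeholders. The filter uses the ecCodes key name `shortName`:

```python
import glob

import xarray as xr

# Placeholder path to the ICON GRIB forecast files
filelist = sorted(glob.glob("/store/path/to/icon/forecast/*.grb"))

ds = xr.open_dataset(
    filelist[0],
    engine="cfgrib",
    backend_kwargs={
        "indexpath": "",     # do not write .idx index files next to the data
        "errors": "ignore",  # skip GRIB messages cfgrib cannot decode
        # restrict the index to one level type and one variable
        "filter_by_keys": {"typeOfLevel": "generalVerticalLayer", "shortName": "T"},
    },
)

# open_dataset is lazy; the actual read happens only when the values are needed
da = ds["t"]    # variable name as chosen by cfgrib, check ds.data_vars
da = da.load()  # triggers the eager read of the filtered fields
```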
More timings with Dask (open 10 ICON forecast files and extract one variable):

- `xr.open_dataset` followed by `xr.concat` is ~5-10% faster than `xr.open_mfdataset`.
- when first merging the files with `cat`:
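A sketch of the two multi-file approaches compared above, under the same placeholder assumptions (`filelist` and the filter keys as in the single-file example; the concatenation dimension `time` depends on the actual file layout):

```python
import xarray as xr

backend_kwargs = {
    "indexpath": "",
    "errors": "ignore",
    "filter_by_keys": {"typeOfLevel": "generalVerticalLayer", "shortName": "T"},
}

# Variant 1: open each forecast file separately, then concatenate
# (this was ~5-10% faster in the timings above).
datasets = [
    xr.open_dataset(f, engine="cfgrib", backend_kwargs=backend_kwargs)
    for f in filelist
]
ds_concat = xr.concat(datasets, dim="time")

# Variant 2: let xarray open and combine the files in one call.
ds_mf = xr.open_mfdataset(
    filelist,
    engine="cfgrib",
    backend_kwargs=backend_kwargs,
    combine="nested",
    concat_dim="time",
    parallel=True,  # open the files in parallel via dask.delayed
)
```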
All timings were measured on Tsa reading from `/store`; reading from `/scratch` reduces read-in times by approximately 10%.
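Timings like the ones above can be reproduced with a simple harness along these lines (an assumption, not necessarily how the numbers were measured), separating the lazy open from the eager load as in the 4-5 s / 80 s entry:

```python
import time

import xarray as xr

t0 = time.perf_counter()
ds = xr.open_dataset(
    filelist[0],
    engine="cfgrib",
    backend_kwargs={
        "indexpath": "",
        "errors": "ignore",
        "filter_by_keys": {"typeOfLevel": "generalVerticalLayer"},
    },
)
t_open = time.perf_counter() - t0

t0 = time.perf_counter()
ds.load()  # forces the actual read of all filtered fields
t_load = time.perf_counter() - t0

print(f"open: {t_open:.1f} s, load: {t_load:.1f} s")
```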