Open MartinDix opened 2 years ago
Hi Martin. I'm not sure of your specific case, but when loading datasets using xr.open_mfdataset
I typically use something like:
OISST = xr.open_mfdataset('/g/data/ua8/NOAA_OISST/AVHRR/v2-1_modified/*_' + str(year) + '.nc', concat_dim="time", combine="nested", data_vars='minimal', coords='minimal', compat='override', parallel=True)
This makes some extra assumptions about concat variables etc. and makes the loading much quicker. It's described in more detail in the "Note" at https://xarray.pydata.org/en/stable/user-guide/io.html#reading-multi-file-datasets
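As a rough illustration of what those options do, here is a minimal sketch on synthetic files rather than the OISST data. The file names, the variable `sst`, and the 2D coordinates `lon2d`/`lat2d` are made up for the example, and it assumes dask and a NetCDF backend are installed. With `data_vars='minimal'` and `coords='minimal'`, only variables that already have the `time` dimension are concatenated, and `compat='override'` takes everything else from the first file without comparing it across files.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

# Write three small synthetic files (hypothetical names), each carrying the
# same 2D coordinate pair, similar in spirit to a model grid.
tmpdir = Path(tempfile.mkdtemp())
ny, nx = 4, 5
lon2d, lat2d = np.meshgrid(np.arange(nx, dtype="f8"), np.arange(ny, dtype="f8"))
for i, year in enumerate([2000, 2001, 2002]):
    ds = xr.Dataset(
        {"sst": (("time", "y", "x"), np.full((2, ny, nx), float(i)))},
        coords={
            "time": [2 * i, 2 * i + 1],
            "lon2d": (("y", "x"), lon2d),
            "lat2d": (("y", "x"), lat2d),
        },
    )
    ds.to_netcdf(tmpdir / f"demo_{year}.nc")

files = [str(p) for p in sorted(tmpdir.glob("demo_*.nc"))]

# Concatenate along time only; variables without a time dimension are
# taken from the first file instead of being checked in every file.
combined = xr.open_mfdataset(
    files,
    combine="nested",
    concat_dim="time",
    data_vars="minimal",
    coords="minimal",
    compat="override",
)

print(combined["sst"].dims, combined["sst"].shape)
print(combined["lon2d"].dims)
```

The trade-off is that xarray no longer verifies that the non-concatenated variables agree between files, so this is only appropriate when the grid really is the same in every file.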
I would have to defer to @angus-g or @aidanheerdegen as to whether these options are/should be implemented in the cookbook.
decode_coords = False speeds it up a lot, as in this IcePlottingExample.
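To see what decode_coords changes, here is a small sketch on a single synthetic file. It is not taken from the IcePlottingExample; the variable name `aice` and the dimension names `nj`/`ni` are assumptions made for illustration. With the default decoding, variables named in a `coordinates` attribute are promoted to coordinates; with `decode_coords=False` they stay ordinary data variables and the attribute is left untouched.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

# Build one synthetic file with 2D TLON/TLAT coordinates (hypothetical layout).
path = Path(tempfile.mkdtemp()) / "ice_demo.nc"
ny, nx = 3, 4
tlon, tlat = np.meshgrid(np.arange(nx, dtype="f8"), np.arange(ny, dtype="f8"))
xr.Dataset(
    {"aice": (("time", "nj", "ni"), np.zeros((1, ny, nx)))},
    coords={
        "time": [0],
        "TLON": (("nj", "ni"), tlon),
        "TLAT": (("nj", "ni"), tlat),
    },
).to_netcdf(path)

# Default decoding: TLON and TLAT are attached to the dataset as coordinates.
with xr.open_dataset(path) as decoded:
    decoded_coords = set(decoded.coords)

# decode_coords=False: TLON and TLAT remain plain data variables, and the
# raw "coordinates" attribute stays on the data variable.
with xr.open_dataset(path, decode_coords=False) as raw:
    raw_coords = set(raw.coords)
    raw_data_vars = set(raw.data_vars)
    raw_attr = raw["aice"].attrs.get("coordinates")

print("decoded coords:", sorted(decoded_coords))
print("raw coords:", sorted(raw_coords))
print("raw data variables:", sorted(raw_data_vars))
print("raw 'coordinates' attribute:", raw_attr)
```

The same keyword can be passed to xr.open_mfdataset, which forwards it to each per-file open.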
Thanks Adele, decode_coords is what I'd been looking for.
This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:
https://forum.access-hive.org.au/t/issues-loading-access-om2-01-data-from-cycle-4/418/3
Loading a CICE variable takes much more time and memory than a MOM variable. E.g. loading a CICE variable takes 90 s and several GB of memory (from a notebook on OOD), compared to a MOM variable, which takes ~15 s. Trying to load the full run for a CICE variable takes a crazy amount of memory.
I think the issue is that the CICE variables have coordinates TLON and TLAT, which are 2D variables included in the CICE files. MOM variables have coordinates geolon_t and geolat_t, which are not in the files.
I think this means that xarray.open_mfdataset is reading TLON and TLAT for each file to check if it has to concatenate on those coordinates.
I couldn't see a way of persuading xarray that it should only try to concatenate on the time dimension.