COSIMA / cosima-cookbook

Framework for indexing and querying ocean-sea ice model output.
https://cosima-recipes.readthedocs.io/en/latest/
Apache License 2.0

Loading CICE data is very expensive #287

Open MartinDix opened 2 years ago

MartinDix commented 2 years ago

Loading a CICE variable takes much more time and memory than a MOM variable. E.g.

import cosima_cookbook as cc
session = cc.database.create_session()
expt = '025deg_jra55_ryf9091_gadi'
aice = cc.querying.getvar(expt, 'aice_m', session, n=120)

takes 90 s and several GB of memory (from notebook on OOD) compared to

sea_level = cc.querying.getvar(expt, 'sea_level', session, n=120)

which takes ~15 s. Trying to load the full run for a CICE variable takes a crazy amount of memory.

I think the issue is that the CICE variables have

                aice_m:coordinates = "TLON TLAT time" ;

where TLON and TLAT are 2D variables included in the CICE files. MOM variables have

                sea_level:coordinates = "geolon_t geolat_t" ;

where geolon_t and geolat_t are not in the files.

I think this means that xarray.open_mfdataset is reading TLON and TLAT for each file to check if it has to concatenate on those coordinates.

I couldn't see a way of persuading xarray that it should only try to concatenate on the time dimension.

rmholmes commented 2 years ago

Hi Martin. I'm not sure of your specific case, but when loading datasets using xr.open_mfdataset I typically use something like:

import xarray as xr

OISST = xr.open_mfdataset(
    '/g/data/ua8/NOAA_OISST/AVHRR/v2-1_modified/*_' + str(year) + '.nc',
    concat_dim="time", combine="nested",
    data_vars='minimal', coords='minimal', compat='override',
    parallel=True,
)

This makes some extra assumptions about concat variables etc. and makes the loading much quicker. It's described in more detail in the "Note" at https://xarray.pydata.org/en/stable/user-guide/io.html#reading-multi-file-datasets
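
To illustrate what these options do for CICE-style files, here is a minimal sketch, not taken from the cookbook. It writes a few small synthetic files to a temporary directory and combines them along time only. The file names, grid size and dimension names (nj, ni) are made up for illustration. Only the variable and coordinate names (aice_m, TLON, TLAT) come from the issue above, and parallel=True is left out to keep the example small.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr


def write_synthetic_files(directory, n_files=3, ny=4, nx=5):
    """Write tiny CICE-like files: one time step each, with 2D TLON/TLAT."""
    lon2d, lat2d = np.meshgrid(
        np.linspace(0.0, 360.0, nx), np.linspace(-80.0, -60.0, ny)
    )
    paths = []
    for i in range(n_files):
        ds = xr.Dataset(
            {"aice_m": (("time", "nj", "ni"), np.full((1, ny, nx), float(i)))},
            coords={
                "time": [i],
                "TLON": (("nj", "ni"), lon2d),
                "TLAT": (("nj", "ni"), lat2d),
            },
        )
        path = Path(directory) / f"iceh_{i:03d}.nc"  # hypothetical file name
        ds.to_netcdf(path)
        paths.append(str(path))
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = write_synthetic_files(tmp)
    ds = xr.open_mfdataset(
        paths,
        combine="nested",      # trust the order of the file list
        concat_dim="time",     # concatenate along time only
        data_vars="minimal",   # only concatenate variables that have a time dim
        coords="minimal",      # same rule for coordinates such as TLON/TLAT
        compat="override",     # take non-concatenated variables from the first file
    )
    combined = ds.load()       # read into memory before the files are deleted
    ds.close()

print(combined["aice_m"].dims)
print(combined["TLON"].dims)
```

With these settings TLON and TLAT keep their two horizontal dimensions and are taken from the first file without being compared across files. That is only safe when the grid really is identical in every file.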

I would have to defer to @angus-g or @aidanheerdegen as to whether these options are/should be implemented in the cookbook.

adele-morrison commented 2 years ago

Passing decode_coords=False speeds it up a lot, as in the IcePlottingExample.
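
As a rough illustration of what decode_coords changes, the sketch below writes one small synthetic file (made-up grid size and file name; only aice_m, TLON and TLAT come from the issue) and opens it with plain xarray, both with the default and with decode_coords=False. It does not use the cookbook or any real CICE output.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

ny, nx = 4, 5
lon2d, lat2d = np.meshgrid(
    np.linspace(0.0, 360.0, nx), np.linspace(-80.0, -60.0, ny)
)

# A CICE-like dataset: xarray writes aice_m:coordinates naming TLON and TLAT.
source = xr.Dataset(
    {"aice_m": (("time", "nj", "ni"), np.zeros((1, ny, nx)))},
    coords={
        "time": [0],
        "TLON": (("nj", "ni"), lon2d),
        "TLAT": (("nj", "ni"), lat2d),
    },
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "iceh_synthetic.nc"  # hypothetical file name
    source.to_netcdf(path)

    # Default: variables named in the "coordinates" attribute become coordinates.
    with xr.open_dataset(path) as ds:
        decoded_coords = set(ds["aice_m"].coords)
        decoded_data_vars = set(ds.data_vars)

    # decode_coords=False: TLON and TLAT stay ordinary data variables.
    with xr.open_dataset(path, decode_coords=False) as ds:
        raw_coords = set(ds["aice_m"].coords)
        raw_data_vars = set(ds.data_vars)

print(sorted(decoded_coords), sorted(decoded_data_vars))
print(sorted(raw_coords), sorted(raw_data_vars))
```

With decode_coords=False the 2D TLON and TLAT arrays are no longer attached to aice_m, so selecting aice_m alone does not drag them along. Assuming getvar forwards extra keyword arguments to xarray.open_mfdataset, the call from the first comment would presumably become cc.querying.getvar(expt, 'aice_m', session, n=120, decode_coords=False).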

MartinDix commented 2 years ago

Thanks Adele, decode_coords is what I'd been looking for.

access-hive-bot commented 1 year ago

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/issues-loading-access-om2-01-data-from-cycle-4/418/3