COSIMA / cosima-recipes

Example recipes for analyzing model output using the cosima-cookbook infrastructure
https://cosima-recipes.readthedocs.io
Apache License 2.0
43 stars 65 forks source link

Use xarray functionality in Documented Examples #321

Open navidcy opened 4 months ago

navidcy commented 4 months ago

Can we replace the for loop in cell 8 here:

https://cosima-recipes.readthedocs.io/en/latest/DocumentedExamples/Meridional_heat_transport.html#Method-2:-using-from-the-net-surface-heat-flux-(assuming-steady-state)

with some xarray-dask friendly functionality?

cc @dhruvbhagtani, @angus-g

angus-g commented 4 months ago

Yeah, that loop looks a bit suspect to me... Its performance is O(N^2)-ish, because the cumulative sum is recomputing the entire array south of the current loop index.

I think it should be something like the following:

shflux = cc.querying.getvar(experiment, 'net_sfc_heating', session, n=3)
shflux_am = shflux.mean('time').load()

area = cc.querying.getvar(experiment, 'area_t', session, ncfile="ocean_grid.nc", n=1)
lat  = cc.querying.getvar(experiment, 'geolat_t', session, ncfile="ocean_grid.nc", n=1)
latv = cc.querying.getvar(experiment, 'yt_ocean', session, ncfile="ocean_grid.nc", n=1)

# create left edge for bottom bin
latv_bins = np.hstack(([-90], latv.values))

# replicate original loop
mhf = (
    (shflux_am * area)
    .groupby_bins("geolat_t", latv_bins)
    .sum()
    .cumsum()
    .rename(geolat_t_bins="yt_ocean")
)
mhf.coords["yt_ocean"] = latv

mht_method2 = mhf + (mhf.isel(yt_ocean=0) - mhf.isel(yt_ocean=-1)) / 2

This takes about 8s with 4 workers, vs. 3 min for the original method, and the answers have a relative accuracy of around 1e-5...

navidcy commented 4 months ago

22x speedup I think it's great!! thanks @angus-g

navidcy commented 4 months ago

Here's another example from remap_depth(remap_dens, psi_args, psi_avg, session, model) method in the "Zonally-averaged overturning example"

Specifically,

    for ii in range(nmin, nmax):
        rho1 = rho2_zonal_mean.cf.isel(latitude=ii)
        rho1v = rho1.copy()
        z = rho1.cf['vertical']
        rho1 = rho1.rename({rho1.cf['vertical'].name: 'rho_ref'})
        rho1['rho_ref'] = np.array(rho1v)
        rho1.name = rho2_zonal_mean.cf['vertical'].name
        rho1.values = np.array(z)

        rho1 = rho1.isel(rho_ref = ~np.isnan(rho1.rho_ref)).drop_duplicates(dim='rho_ref', keep='first')

        rho1 = rho1.interp(rho_ref = psi_avg.cf['vertical'].values,
                           kwargs={"bounds_error": False, "fill_value": (0, 6000)})
        psi_depth[:, ii] = rho1.rename({'rho_ref': psi_avg.cf['vertical'].name})

(See https://github.com/COSIMA/cosima-recipes/pull/324)

polinaesia commented 4 days ago

I compared Angus' method with the initial one, and the results are in good agreement except at a couple of points, as seen in the screenshot: Image Should I push the commit using the loop created by Angus?

I also tried handling units with pint package, but an issue occurred when pint encountered degrees Celsius (in Cp). It seems pint doesn't handle Celsius well.