Converting notebooks from COSIMA Cookbook to ACCESS-NRI intake catalog

navidcy commented 10 months ago

Why we need to have two ways for people to load output? At the moment, after #298, we have tutorials both for the cookbook and for the ACCESS-NRI Intake catalog

https://cosima-recipes.readthedocs.io/en/latest/tutorials.html

Let's just leave the best of the two. Is ACCESS-NRI Intake catalog the future? Let's make that the default in all examples and drop the cookbook then?

cc @angus-g, @aidanheerdegen, @AndyHoggANU, @aekiss, @adele-morrison, @anton-seaice, @PaulSpence, @rmholmes, @edoddridge, @micaeljtoliveira

Recipes

[ ] Along_slope_velocities.ipynb
[x] Apply_function_to_every_gridpoint.ipynb (#461 )
[ ] Atlantic_IndoPacific_Basin_Overturning_Circulation.ipynb (https://github.com/COSIMA/cosima-recipes/pull/350)
[x] Barotropic_Streamfunction.ipynb (https://github.com/COSIMA/cosima-recipes/pull/351)
[x] ~Bathymetry.ipynb~ (https://github.com/COSIMA/cosima-recipes/pull/352)
[x] Compare_SSH_model_obs.ipynb (https://github.com/COSIMA/cosima-recipes/pull/354)
[x] Compare_SST_SSS_TemperatureSalinity_to_WOA13.ipynb (https://github.com/COSIMA/cosima-recipes/pull/355)
[x] Cross-contour_transport.ipynb (https://github.com/COSIMA/cosima-recipes/pull/356)
[ ] Cross-slope_section.ipynb
[x] Eddy-Mean_Kinetic_Energy_Decomposition.ipynb (https://github.com/COSIMA/cosima-recipes/pull/358)
[x] Equatorial_thermal_and_zonal_velocity_structure.ipynb (https://github.com/COSIMA/cosima-recipes/pull/359)
[x] Extract_Variables_at_Ocean_Bottom.ipynb (https://github.com/COSIMA/cosima-recipes/pull/349)
[x] Geostrophic_Velocities_from_Sea_Level.ipynb (https://github.com/COSIMA/cosima-recipes/pull/360)
[x] Horizontal_Regridding.ipynb (https://github.com/COSIMA/cosima-recipes/pull/413)
[x] Hovmoller_Temperature_Depth.ipynb (https://github.com/COSIMA/cosima-recipes/pull/466)
[ ] Meridional_Overturning_Circulation.ipynb (https://github.com/COSIMA/cosima-recipes/pull/374)
[ ] Meridional_heat_transport.ipynb (https://github.com/COSIMA/cosima-recipes/pull/361)
[x] Model_Resolution_Comparison.ipynb (https://github.com/COSIMA/cosima-recipes/pull/362)
[x] Nearest_Neighbour_Distance.ipynb (https://github.com/COSIMA/cosima-recipes/pull/363)
[x] Neutral_density.ipynb [partly done; intake for mom5]
[ ] Particle_tracking_with_Parcels.ipynb (https://github.com/COSIMA/cosima-recipes/pull/364)
[x] Querying_Scalar_Quantities_and_Annually_Averaged_Timeseries.ipynb (https://github.com/COSIMA/cosima-recipes/pull/365)
[x] RelativeVorticity.ipynb (https://github.com/COSIMA/cosima-recipes/pull/426)
[x] Sea_Ice_Area_Concentration_Volume_with_Obs.ipynb
[ ] Sea_Ice_Coordinates.ipynb (https://github.com/COSIMA/cosima-recipes/pull/369)
[ ] Sea_Ice_Seasonality_Statistics.ipynb
[ ] Surface_Water_Mass_Transformation.ipynb (https://github.com/COSIMA/cosima-recipes/pull/370)
[ ] Temperature_Salinity_Diagram.ipynb (https://github.com/COSIMA/cosima-recipes/pull/371)
[ ] Transformation_from_Depth_to_Potential_Density.ipynb (https://github.com/COSIMA/cosima-recipes/pull/353)
[x] Transport_Through_Straits.ipynb (https://github.com/COSIMA/cosima-recipes/pull/372)
[ ] True_Zonal_Mean.ipynb (https://github.com/COSIMA/cosima-recipes/pull/373)

Tutorials

[x] Animations_with_xmovie.ipynb
[x] Making_Maps_with_Cartopy.ipynb (https://github.com/COSIMA/cosima-recipes/pull/377)
[ ] Model_Agnostic_Analysis.ipynb (https://github.com/COSIMA/cosima-recipes/pull/378)
[x] Spatial_selection.ipynb (https://github.com/COSIMA/cosima-recipes/pull/379)
[x] Submitting_analysis_jobs_to_gadi.ipynb (https://github.com/COSIMA/cosima-recipes/pull/380)
[x] Template_For_Notebooks.ipynb (https://github.com/COSIMA/cosima-recipes/pull/381)
[x] Using_Explorer_tools.ipynb (https://github.com/COSIMA/cosima-recipes/pull/382)

ACCESS-OM2-GMD-Paper-Figs

[ ] Fig12-ZonalTempSaltBias.ipynb
[x] Fig3-GlobalTimeseries.ipynb (https://github.com/COSIMA/cosima-recipes/pull/344)
[x] Fig4-DrakePassageTransport.ipynb (https://github.com/COSIMA/cosima-recipes/pull/345)
[x] Figs14-20-23-24-PlotWOCETransects.ipynb (https://github.com/COSIMA/cosima-recipes/pull/346)

adele-morrison commented 3 months ago

Generally I’m not concerned by the amount of resources we use for analysis, even if people are always running XXLarge (full node) ARE sessions. The amount of compute is still tiny compared with most of our allocations. Though it does seem overkill if the large compute is only needed for Intake.

dougiesquire commented 3 months ago

Though it does seem overkill if the large compute is only needed for Intake.

It's not needed, it will make things faster

AndyHoggANU commented 3 months ago

I'm also not worried by the resource use if it's needed -- but I agree this is something we can look to improve with intake in the medium term.

edoddridge commented 3 months ago

I think that level of usage is very much in the noise. To put some numbers on this, the estimated usage from ARE for an 8 hour session with XX-Large is

SU estimate 28 cpu cores + 252GB mem on normalbw queue (1.25 SUs/core/h) for 8h = 280 SUs

It's not a big deal.

navidcy commented 3 months ago

I think you/we/everybody should go full on using the ARE resources and save your time and effort for analyses, thinking, fun activities, etc.

julia-neme commented 3 months ago

Jaja great! I might be scarred from my beginner days when I was told to get better at dask instead of opening bigger gadi_jupyter notebooks :stuck_out_tongue_closed_eyes: I'll up my CPU game.

navidcy commented 3 months ago

@anton-seaice, @dougiesquire:

I'm trying to understand how to proceed here.

Personally I'm hitting big slowdown issues, much worse then 2x reported above. @julia-neme seems to be doing so as well and we have an open channel of communications providing mental support to each other.

But if me and @julia-neme are hitting these then I expect most COSIMA users to hit those as well.

I see a bunch of kwargs here proposed by @anton-seaice. Do we need those? Should we always have them? If so, we need to have them in the PRs for the intake conversion... Otherwise they are hidden in a comment in an issue.

data_ic = catalog[experiment].search(
    variable=variable, 
    frequency="1mon"
).to_dask()

data_ic_times = data_ic.sel(time=slice(start_time, end_time))

.. do the calculation
If that to_dask step is slow, it may be due to there being many files to concatenate. You can make you instance bigger, make sure you start a dask cluster (with threads_per_worker=1) before running to_dask, and add these keywords to the .to_dask() call:
xarray_combine_by_coords_kwargs=dict(
    compat="override",
    data_vars="minimal",
    coords="minimal"
)

catalog[experiment].search(
    variable=variable, 
).to_dask(
    xarray_combine_by_coords_kwargs=xarray_combine_by_coords_kwargs,
)

I'm a bit in despair. I'm very intrigued and interested in understanding the nitty gritty details of dask and xarray but in terms of what we suggests all COSIMAers to do, I would like to give them something that works. (Also for myself btw, most times I'd like just to copy a line and paste it and load the data -- I don't wanna be thinking of xarray internals etc.) If we should have these kwargs in then let's have them there all the time?

At the moment I'm trying to push through several of the open PRs for intake conversion and I'm hitting these issues. Perhaps we should pause trying to penetrate those PRs until I understand what's happening?

navidcy commented 3 months ago

Btw I'm hitting these issues even when I'm trying to load 1 variable... so no concatenation (or at least no concatenation as far as I understand!)

dougiesquire commented 3 months ago

@navidcy could you please point me to an example of your issue(s)?

navidcy commented 3 months ago

I'll try to do that later... Or @adele-morrison / @julia-neme feel free to do so?

https://github.com/COSIMA/cosima-recipes/pull/344#issuecomment-2255107842 might be one?

adele-morrison commented 3 months ago

I did some more timing tests, and I actually find the kwargs don't help at all. There is just a change in timing due to caching on the 2nd run using Intake. I'm getting that Intake is about 2x slower than the cookbook when both are run for the first time.

This one runs the kwargs case first (50s for Intake compared with 28s for cookbook):

This one I changed the order and did no kwargs first (52s for Intake vs 25s for cookbook).

Running Intake a second time with either kwargs and no kwargs is much faster than the first time.

So I don't think it will help us to use kwargs.

anton-seaice commented 3 months ago

I see a bunch of kwargs here proposed by @anton-seaice. Do we need those? Should we always have them? If so, we need to have them in the PRs for the intake conversion... Otherwise they are hidden in a comment in an issue.

Apologies for the confusion about this. These kwargs help with sea-ice (cice) data but not with ocean (mom) data.

Its similar to the decode_coords=False which we used to set for the cookbook for cice data.

MOM only add 1d coordinates (i.e. xt_ocean, yt_ocean and time) to its output:

Whilst cice adds 2d coordinates to its output (geographic lons/lats at T and U points):

When .to_dask() is called this is concatenating all the files for that variable. Part of the concatenation is, by default, xarray will check that all of the coordinates in the files are consistent (i.e. the times don't overlap, but the other coordinates (lat/lon) are identical). Doing this check with 2d arrays from cice is much slower than doing this check with 1d arrays from mom. As we know the coordinates are consistent (they come from the same model at all times), we can tell xarray not to do the checks for consistency by setting the xarray_combine_by_coords_kwargs argument and therefore speed up loading cice data.

dougiesquire commented 3 months ago

Personally I'm hitting big slowdown issues

https://github.com/COSIMA/cosima-recipes/pull/344#issuecomment-2255107842 might be one?

@navidcy, I've looked into this and replied, but I didn't encounter the big slowdown issues you refer to here. I think this one might mostly have been a case of comparing apples to oranges.

marc-white commented 2 months ago

I can see a large number of INTAKE branches that aren't currently referenced in the issue header - does anyone mind if I do a bit of issue admin and collate them all together?

julia-neme commented 2 months ago

Would collating the branches collate the pull requests too? I think I've found working through these that most of the notebooks require more work than just an intake conversion, so merging into one pull request would potentially generate discussion on different topics all mixed up in one PR.

marc-white commented 2 months ago

@julia-neme oh no, absolutely not. I just meant doing the administrative task of working out where all the various branches/PRs are up to, and dumping that information in the issue description, given a lot of them aren't cross-linked (although I've just realized I can't edit the initial issue text myself).

julia-neme commented 2 months ago

Ohh I guess there are a lot of branches with conversions already complete that could be cleaned up?

navidcy commented 2 months ago

@julia-neme could you edit the first comment on this issue and mark with ticks the recipes you know have been converted?

marc-white commented 2 months ago

@julia-neme could you edit the first comment on this issue and mark with ticks the recipes you know have been converted?

@julia-neme could we also link in the existing PRs (both open and closed) against the relevant recipe? The list I came up with so far is:

Atlantic... #350
Barotropic... #351
Compare_SST... #355
Cross-contour... #356
Equatorial_thermal... #359
Hovmoller... #348
Meridional overturning... #374
Meridional heat... #361
Particle tracking... #364
Surface_water... #370
True_zonal... #373

julia-neme commented 2 months ago

Yep! I'll do on monday. Note that not all notebooks have a PR request requesting a conversion. I'm not sure how all those PRs/branches happened.

marc-white commented 2 months ago

I'm not sure how all those PRs/branches happened.

Not that I've ever tried it before, so take this suggestion with a grain of salt, but we could even stand up sub-issues for each recipe that hasn't been completed yet: https://dev.to/keracudmore/create-sub-issues-in-github-issues-409m

julia-neme commented 2 months ago

Hey @marc-white, I think we've updated the list at the beginning with @navidcy and should have all the links. Hope it is useful.

marc-white commented 1 month ago

Could someone with the appropriate superpowers please add me to the project? I was just stymied in my attempts to push back my changes to the #356 branch.

navidcy commented 1 month ago

added you; let me know if you still have issues

marc-white commented 1 month ago

Apart from my Outlook client's stubborn refusal to open the invitation email at all, that seems to have done it! Thanks @navidcy !

COSIMA / cosima-recipes