esm-tools / esm_master

Process level parallelization of esm_master #46

Open JanStreffing opened 3 years ago

JanStreffing commented 3 years ago

I was talking with Jan Hegewald, Miguel, and Thomas Jung the other day. One of the points that came up during the discussion is that we want the esm_tools to be attractive not just for scientists and maintainers, but also for model developers. While waiting for awi-cm3 to compile on aleph, I thought of one feature that might have such an effect.

At the moment we already have 5 components in awi-cm3:

Since we are planning to automate more steps from what we would generally call awi-cm3 workflow through esm_tools, I can see at least 2 more that will be added sooner or later:

That's seven components, all of which we currently compile one after the other. On a machine with fast login nodes like ollie this takes ~15-20 minutes. On a machine with slow login nodes like aleph it can be more like 45-60 minutes when done from scratch.

Based on their dependencies, we could compile quite a number of these components in parallel. This is aided by the fact that a parallel compile usually only scales to a handful of processes, which leaves enough cores on a login node to run several compiles at the same time. The idea would be to define (e.g. in the couplings yaml file) which dependencies need to be fulfilled before the compiling of a component can be kicked off.

Example (no attempt at being grammatically correct)

components:
- eccodes-2.21.0
- perl-5.32.1
- xios-2.5
- rnfmap-awicm3
- oifs-43r3-awi-frontiers
- fesom-2.0-frontiers
- oasis3mct-4.0-awicm3-frontiers
dependencies:
  xios-2.5: oasis3mct-4.0-awicm3-frontiers perl-5.32.1
  oifs-43r3-awi-frontiers: oasis3mct-4.0-awicm3-frontiers perl-5.32.1 xios-2.5
  rnfmap-awicm3: oasis3mct-4.0-awicm3-frontiers
  fesom-2.0-frontiers: oasis3mct-4.0-awicm3-frontiers
coupling_changes:
- sed -i '/FESOM_COUPLED/s/OFF/ON/g' fesom-2.0/CMakeLists.txt
- sed -i '/OIFS_COUPLED/s/OFF/ON/g' fesom-2.0/CMakeLists.txt
- sed -i '/COUPLENEMOECE = /s/.TRUE./.FALSE./g' oifs-43r3/src/ifs/module/yommcc.F90
- sed -i '/COUPLEFESOM2 = /s/.FALSE./.TRUE./g' oifs-43r3/src/ifs/module/yommcc.F90
- sed -i '/COUPLENEMOFOCI = /s/.TRUE./.FALSE./g' oifs-43r3/src/ifs/module/yommcc.F90

In this case perl, eccodes and oasis can all start right away. As soon as oasis finishes, fesom and rnfmap can kick off. As soon as perl and oasis are both done, xios can start as well. When xios is done, openifs can start. XIOS and OpenIFS will still take a while, but we might be able to cut the whole compile time in half.
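To make the ordering concrete, here is a minimal sketch of how the start order could be derived from the example entries above. It assumes Python 3.9+ for the standard-library `graphlib` module, and it splits the dependency strings on whitespace exactly as they are written in the example. The function name `build_waves` is hypothetical and not part of esm_master.

```python
from graphlib import TopologicalSorter

# Data copied from the example above.
components = [
    "eccodes-2.21.0",
    "perl-5.32.1",
    "xios-2.5",
    "rnfmap-awicm3",
    "oifs-43r3-awi-frontiers",
    "fesom-2.0-frontiers",
    "oasis3mct-4.0-awicm3-frontiers",
]
dependencies = {
    "xios-2.5": "oasis3mct-4.0-awicm3-frontiers perl-5.32.1",
    "oifs-43r3-awi-frontiers": "oasis3mct-4.0-awicm3-frontiers perl-5.32.1 xios-2.5",
    "rnfmap-awicm3": "oasis3mct-4.0-awicm3-frontiers",
    "fesom-2.0-frontiers": "oasis3mct-4.0-awicm3-frontiers",
}


def build_waves(components, dependencies):
    """Group components into batches that could be compiled at the same time.

    Hypothetical helper: every component in a batch has all of its
    dependencies in earlier batches.
    """
    # Map each component to the set of components it has to wait for.
    graph = {name: set(dependencies.get(name, "").split()) for name in components}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole batch has finished
    return waves


waves = build_waves(components, dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"batch {number}: {', '.join(wave)}")
```

With the dependencies as listed, this gives three batches: eccodes, oasis and perl first, then fesom, rnfmap and xios, and finally openifs. Batches are a simplification, since a real scheduler could start a component as soon as its own dependencies are done instead of waiting for the whole batch.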

How much effort would it be to implement something like this on the backend?
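As a starting point for that discussion, a rough sketch of what a process-level scheduler could look like, assuming Python 3.9+ and only the standard library. `compile_component` is a hypothetical stand-in for whatever esm_master does to build a single component (here it only sleeps), and `compile_in_parallel` and `max_workers` are made-up names. Threads are used only to wait on the builds, on the assumption that the real work happens in external compiler processes.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Dependency graph from the example: component -> components it waits for.
GRAPH = {
    "eccodes-2.21.0": set(),
    "perl-5.32.1": set(),
    "oasis3mct-4.0-awicm3-frontiers": set(),
    "xios-2.5": {"oasis3mct-4.0-awicm3-frontiers", "perl-5.32.1"},
    "oifs-43r3-awi-frontiers": {
        "oasis3mct-4.0-awicm3-frontiers", "perl-5.32.1", "xios-2.5",
    },
    "rnfmap-awicm3": {"oasis3mct-4.0-awicm3-frontiers"},
    "fesom-2.0-frontiers": {"oasis3mct-4.0-awicm3-frontiers"},
}


def compile_component(name):
    """Hypothetical placeholder for the real compile step of one component."""
    time.sleep(0.01)
    return name


def compile_in_parallel(graph, compile_func, max_workers=4):
    """Start each component as soon as all of its dependencies have finished."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    running = {}   # future -> component name
    finished = []  # components in the order they completed
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit everything whose dependencies are already satisfied.
            for name in sorter.get_ready():
                running[pool.submit(compile_func, name)] = name
            # Block until at least one running compile has finished.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()    # re-raise if the compile step failed
                sorter.done(name)  # unlocks components waiting on this one
                finished.append(name)
    return finished


order = compile_in_parallel(GRAPH, compile_component)
print("completion order:", order)
```

The completion order can differ between runs, but a component is never submitted before everything it depends on has been marked as done. `max_workers` would need to be chosen together with the number of make processes per component so that the login node is not oversubscribed.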

Inviting feedback @mandresm @hegish