## Purpose
This PR addresses the following issues:
- `main` has broken orchestration because `wsd` / `cappa` are passed as quantities down the stack. A DaCe bug causes a memory description failure when `wsd` is output only, which we had to work around.
- The two damping coefficient scalars `da_min` and `da_min_c` are decomposition dependent, which breaks our distributed compile system with DaCe.
- The internals of C_SW are used instead of its return values to work around a memory leak in DaCe's memory pooling.
## Code changes:
- Turn distributed compilation back on, i.e. `DEACTIVATE_DISTRIBUTED_DACE_COMPILE = False`
- `wsd` and `cappa` are now cached on the proper top-level modules (fv_dyn & acoustics) and their storage is accessed via `self`
- `da_min` and `da_min_c` are now read through getters; see the comment left in code:
# We need to use a getter for da_min & da_min_c in order to work around a DaCe inlining
# behavior. As part of the automatic optimization process, DaCe tries to inline
# as many scalars as possible.
# The grid is _not_ passed as an input to the top-level function we orchestrate,
# so its scalar values will be inlined.
# Alas, our distributed compilation system works by compiling a 3,3 layout
# top tile, then using those 9 caches on every layout upward.
# This setup leads to the values of da_min/da_min_c from the 3,3 layout
# being inlined in the generated code. Those variables are used in runtime
# calculations (kinetic energy, etc.), which leads to misbehaving numerics
# and errors when the 3,3 layout values are used on larger layouts.
# The solution we implement here makes use of the fact that callbacks
# are never inlined in DaCe optimization. The current workaround uses the
# following functions.
# An alternative would be to pass the Grid or the DampingCoefficients to DaCe,
# clearly flagging it as a dynamic piece of memory (which would
# cancel any inlining), but the feature to do that (dace.struct)
# is currently in disarray.
# N.B.: another solution is to pass da_min and da_min_c as inputs, but it seems
# odd and adds a lot of boilerplate throughout the model code.
- `dace_constants_args` has been renamed to `dace_compiletime_args` to match DaCe API naming change
- Changed the fields referenced in `dyn_core`; see the comment left in code above `self.ptc` and `self.delpc`:
# ToDo: Due to DaCe VRAM pooling creating a memory
# leak with the usage pattern of those two fields (to be fixed soon),
# we use the C_SW internals to work around it.
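
The inlining problem described in the code comment on `da_min` / `da_min_c`, and the getter workaround, can be illustrated without DaCe. The sketch below is only an analogy in plain Python: `DampingCoefficients`, `Grid`, `build_inlined` and `build_with_getter` are hypothetical names, the coefficient values are made up, and none of this is the DaCe API or the model's actual code. It shows why a value frozen at build time goes stale when the cached program is reused on another layout, while a value fetched through a callback does not.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DampingCoefficients:
    """Hypothetical stand-in holding the two decomposition-dependent scalars."""
    da_min: float
    da_min_c: float


class Grid:
    """Hypothetical grid object that is NOT passed to the built program."""

    def __init__(self, coeffs: DampingCoefficients) -> None:
        self.coeffs = coeffs


def build_inlined(grid: Grid) -> Callable[[float], float]:
    # Mimics scalar inlining: the value is read once at build time
    # and frozen into the generated program.
    frozen = grid.coeffs.da_min
    return lambda x: x * frozen


def build_with_getter(get_da_min: Callable[[], float]) -> Callable[[float], float]:
    # Mimics a callback: the value is fetched every time the program runs.
    return lambda x: x * get_da_min()


# Build once on a small layout (values are made up for illustration).
grid = Grid(DampingCoefficients(da_min=2.0, da_min_c=1.0))
inlined = build_inlined(grid)
with_getter = build_with_getter(lambda: grid.coeffs.da_min)

# Reuse the cached programs on a larger layout with different coefficients.
grid.coeffs = DampingCoefficients(da_min=0.5, da_min_c=0.25)

stale = inlined(10.0)      # computed with the small-layout value
fresh = with_getter(10.0)  # computed with the current value
```

Here `stale` still reflects the coefficients of the layout used at build time, which is the failure mode the comment describes, whereas `fresh` follows the current grid.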
## Requirements changes:
- N/A
## Infrastructure changes:
- N/A
## Checklist
Before submitting this PR, please make sure:
- [ ] You have followed the coding standards guidelines established at [Code Review Checklist](https://drive.google.com/file/d/1R0nqOxfYnzaSdoYdt8yjx5J482ETI2Ft/view?usp=sharing).
- [ ] Docstrings and type hints are added to new and updated routines, as appropriate
- [ ] All relevant documentation has been updated or added (e.g. README, CONTRIBUTING docs)
For the `dyn_core` field change, the C_SW internals used to work around the DaCe VRAM pooling memory leak (to be fixed soon) are, e.g.:
- `self.cgrid_shallow_water_lagrangian_dynamics.delpc`
- `self.cgrid_shallow_water_lagrangian_dynamics.ptc`
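
The workaround above amounts to reading the two fields off the sub-module instead of taking them from its return values. A minimal sketch of that access pattern follows, assuming hypothetical classes `CGridShallowWater` and `AcousticDynamics`, plain Python lists in place of the real storages, and placeholder arithmetic; only the attribute names `cgrid_shallow_water_lagrangian_dynamics`, `delpc` and `ptc` come from the text above.

```python
class CGridShallowWater:
    """Hypothetical stand-in for the C_SW module; not the real class."""

    def __init__(self, size: int) -> None:
        # Buffers are allocated once and owned by the module.
        self.delpc = [0.0] * size
        self.ptc = [0.0] * size

    def __call__(self, delp: list, pt: list) -> None:
        # Results are written in place; nothing is returned.
        # The arithmetic is a placeholder, not the real computation.
        for i, (d, p) in enumerate(zip(delp, pt)):
            self.delpc[i] = 0.5 * d
            self.ptc[i] = p + 1.0


class AcousticDynamics:
    """Hypothetical caller that reads the C_SW internals."""

    def __init__(self, size: int) -> None:
        self.cgrid_shallow_water_lagrangian_dynamics = CGridShallowWater(size)

    def step(self, delp: list, pt: list):
        self.cgrid_shallow_water_lagrangian_dynamics(delp, pt)
        # Read the fields off the sub-module rather than from return values.
        delpc = self.cgrid_shallow_water_lagrangian_dynamics.delpc
        ptc = self.cgrid_shallow_water_lagrangian_dynamics.ptc
        return delpc, ptc


dyn = AcousticDynamics(size=3)
delpc, ptc = dyn.step([2.0, 4.0, 6.0], [10.0, 20.0, 30.0])
```

In this sketch the caller never receives freshly allocated outputs; it reuses the buffers owned by the sub-module on every step.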