feat req: Memory scalable static data in DART - both static across the ensemble and per-ensemble member static data

hkershaw-brown commented 1 month ago

Several model_mods and core DART modules have a fixed-size memory requirement on every processor. Total memory usage is therefore static_mem * num_procs: the per-processor footprint does not shrink as you add processors, which puts a hard limit on the model size in DART.

Goal:

memory usage per core = static_mem / num_procs
total memory usage = static_mem

Rather than the current:

memory usage per core = static_mem 
total memory usage = static_mem * num_procs

Note the code may need to be sensible about what static data is tiny (fine on every core) vs. large.
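
To make the arithmetic above concrete, here is a minimal sketch comparing the current (replicated) layout with the goal (distributed) layout. The function names, the 8 GiB figure and the `tiny_threshold` cutoff are all hypothetical, chosen only for illustration; nothing here is DART code.

```python
def memory_per_core(static_bytes, num_procs, tiny_threshold=1024**2):
    """Return (current, goal) bytes held by each core.

    tiny_threshold is a hypothetical cutoff: data at or below it is simply
    kept on every core; larger data is split evenly across the cores.
    """
    current = static_bytes  # every core holds a full copy today
    if static_bytes <= tiny_threshold:
        goal = static_bytes  # tiny data: fine on every core
    else:
        goal = -(-static_bytes // num_procs)  # ceiling division
    return current, goal


def total_memory(per_core_bytes, num_procs):
    """Total bytes summed over all cores."""
    return per_core_bytes * num_procs


static_mem = 8 * 1024**3  # 8 GiB of static data, an illustrative value
for num_procs in (128, 1024):
    current, goal = memory_per_core(static_mem, num_procs)
    print(num_procs, total_memory(current, num_procs), total_memory(goal, num_procs))
```

In this sketch the replicated total grows linearly with `num_procs`, while the distributed total stays at `static_mem` whenever the data divides evenly across the cores.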

Static data in DART:

static data, same across the ensemble:
- WRF phb (3d variable sized static data). A wrf model_mod version with distributed phb was written 2014/16 but never released.
- Mesh structures (e.g. MPAS)
- quad_interp utilities data structures (particularly MOM6 CESM3 workhorse 2/3-degree)
- POP interpolation data structures
- get_close data structures

Per ensemble member static data:
- This gets put into the state at the moment, so it is inflated (maybe it should not be). An example (I think) is the CLM fields that are 'no-update'; see #276.

In addition (going in as a separate issue), observation sequence files are held on every core. This matters particularly for external forward operators, which are stored in the obs sequence.

hkershaw-brown commented 1 month ago

WRF PHB is read from a wrfinput template file, but is PHB in every wrf file?
If so it is "Per ensemble member static data" that is equal for every ensemble member

kdraeder commented 1 week ago

Here's a question that might influence our choices: is it reasonably easy to store some kinds of data distributed across a single node, which is essentially the tasks we request from each node? This would cut down on memory usage and not increase internode communication.

Here's a framework for thinking about names for the kinds of data filter needs to store, and some possibilities to consider. Short and common usually wins over longer and more meaningful (except when trying to sound impressive: "intercomparison", "irregardless", ...). I tried to think of short and meaningful descriptions. Combinations of 2 simple words can be useful.

   First dimension: time varying;
      no = metadata about grids, including surface and boundaries.
         "static" in my/most vocabularies
      yes =  "evolving",  "time varying" (pairs with member-varying, below)
         due directly to assimilation:
            "assimilated",  "updated"
         due indirectly to assimilation through the model forecast (some is currently called "no-copy-back"):
             "not updated", "carried", "passive", "baggage"

   Second dimension: within an ensemble:
      not varying among members (Helen has called this "static"; it could be made specific with "ensemble static"):
         "no spread", "ensemble constant", "member independent"
      varying between members: 
         "updated"? (implies time varying too)
         "member varying"  "member dependent"

   Third dimension: size.  Mostly determines the importance of distributing it.
      1D, 2D, 3D
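
One way to picture the three dimensions above is as fields on a small record. This is only a sketch: the class, its field names, and the rule in `worth_distributing` are hypothetical, and the label strings are just a few of the candidate names listed above, not a settled vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredData:
    """Hypothetical description of one piece of data that filter stores."""
    name: str
    time_varying: bool    # first dimension
    member_varying: bool  # second dimension
    ndim: int             # third dimension: 1, 2 or 3

    def label(self):
        """Build a two-word-style label from the candidate names."""
        time_part = "evolving" if self.time_varying else "static"
        member_part = "member varying" if self.member_varying else "ensemble constant"
        return f"{time_part}, {member_part}, {self.ndim}D"

    def worth_distributing(self, min_ndim=3):
        # Assumed rule for illustration: size alone decides, and only
        # data with at least min_ndim dimensions is worth distributing.
        return self.ndim >= min_ndim


phb = StoredData("WRF PHB", time_varying=False, member_varying=False, ndim=3)
print(phb.label(), phb.worth_distributing())
```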

I prefer leaving "prognostic" and "diagnostic" for classifying variables in models.

hkershaw-brown commented 1 week ago

> is it reasonably easy to store some kinds of data distributed across a single node, which is essentially the tasks we request from each node? This would cut down on memory usage and not increase internode communication.

Yes for sure it is "easy" - it is just counting things.
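
As a rough illustration of the counting involved, the sketch below splits an array of `n_items` into contiguous blocks over the tasks on one node and looks up which task owns a given index. The function names are hypothetical and this is not DART code. It also leaves out how the tasks on a node are identified and how one task reads a block held by another.

```python
def block_bounds(n_items, n_tasks, task):
    """Return (start, count) of the contiguous block owned by `task`."""
    base, extra = divmod(n_items, n_tasks)
    # The first `extra` tasks each hold one additional item.
    count = base + (1 if task < extra else 0)
    start = task * base + min(task, extra)
    return start, count


def owner_of(index, n_items, n_tasks):
    """Return (task, local_offset) for a global index."""
    base, extra = divmod(n_items, n_tasks)
    boundary = extra * (base + 1)  # items held by the larger blocks
    if index < boundary:
        return divmod(index, base + 1)
    task, offset = divmod(index - boundary, base)
    return extra + task, offset


# Example: 10 items over 4 tasks on a node.
for task in range(4):
    print(task, block_bounds(10, 4, task))
```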

braczka commented 1 week ago

> WRF PHB is read from a wrfinput template file, but is PHB in every wrf file? If so it is "Per ensemble member static data" that is equal for every ensemble member

As far as I know, PHB (base state geopotential) is in every wrfinput file. It is static both across ensemble members and in time. It needs to be summed with PH (perturbation geopotential) to give the actual geopotential.
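
The sum described above can be sketched as follows, with one PHB column held once for the whole ensemble and a separate PH column per member. The values and the helper name are made up for illustration; real fields are 3D and this is not the wrf model_mod code.

```python
# One PHB column shared by the whole ensemble (made-up values, m^2/s^2).
phb = [0.0, 9810.0, 19620.0]

# Perturbation geopotential (PH) differs per ensemble member.
ph_members = [
    [0.0, 12.5, -3.0],
    [0.0, -7.5, 4.0],
]


def full_geopotential(ph, phb):
    """Sum perturbation and base-state geopotential level by level."""
    if len(ph) != len(phb):
        raise ValueError("PH and PHB must have the same number of levels")
    return [p + b for p, b in zip(ph, phb)]


# Every member reuses the same PHB; only PH is stored per member.
geopotential = [full_geopotential(ph, phb) for ph in ph_members]
print(geopotential)
```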

NCAR / DART

feat req: Memory scalable static data in DART - both static across the ensemble and per-ensemble member static data #744