[DR] Multi-block moments App (and also pattern for other Apps)

Prototype/Production Branches

Please see the branch multib-app for work on moments App. I consider this to be a production branch and other Apps, including GK, will be modified here to support multi-blocks.

Introduction

Extensions to multi-block (MB) geometry will allow us to do far more complex geometry simulations. However, manually specifying the inputs to setup can be tedious and potentially error-prone. In this DR I propose a few design principles and a proposed input driver for moments App.

Principles

The changes to the existing (single-block) App should be minimal or even zero. No API changes should be done unless absolutely needed
The MB App should become the default: to avoid changes to the existing drivers and Lua wrappers we should eventually replace the single-block App new() method to actually call the MB App new() with a single block and other data constructed manually (but hidden from user)
The MB App should minimize (ideally, avoid all together) input duplication
The MP App API names should be identical to the single-block API names, except of course a prefix change (for example, we need to add multib to the API names).
Bespoke changes to the core infra should not be done. Thought should be given to properly extending the infra to increase general capabilities, and not merely implement in haste a collection of one-off functions just for MB.

Running Example for this DR

I am using a 3 block Euler simulation with the following block layout.

  /* Block layout and coordinates

   Y  
   ^  
   |  
   2  +------+
   |  |b0    |
   |  |      |
   1  +------+-----+
   |  |b1    |b2   |
   |  |      |     |
   0  +------+-----+

      0 -----1-----2 -> X
  */

Geometry Specification

I propose introducing a new geometry object that will hold the geometric information (cells, lower, upper bounds and mapc2p, etc) and the topology. This object is mostly App agnostic, but could contain App specific features as needed. For example, the API for this is:

// Geometry info a single config-space block
struct gkyl_block_info {
  // lower and upper extents of blocks
  double lower[GKYL_MAX_CDIM], upper[GKYL_MAX_CDIM];
  int cells[GKYL_MAX_CDIM]; // cells extents in each direction
  int cuts[GKYL_MAX_CDIM];  // domain split to use

  struct gkyl_target_edge connections[GKYL_MAX_CDIM][2]; // block connections
};

This is then used to populate the geometry object, for example, for a block:

  // block 0
  gkyl_block_geom_set_block(bgeom, 0, &(struct gkyl_block_info) {
      .lower = { 0, 1 },
      .upper = { 1, 2 },
      .cells = { 128, 128 },

      .connections[0] = { // x-direction connections
        { .bid = 0, .dir = 0, .edge = GKYL_PHYSICAL }, // physical boundary
        { .bid = 0, .dir = 0, .edge = GKYL_PHYSICAL }  // physical boundary
      },
      .connections[1] = { // y-direction connections
        { .bid = 1, .dir = 1, .edge = GKYL_UPPER_POSITIVE },
        { .bid = 0, .dir = 1, .edge = GKYL_PHYSICAL } // physical boundary
      }
    }
  );

Species and Field Inputs

I propose splitting inputs for species and fields into two parts: a part that is block independent, a part that is per-block. The per-block data need not be repeated if it is common to all block.

For example, the moment species data for an Euler solver would be:

  // all data is common across blocks
  struct gkyl_moment_multib_species_pb euler_blocks[1];
  euler_blocks[0] = (struct gkyl_moment_multib_species_pb) {
    .init = initFluidSod,
  };

  struct gkyl_moment_multib_species euler = {
    .name = "euler",
    .charge = 0.0, .mass = 1.0,
    .equation = euler_eqn,

    .are_all_blocks_same = true,
    .blocks = euler_blocks,
    .bcs = euler_phys_bcs,
  };

Note the use of the flag are_all_blocks_same to indicate that the blocks array has only 1 element in it.

To specify the BCs for each block I propose we introduce another input list that specifies the BCs as follows. The struct would be:

// BC for blocks
struct gkyl_block_physical_bcs {
  int bidx; // block index
  int dir;  // direction in which BC is specified
  enum gkyl_edge_loc edge; // which edge this BC is for
  int bc_type; // BC code
};

This seems different than our usual way to specifying BCs in single-block Apps, but it reduces the potential for setting inconsistent BCs, and reduces duplication. For the Euler simulation this list would be specified as:

  struct gkyl_block_physical_bcs euler_phys_bcs[] = {
    // block 0 BCs
    { .bidx = 0, .dir = 0, .edge = GKYL_LOWER_EDGE, .bc_type = GKYL_SPECIES_REFLECT },
    { .bidx = 0, .dir = 0, .edge = GKYL_UPPER_EDGE, .bc_type = GKYL_SPECIES_REFLECT },
    { .bidx = 0, .dir = 1, .edge = GKYL_UPPER_EDGE, .bc_type = GKYL_SPECIES_COPY },
    // block 1 BCs
    { .bidx = 1, .dir = 0, .edge = GKYL_LOWER_EDGE, .bc_type = GKYL_SPECIES_REFLECT },
    { .bidx = 1, .dir = 1, .edge = GKYL_LOWER_EDGE, .bc_type = GKYL_SPECIES_REFLECT },
    // block 2 BCs
    { .bidx = 2, .dir = 0, .edge = GKYL_UPPER_EDGE, .bc_type = GKYL_SPECIES_COPY },
    { .bidx = 2, .dir = 1, .edge = GKYL_LOWER_EDGE, .bc_type = GKYL_SPECIES_REFLECT },
    { .bidx = 2, .dir = 1, .edge = GKYL_UPPER_EDGE, .bc_type = GKYL_SPECIES_REFLECT },
  };

On Running Multi-Blocks in Parallel

The proposed design for parallelization is as follows. The user must specify cuts for each block. We will then use a round-robin algorithm to assign ranks and communicators to each block. Consider the following two cases that bound the problem.

Case 1: Number of ranks is the same as total cuts summed over all blocks. This means that each rank will have exactly 1 App pointer and each block update will be fully concurrent
Case 2: There is a single rank (serial simulation) and N blocks. In this case the single rank will need to have N App pointers.

In general, the situation will be somewhere in between. As an example for the 3 block Euler simulation, lets say the cuts for the blocks are 2x2, 1x1 and 1x1 respectively. Then ideally, we need 6 ranks to run the simulation fully concurrently. However, say only 4 ranks are specified. Then, we will use a round-robin algorithm to construct the following set of communicators:


comm_b0 = ranks [ 0 1 2 3 ]
comm_b1 = ranks [ 0         ]
comm_b2 = ranks [   1       ]

Hence, on each rank the following App pointers will need to be constructed:

rank 0 : *app_b0, *app_b1
rank 1 : *app_b0, *app_b2
rank 2 : *app_b0
rank 3 : *app_b0

This shows that, In general, the number of App pointers on each rank will be different.

Also note that the MPI_Comm_split is a collective call and hence can't be used to create the communicators needed in the MB App. Instead, we will need to use more detailed methods to choose a list of ranks to include in each communicator and then construct them with other methods in MPI. Of course, this will need to be done consistently with other comms too. (See below)

Note: For a sensible decomposition the number of ranks should be at least as many as the max-cuts computed across all blocks. We may enforce this requirement as a check in the MB App.

The decomp of blocks will be done using a combination of round-robin (RR) decomposition and splitting communicators based on the output of the RR decomp. The key method here is:

struct gkyl_comm* gkyl_comm_create_comm_from_ranks(const struct gkyl_comm *comm, int nranks,
  const int *ranks, struct gkyl_rect_decomp *new_decomp,
  bool *is_valid);

This method is used to create a subset of communicators on which block Apps are constructed.

ammarhakim / gkylzero