ammarhakim / gkylzero

Lowest, compiled layer of Gkeyll infrastructure.
MIT License
20 stars 4 forks source link

[DR] Multi-block moments App (and also pattern for other Apps) #411

Open ammarhakim opened 2 months ago

ammarhakim commented 2 months ago

Prototype/Production Branches

Please see the branch multib-app for work on moments App. I consider this to be a production branch and other Apps, including GK, will be modified here to support multi-blocks.

Introduction

Extensions to multi-block (MB) geometry will allow us to do far more complex geometry simulations. However, manually specifying the inputs to setup can be tedious and potentially error-prone. In this DR I propose a few design principles and a proposed input driver for moments App.

Principles

Running Example for this DR

I am using a 3 block Euler simulation with the following block layout.

  /* Block layout and coordinates

   Y  
   ^  
   |  
   2  +------+
   |  |b0    |
   |  |      |
   1  +------+-----+
   |  |b1    |b2   |
   |  |      |     |
   0  +------+-----+

      0 -----1-----2 -> X
  */  

Geometry Specification

I propose introducing a new geometry object that will hold the geometric information (cells, lower, upper bounds and mapc2p, etc) and the topology. This object is mostly App agnostic, but could contain App specific features as needed. For example, the API for this is:

// Geometry info a single config-space block
struct gkyl_block_info {
  // lower and upper extents of blocks
  double lower[GKYL_MAX_CDIM], upper[GKYL_MAX_CDIM];
  int cells[GKYL_MAX_CDIM]; // cells extents in each direction
  int cuts[GKYL_MAX_CDIM];  // domain split to use

  struct gkyl_target_edge connections[GKYL_MAX_CDIM][2]; // block connections
};

This is then used to populate the geometry object, for example, for a block:

  // block 0
  gkyl_block_geom_set_block(bgeom, 0, &(struct gkyl_block_info) {
      .lower = { 0, 1 },
      .upper = { 1, 2 },
      .cells = { 128, 128 },

      .connections[0] = { // x-direction connections
        { .bid = 0, .dir = 0, .edge = GKYL_PHYSICAL }, // physical boundary
        { .bid = 0, .dir = 0, .edge = GKYL_PHYSICAL }  // physical boundary
      },
      .connections[1] = { // y-direction connections
        { .bid = 1, .dir = 1, .edge = GKYL_UPPER_POSITIVE },
        { .bid = 0, .dir = 1, .edge = GKYL_PHYSICAL } // physical boundary
      }
    }
  );

Species and Field Inputs

I propose splitting inputs for species and fields into two parts: a part that is block independent, a part that is per-block. The per-block data need not be repeated if it is common to all block.

For example, the moment species data for an Euler solver would be:

  // all data is common across blocks
  struct gkyl_moment_multib_species_pb euler_blocks[1];
  euler_blocks[0] = (struct gkyl_moment_multib_species_pb) {
    .init = initFluidSod,
  };

  struct gkyl_moment_multib_species euler = {
    .name = "euler",
    .charge = 0.0, .mass = 1.0,
    .equation = euler_eqn,

    .are_all_blocks_same = true,
    .blocks = euler_blocks,
    .bcs = euler_phys_bcs,
  };

Note the use of the flag are_all_blocks_same to indicate that the blocks array has only 1 element in it.

To specify the BCs for each block I propose we introduce another input list that specifies the BCs as follows. The struct would be:

// BC for blocks
struct gkyl_block_physical_bcs {
  int bidx; // block index
  int dir;  // direction in which BC is specified
  enum gkyl_edge_loc edge; // which edge this BC is for
  int bc_type; // BC code
};

This seems different than our usual way to specifying BCs in single-block Apps, but it reduces the potential for setting inconsistent BCs, and reduces duplication. For the Euler simulation this list would be specified as:

  struct gkyl_block_physical_bcs euler_phys_bcs[] = {
    // block 0 BCs
    { .bidx = 0, .dir = 0, .edge = GKYL_LOWER_EDGE, .bc_type = GKYL_SPECIES_REFLECT },
    { .bidx = 0, .dir = 0, .edge = GKYL_UPPER_EDGE, .bc_type = GKYL_SPECIES_REFLECT },
    { .bidx = 0, .dir = 1, .edge = GKYL_UPPER_EDGE, .bc_type = GKYL_SPECIES_COPY },
    // block 1 BCs
    { .bidx = 1, .dir = 0, .edge = GKYL_LOWER_EDGE, .bc_type = GKYL_SPECIES_REFLECT },
    { .bidx = 1, .dir = 1, .edge = GKYL_LOWER_EDGE, .bc_type = GKYL_SPECIES_REFLECT },
    // block 2 BCs
    { .bidx = 2, .dir = 0, .edge = GKYL_UPPER_EDGE, .bc_type = GKYL_SPECIES_COPY },
    { .bidx = 2, .dir = 1, .edge = GKYL_LOWER_EDGE, .bc_type = GKYL_SPECIES_REFLECT },
    { .bidx = 2, .dir = 1, .edge = GKYL_UPPER_EDGE, .bc_type = GKYL_SPECIES_REFLECT },
  };

On Running Multi-Blocks in Parallel

The proposed design for parallelization is as follows. The user must specify cuts for each block. We will then use a round-robin algorithm to assign ranks and communicators to each block. Consider the following two cases that bound the problem.

In general, the situation will be somewhere in between. As an example for the 3 block Euler simulation, lets say the cuts for the blocks are 2x2, 1x1 and 1x1 respectively. Then ideally, we need 6 ranks to run the simulation fully concurrently. However, say only 4 ranks are specified. Then, we will use a round-robin algorithm to construct the following set of communicators:


comm_b0 = ranks [ 0 1 2 3 ]
comm_b1 = ranks [ 0         ]
comm_b2 = ranks [   1       ]

Hence, on each rank the following App pointers will need to be constructed:

rank 0 : *app_b0, *app_b1
rank 1 : *app_b0, *app_b2
rank 2 : *app_b0
rank 3 : *app_b0

This shows that, In general, the number of App pointers on each rank will be different.

Also note that the MPI_Comm_split is a collective call and hence can't be used to create the communicators needed in the MB App. Instead, we will need to use more detailed methods to choose a list of ranks to include in each communicator and then construct them with other methods in MPI. Of course, this will need to be done consistently with other comms too. (See below)

Note: For a sensible decomposition the number of ranks should be at least as many as the max-cuts computed across all blocks. We may enforce this requirement as a check in the MB App.

The decomp of blocks will be done using a combination of round-robin (RR) decomposition and splitting communicators based on the output of the RR decomp. The key method here is:

struct gkyl_comm* gkyl_comm_create_comm_from_ranks(const struct gkyl_comm *comm, int nranks,
  const int *ranks, struct gkyl_rect_decomp *new_decomp,
  bool *is_valid);

This method is used to create a subset of communicators on which block Apps are constructed.

ammarhakim commented 2 months ago

I have updated the DR based on further work on July 5th 2024. At this point this DR is ready for review.