[DR] Sync method for MB solver

The gyrokinetic (GK) multiblock (MB) app requires a sync method to transfer conf-space quantities and distributions between blocks. Fortunately, previous work already provides the connections (multib_comm_conn) needed to orchestrate this. To use those connections we need to perform the steps outlined below.

Create the send/recv MB connections for each local block with gkyl_multib_comm_conn_send/recv_new.
These connections need to be modified in 3 ways.
- The rank in these connections is actually a rank index and needs to be translated to an actual rank ID (that can be given to send/recv).
- The range in these connections is not a sub range, and only contains conf-space dimensions. The range needs to be extended in velocity space if needed (e.g. to sync distributions) and/or needs to be made a sub_range.
- The connections need to be sorted to avoid deadlocks in NCCL. We sort them according to rank and block ID.
Given the lists of send/recv connections, to sync an array we pass those connections and the array to
```
int gkyl_multib_comm_conn_array_transfer(struct gkyl_comm *comm,
  int num_blocks_local, const int *local_blocks,
  struct gkyl_multib_comm_conn **mbcc_send, struct gkyl_multib_comm_conn **mbcc_recv,
  struct gkyl_array **arr_send, struct gkyl_array **arr_recv);
```
which performs the sends and recvs necessary to sync the array, in a manner similar to the single-block array sync. There are 3 implementations of this function, for the null/mpi/nccl communicators, and we select among them based on the id stored in the communicator. For null_comm we just perform local copies; for nccl we do something similar to mpi, except that we perform local copies if the send/recv rank is the same as my rank.

For conf-space arrays the above is all that is needed to sync. For the distribution function, however, what we actually store in the array is jacobgeo*f, where jacobgeo is the conf-space Jacobian. Suppose block0 and block1 are neighbors; what we'll actually do is the following:

1. Sync `jacobgeo`, so that block0 has `jacobgeo1` in its ghost cells and block1 has `jacobgeo0` in its ghost cells.
2. Sync `jacobgeo*f`, so that block0 has `jacobgeo1*f1` in its ghost cells and block1 has `jacobgeo0*f0` in its ghost cells.
3. Weak-divide the ghost cells of `jacobgeo*f` by the ghost cells of `jacobgeo`, so that block0 has `f1` in its ghost cells and block1 has `f0` in its ghost cells.
4. Multiply `f` in the ghost cells by the (boundary-dir-flipped) skin `jacobgeo` of this block, so that block0 has `jacobgeo0*f1` in its ghost cells and block1 has `jacobgeo1*f0` in its ghost cells.

The boundary-dir flip in the last step ensures that the Jacobians in (jacobgeo*f)_skin and (jacobgeo*f)_ghost have the same value on the skin-ghost boundary. This last step will also require a purpose-built updater, which we may call dg_ghost_div_mult_op (or please recommend a better name).

The first set of operations (i.e. gkyl_multib_comm_conn_array_transfer) is being prototyped in the gk-g0-app_blocked_sol_comms branch.

Additional note:

The allgather needed for the parallel field solve in GK is not strictly part of this DR (see issue #503), but it will likely use the same gkyl_multib_comm_conn_array_transfer function, just with different connections and different input/output arrays.
Additional MB communications will likely be needed, e.g. to send the info needed for the perpendicular field solve. If such info is communicated with gkyl_arrays then this same function may be used; otherwise, if they are comm_conn-based, we may add other communication methods to multib_comm_conn.

Can you say a little more about this operation: "The boundary-dir-flipping in the last step is to ensure that the jacobians in (jacobgeof)_skin and (jacobgeof)_ghost have the same value on the skin-ghost boundary."

My understanding of the problem is that, since the normalization of the Jacobian changes from block to block (because we normalize by the arc length), we need to account for the different Jacobians across block boundaries. This will be especially important for ensuring the characteristics are continuous when computing fluxes in the skin-ghost regions at block boundaries.

But I'm trying to understand what the boundary-dir flip is for. If I understand the operation, you are flipping the sign of all the odd coefficients, but I can't quite visualize why flipping the odd coefficients of the Jacobian from the other block is the correct thing to do (instead of just taking the Jacobian from the other block and performing the desired weak operations).

Ok, let me see if I can explain it more clearly. Let's suppose we have 2 blocks, b0 and b1, and we want to fill in the ghost cells of b0.

We have the Jacobian J as follows:

 b0,skin        b0,ghost                b1,ghost        b1,skin
o---------------o---------------o       o---------------o---------------o
|               |               |       |               |               |
|  J_b0,skin    |               |       |               |    J_b1,skin  |
|               |               |       |               |               |
o---------------o---------------o       o---------------o---------------o

sync it to get

 b0,skin        b0,ghost                b1,ghost        b1,skin
o---------------o---------------o       o---------------o---------------o
|               |               |       |               |               |
|  J_b0,skin    |  J_b1,skin    |       |   J_b0,skin   |    J_b1,skin  |
|               |               |       |               |               |
o---------------o---------------o       o---------------o---------------o

We also have the distribution times the Jacobian, Jf:

 b0,skin        b0,ghost                b1,ghost        b1,skin
o---------------o---------------o       o---------------o---------------o
|               |               |       |               |               |
|  Jf_b0,skin   |               |       |               |   Jf_b1,skin  |
|               |               |       |               |               |
o---------------o---------------o       o---------------o---------------o

sync it to get

 b0,skin        b0,ghost                b1,ghost        b1,skin
o---------------o---------------o       o---------------o---------------o
|               |               |       |               |               |
|  Jf_b0,skin   |  Jf_b1,skin   |       |  Jf_b0,skin   |   Jf_b1,skin  |
|               |               |       |               |               |
o---------------o---------------o       o---------------o---------------o

Now (weak) divide the ghost cells of Jf by the ghost cells of J to get

 b0,skin        b0,ghost                b1,ghost        b1,skin
o---------------o---------------o       o---------------o---------------o
|               |               |       |               |               |
|  Jf_b0,skin   |   f_b1,skin   |       |   f_b0,skin   |  Jf_b1,skin   |
|               |               |       |               |               |
o---------------o---------------o       o---------------o---------------o

If you simply multiplied the ghost cells by the skin Jacobian of that block, you could get a large, spurious jump at the skin-ghost surface (e.g. think of the case in which J increases linearly towards the boundary). So what we in fact do is produce

J_b#,skin,fl = basis.flip_odd_sign(J_b#,skin)

i.e. the skin Jacobian with the sign of its odd coefficients flipped along the boundary direction, and then multiply the ghost cells by it, so we end up with

 b0,skin        b0,ghost                      b1,ghost                    b1,skin
o---------------o-------------------------o    o-------------------------o---------------o
|               |                         |    |                         |               |
|  Jf_b0,skin   |  J_b0,skin,fl*f_b1,skin |    |  J_b1,skin,fl*f_b0,skin |   Jf_b1,skin  |
|               |                         |    |                         |               |
o---------------o-------------------------o    o-------------------------o---------------o

Hope that answers your comment/clarifies things.

ammarhakim / gkylzero

[DR] Sync method for MB solver #510