MCSclimate / MCT

Model Coupling Toolkit

Stochastic weight generation at runtime #71

Open JanStreffing opened 1 year ago

JanStreffing commented 1 year ago

TLDR:

  1. For a stochastic weight selection, is it reasonable to rearrange the weights and the source AttrVect separately, so that the multiplicands x' and wgt sit in local memory where y is computed?
  2. Why is nprocs = 0 when calling the rearranger for sMatPlus%YPrimeToY? We never seem to pick up the data exchanged in m_swapm_FP from the RRecvBuf.

The underlying problem:

Say we couple two models through OASIS-MCT, one of which has a much higher resolution than the other. In my example, a 100 km atmosphere is coupled to a 10 km ocean model. Here is a graphic from Rackow and Juricke (2020) showing such a situation:

[Figure: high-resolution ocean grid alongside a coarse atmosphere grid, from Rackow and Juricke (2020)]

Ordinarily we would calculate remapping weights between these two grids via e.g. SCRIP, YAC or ESMF. Typically we make this remapping calculation only once, at the beginning of the experiment, or even reuse old weights from a pool directory.

The 10 km ocean model has both spatial and temporal variability. How well can this information, which lives on the ocean grid scale and is thus quasi-subgridscale for the 100 km atmosphere, be transferred through the interpolation? MCT provides the y = Mx sparse matrix multiplication, where y is the target vector (atmosphere grid), M is the sparse weight matrix, and x is the source vector (ocean grid).
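For reference, MCT stores M as (row, column, weight) triplets. A minimal sketch of the data-local multiply, with illustrative names rather than the exact MCT interface:

```fortran
! Minimal sketch of the data-local y = Mx, with M stored as
! (row, col, weight) triplets; names are illustrative, not MCT's exact API.
subroutine local_smat_mult(num_elements, rows, cols, wgts, x, y)
  implicit none
  integer, intent(in)  :: num_elements
  integer, intent(in)  :: rows(num_elements), cols(num_elements)
  real,    intent(in)  :: wgts(num_elements)
  real,    intent(in)  :: x(:)
  real,    intent(out) :: y(:)
  integer :: n
  y = 0.0
  do n = 1, num_elements
     ! each matrix element adds one weighted source point to one target
     ! point; this is essentially what line 267 below does per attribute
     y(rows(n)) = y(rows(n)) + wgts(n) * x(cols(n))
  end do
end subroutine local_smat_mult
```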

The idea:

What if, instead of using precomputed weights, we stochastically create the weights at runtime inside MCT? We can still use something like area fractions or distance weights as the basis of the probability; all we need to do is replace the weights inside,

https://github.com/MCSclimate/MCT/blob/e36024c5ddf482625ae6bd9474eff7d8f393f87c/mct/m_MatAttrVectMul.F90#L267

right? Well, unfortunately not. This operation needs all of its data in local memory, but most of the time MCT is used with the XandY distributed sparse matrix multiplication. According to a precomputed communication pattern, we first load into local memory all the source data needed, rearranging x into x':

https://github.com/MCSclimate/MCT/blob/e36024c5ddf482625ae6bd9474eff7d8f393f87c/mct/m_MatAttrVectMul.F90#L595-L598

Then we call the data-local sparse matrix multiplication, which computes y' = Mx' and contains the line 267 cited above:

https://github.com/MCSclimate/MCT/blob/e36024c5ddf482625ae6bd9474eff7d8f393f87c/mct/m_MatAttrVectMul.F90#L608-L609

And only then do we compute the sum over the partial products y' that makes up the final y:

https://github.com/MCSclimate/MCT/blob/e36024c5ddf482625ae6bd9474eff7d8f393f87c/mct/m_MatAttrVectMul.F90#L620-L622
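To make the three phases concrete, here is a self-contained toy (no MPI, my own illustrative names) that mimics the XandY decomposition: each "process" owns part of the matrix, accumulates a partial product y', and the final y is the sum of the partials:

```fortran
program xandy_toy
  implicit none
  integer, parameter :: nx = 4, ny = 3, nproc = 2
  real    :: x(nx), yprime(ny, nproc), y(ny)
  integer :: rows(4), cols(4), owner(4)
  real    :: wgts(4)
  integer :: n, p
  rows  = [1, 2, 2, 3]            ! triplet storage of M, split
  cols  = [1, 2, 3, 4]            ! between two "processes"
  wgts  = [1.0, 0.5, 0.5, 1.0]
  owner = [1, 1, 2, 2]            ! which process holds each element
  x = [10.0, 20.0, 30.0, 40.0]
  yprime = 0.0
  do n = 1, 4
     p = owner(n)
     ! local multiply: each process only touches its own matrix elements,
     ! accumulating into its private partial result y'_p
     yprime(rows(n), p) = yprime(rows(n), p) + wgts(n) * x(cols(n))
  end do
  ! the rearrange-with-sum step: y = sum over processes of y'_p
  y = sum(yprime, dim=2)
  print *, y                      ! expected: 10.0, 25.0, 40.0
end program xandy_toy
```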

Questions:

Question 1:

So, if I want to replace the weights M that make up a final y_i, such that instead of weighting each source point by its area fraction we make a stochastic selection that uses the area fraction as the probability, I need to rearrange the multiplicands x' and M, rather than their products y', into the memory where y would otherwise be assembled from y'. Then we do the sparse matrix multiplication in the local memory of y (see the sketch below). This is more expensive in communication, because we send roughly twice as many values, but that could be justifiable if the coupled variability modes improve as we expect. I have spent only a few days with the MCT code. Does this make sense to someone much more experienced with MCT, or is there an obvious flaw in the thinking?
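For what it's worth, here is a minimal sketch of the stochastic selection I have in mind for one destination point, assuming its contributing weights and source values are already in local memory (names are hypothetical, not MCT API):

```fortran
! Hypothetical sketch: for one destination point, instead of the weighted
! average y_i = sum_j w_ij * x_j, draw one source point j with probability
! w_ij / sum(w) and use its value directly. Assumes the weights for this
! row are non-negative and already local.
subroutine stochastic_pick(n, w, x, y_i)
  implicit none
  integer, intent(in)  :: n
  real,    intent(in)  :: w(n), x(n)
  real,    intent(out) :: y_i
  real    :: r, csum
  integer :: j
  call random_number(r)       ! uniform in [0,1)
  r = r * sum(w)              ! scale onto the cumulative weight range
  csum = 0.0
  do j = 1, n
     csum = csum + w(j)
     if (r < csum) then
        y_i = x(j)            ! the drawn source value replaces the average
        return
     end if
  end do
  y_i = x(n)                  ! guard against round-off at the upper edge
end subroutine stochastic_pick
```

If the row weights sum to one, as they do for conservative remapping, the expectation of y_i recovers the deterministic interpolation, which is why area fractions can serve directly as probabilities.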

Question 2:

Finally, the question that I am stuck on right now. When preparing to implement this, I did the old "add a bunch of print statements" to the rearranger. When we rearrange X into XPrime, RecvRout%nprocs is 8 for my example coupling:

48:  #####################
48:  # sMatAvMult_SMPlus #
48:  #####################
48:  strat: XandY
48:  Calling rearranger for XPrime
48:    ==================
48:    =   Rearranger   =
48:    ==================
48:    DoSum:  F
48:    DoStoch:  T
48:    useswapm:  T
48:    usealltoall:  T
48:    RecvRout%nprocs:            8
48:    max_pe:           99
48:    numr:            2
48:    SendRout%nprocs          13
48:    here we send all to all to exchange send and recv buffers
48:    RecvRout%nprocs:            8
48:    RecvRout%num_segs(proc)          14
48:    seg_start, seg_end:            1          19

this makes sense to me, and with that nprocs we call the loop that fills the final x' with the buffered x values:

https://github.com/MCSclimate/MCT/blob/e36024c5ddf482625ae6bd9474eff7d8f393f87c/mct/m_Rearranger.F90#L1275-L1287
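My reading of that loop, paraphrased with the field names from the prints above (the exact receive-buffer layout is approximate, not verbatim MCT):

```fortran
! Paraphrase of the linked unpack loop: for every process we received
! from, copy each contiguous segment out of the receive buffer into the
! rearranged AttrVect (here, x'). Buffer indexing is approximate.
do proc = 1, RecvRout%nprocs
   j = 1
   do iseg = 1, RecvRout%num_segs(proc)
      seg_start = RecvRout%seg_starts(proc, iseg)
      seg_end   = seg_start + RecvRout%seg_lengths(proc, iseg) - 1
      do i = seg_start, seg_end
         ! copy one point of each real attribute out of this process's
         ! receive buffer into the target vector
         TargetAV%rAttr(1:numr, i) = RRecvBuf(proc)%pr(1:numr, j)
         j = j + 1
      end do
   end do
end do
```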

However, when calling the rearranger for y' to y, which is where I want to make my modifications:

48:  Calling rearranger for Y
48:    ==================
48:    =   Rearranger   =
48:    ==================
48:    DoSum:  T
48:    DoStoch:  T
48:    useswapm:  T
48:    usealltoall:  T
48:    RecvRout%nprocs:            0
48:    max_pe:           99
48:    numi:            0
48:    numr:            2
48:    SendRout%nprocs           0
48:    here we send all to all to exchange send and recv buffers
48:    RecvRout%nprocs:            0
48:    ==================

nprocs is 0 for both send and receive, so nothing is actually exchanged. How do we ever get the final y from y'?