This overall structure is good IMO, especially if the prototype is working. I only have one suggestion: I don't see why the allgather is strictly necessary. The MB app knows where every rank's data goes, and should be able to do this in a loop over bcasts with the appropriate ranges.
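For concreteness, here is a minimal MPI sketch of the bcast loop I have in mind (the uniform slab size and buffer names are placeholders, not the app's API):

```c
/* Minimal sketch of replacing an allgather with a loop over broadcasts:
 * each rank's slab is broadcast by its owner, and every rank writes it
 * directly into the destination range it already knows. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int slab_sz = 8; /* cells per rank along z (assumed uniform here) */
  double *local = malloc(slab_sz*sizeof(double));
  double *global = malloc(size*slab_sz*sizeof(double));
  for (int i=0; i<slab_sz; ++i) local[i] = rank; /* stand-in for charge density */

  /* The MB app knows the destination offset of every source rank, so each
   * broadcast lands in place. */
  for (int src=0; src<size; ++src) {
    double *dest = global + src*slab_sz; /* the "appropriate range" for src */
    if (rank == src) memcpy(dest, local, slab_sz*sizeof(double));
    MPI_Bcast(dest, slab_sz, MPI_DOUBLE, src, MPI_COMM_WORLD);
  }

  free(local); free(global);
  MPI_Finalize();
  return 0;
}
```

When the layout allows it, each broadcast lands directly in its destination range, so no separate rearrangement pass is needed.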
Something that is not explicitly stated here but may affect some of these decisions is the fact that our data is in row-major order. For this reason I think we had a step in g2 where we re-arranged things after the communication to account for this, specifically in this method of GkField:
```lua
function GkField:AllgatherFieldZ(fldLocal, fldGlobal)
   -- Gather a CartField distributed along z onto a global-in-z field.
   fldLocal:copyRangeToBuffer(fldLocal:localRange(), self.localBuffer:data())
   -- Gather flat buffers from other processes.
   local mess = self.grid:getMessenger()
   local comm = mess:getComms()["z"]
   mess:Allgather(self.localBuffer, self.globalBufferZ, comm)
   -- Rearrange into a global field, i.e. reconstruct the multi-D array
   -- using the collection of flat buffers we gathered from each process.
   for zIdx in self.zCutsRange:colMajorIter() do
      fldGlobal:copyRangeFromBuffer(self.zDestRange[zIdx[1]],
                                    self.globalBufferZ:data()+self.zBufferOffset[zIdx[1]])
   end
end
```
I believe the same thing is possible here, and should perhaps be done; it's just that the ranges now come from other blocks and not just from other portions of a decomposition. We may, however, pursue this as a later optimization depending on current priorities.
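In C the same pattern might look like the sketch below (illustrative names and a made-up 2D layout, not the g0 API): gather flat per-source buffers, then copy each one into its destination range of the row-major global array. With blocks, the destination offsets would come from the block topology rather than the decomposition.

```c
/* Sketch: rearranging gathered flat buffers into a row-major global array.
 * 2D field f[x][z], NX x NZ, decomposed along z (the fastest index), so each
 * rank's slab is NOT contiguous in the global layout and must be copied in
 * row by row after the gather. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int NX = 4, nz_loc = 3;  /* local slab: NX x nz_loc */
  const int NZ = size*nz_loc;    /* global z extent */

  /* Pack the local slab into a flat buffer (x outer, z inner: row-major). */
  double *sendbuf = malloc(NX*nz_loc*sizeof(double));
  for (int i=0; i<NX; ++i)
    for (int k=0; k<nz_loc; ++k)
      sendbuf[i*nz_loc+k] = 100.0*rank + 10.0*i + k;

  /* Gather everyone's flat buffer; recvbuf is ordered by source rank,
   * not by the global array layout. */
  double *recvbuf = malloc(size*NX*nz_loc*sizeof(double));
  MPI_Allgather(sendbuf, NX*nz_loc, MPI_DOUBLE,
                recvbuf, NX*nz_loc, MPI_DOUBLE, MPI_COMM_WORLD);

  /* Rearrange: copy each source's buffer into its destination z-range of
   * the row-major global array. With blocks, z0 would come from the block
   * topology rather than src*nz_loc. */
  double *fglobal = malloc(NX*NZ*sizeof(double));
  for (int src=0; src<size; ++src) {
    const double *buf = recvbuf + src*NX*nz_loc;
    int z0 = src*nz_loc;
    for (int i=0; i<NX; ++i)
      for (int k=0; k<nz_loc; ++k)
        fglobal[i*NZ + (z0+k)] = buf[i*nz_loc+k];
  }

  free(sendbuf); free(recvbuf); free(fglobal);
  MPI_Finalize();
  return 0;
}
```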
I would like to see this DR presented so we can make suggestions on improving it.
Most importantly, as we discussed yesterday, I want to separately see a DR on the modifications to the comm infra. I do not want to see arbitrary changes to the present comm without discussion. We need a proper, hierarchical system of comms (perhaps; not sure until one has thought about this some more).
I am closing this DR as I do not think this is the proper design. This needs a proper re-design, accounting for hierarchical communication and respecting the layering of one App per block. This design mixes ranks, communicators, and blocks, which confuses the issue of how communication should be done.
For example, you should compute the charge density in each block. Then the parent, who knows the topology, should assemble the large global `rhoc` array. This should be smoothed. At this point the smoothed `rhoc` is already in a big global vector. Then the Poisson equation should be solved. This will result in a giant `phi` array that only the parent has access to at first. The parent, knowing the topology, then appropriately sends each block its portion of `phi`.
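Schematically, with plain MPI standing in for whatever comm infra we end up designing (all names hypothetical, one block per rank for simplicity, and the smoothing/solve reduced to a placeholder):

```c
/* Schematic of the parent-centric flow: each block computes its charge
 * density, the parent assembles the global rhoc, smooths and solves, then
 * hands each block its portion of phi. */
#include <mpi.h>
#include <stdlib.h>

/* Placeholders for the block-local physics and the parent-side solve. */
static void compute_rhoc(double *rhoc, int n, int rank)
{ for (int i=0; i<n; ++i) rhoc[i] = rank; }
static void smooth_and_solve(const double *rhoc, double *phi, int n)
{ for (int i=0; i<n; ++i) phi[i] = rhoc[i]; }

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int n_loc = 16;  /* one block per rank here, for simplicity */
  double *rhoc_loc = malloc(n_loc*sizeof(double));
  double *phi_loc = malloc(n_loc*sizeof(double));
  compute_rhoc(rhoc_loc, n_loc, rank);  /* 1. charge density in each block */

  double *rhoc_glob = NULL, *phi_glob = NULL;
  if (rank == 0) {
    rhoc_glob = malloc(size*n_loc*sizeof(double));
    phi_glob = malloc(size*n_loc*sizeof(double));
  }

  /* 2. Parent assembles the global rhoc. With a real block topology this
   * would be a Gatherv with per-block counts and offsets. */
  MPI_Gather(rhoc_loc, n_loc, MPI_DOUBLE,
             rhoc_glob, n_loc, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  /* 3. Smooth and solve on the parent only. */
  if (rank == 0) smooth_and_solve(rhoc_glob, phi_glob, size*n_loc);

  /* 4. Parent sends each block its portion of phi. */
  MPI_Scatter(phi_glob, n_loc, MPI_DOUBLE,
              phi_loc, n_loc, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  free(rhoc_loc); free(phi_loc);
  if (rank == 0) { free(rhoc_glob); free(phi_glob); }
  MPI_Finalize();
  return 0;
}
```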
For multi-block gyrokinetic simulations we need special considerations for the Poisson solve. Our GK Poisson solve has two steps: a smoothing in the parallel direction, and then a perpendicular solve on each of the perpendicular planes.
For multi-block simulations, we will need to modify the solver to accommodate blocks connected in the parallel direction and also blocks connected in the perpendicular direction. As of now, this DR only considers blocks connected in the parallel direction.
In order to do the parallel smoothing, we need to collect the charge density from all of the blocks into a single array. Each block's charge density must be placed in the right location of this array based on the connectivity of the blocks.
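As a simplified illustration (block extents and names are made up), for blocks connected along the parallel direction each block's offset in the global array is the cumulative z extent of the blocks before it:

```c
/* Sketch of placing each block's charge density in the global-in-z array:
 * a block's offset is the sum of the z extents of the blocks before it in
 * the connection order. */
#include <stdio.h>

int main(void)
{
  enum { NBLOCKS = 3 };
  int nz_block[NBLOCKS] = { 16, 32, 16 }; /* z-cells per block, in connection order */

  int offset[NBLOCKS];
  offset[0] = 0;
  for (int b=1; b<NBLOCKS; ++b)
    offset[b] = offset[b-1] + nz_block[b-1];

  /* Block b's charge density occupies [offset[b], offset[b]+nz_block[b])
   * of the global parallel-direction array. */
  for (int b=0; b<NBLOCKS; ++b)
    printf("block %d -> global z cells [%d, %d)\n",
           b, offset[b], offset[b]+nz_block[b]);

  return 0;
}
```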
In a multi-block simulation, multiple blocks may be owned by a single rank, or one block may contain multiple ranks.
Here is an overview of the steps that need to be taken in the advance method:
The steps in the rhs method require us to compute some things ahead of time, at t=0:
Initial prototyping work has been done in gk-g0-app_mb_app_poisson: https://github.com/ammarhakim/gkylzero/tree/gk-g0-app_mb_app_poisson
This implementation of the field solver does not affect input files. The input for the field will be the same as it has been for single-block apps.
Further details/example code for the broadcasting step (step 4 in the rhs):
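As a hedged sketch (hypothetical names; one block per rank with uniform extents; plain MPI in place of the app's comm infra): the rank holding the solved global-in-z phi broadcasts it, and each rank copies out the sub-range belonging to its block.

```c
/* Sketch of the broadcasting step: broadcast the solved global-in-z phi,
 * then each rank extracts its block's portion. Offsets and extents are
 * illustrative; in the app they would come from the block topology. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int nz_loc = 8, NZ = size*nz_loc; /* one block per rank, uniform extent */
  double *phi_glob = malloc(NZ*sizeof(double));
  if (rank == 0)
    for (int k=0; k<NZ; ++k) phi_glob[k] = k; /* stand-in for the solved phi */

  /* Broadcast the solved global field from the solving rank... */
  MPI_Bcast(phi_glob, NZ, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  /* ...and extract this block's portion using its offset in the global array. */
  double *phi_loc = malloc(nz_loc*sizeof(double));
  memcpy(phi_loc, phi_glob + rank*nz_loc, nz_loc*sizeof(double));

  free(phi_glob); free(phi_loc);
  MPI_Finalize();
  return 0;
}
```

Broadcasting the full array is simple but wasteful for many blocks; a loop of sends (or bcasts) over the appropriate per-block ranges is an alternative.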