This PR slightly changes the way we split a large Ensemble into multiple smaller Ensembles, either in the SliceJacobiPC or in the nonlinear Gauss-Seidel iterations.
firedrake.Ensemble essentially has two responsibilities:
Make sure we have three comms on each rank, a global_comm, a spatial_comm, and an ensemble_comm. Currently, firedrake.Ensemble does this by taking a global_comm and splitting it into a Cartesian product of comms, providing a spatial_comm and an ensemble_comm from this product.
Wrap mpi4py calls so that we can send firedrake.Functions from one spatial_comm to another across the ensemble_comm with a simple API and some sanity checks.
Creating a split ensemble involves intercepting the logic in 1 to make sure the comms in the split ensemble relate properly to the comms in the larger ensemble. Specifically, the split ensemble needs three communicators:
A global_comm split from the global_comm of the larger ensemble
An ensemble_comm split from the global_comm or ensemble_comm of the larger ensemble
A spatial_comm that is the same as the spatial_comm from the larger ensemble so we can use the same mesh with both Ensembles.
The main issue here is that we need the spatial_comm of the split ensemble to be the same comm as the spatial_comm of the large ensemble, not just congruent. This means that we can't just make the smaller global_comm for the split ensemble and reuse the existing firedrake.Ensemble.__init__.
Previously I made a new EnsembleConnector class (terrible name, it connects existing spatial_comms, it doesn't connect different Ensembles). This class inherited from firedrake.Ensemble but overrode __init__, taking a global_comm and a specific spatial_comm, then created a new ensemble_comm by splitting the provided global_comm. To go with this is a split_ensemble function that takes in a large ensemble, splits it's global_comm, and passes the split global_comm and the spatial_comm to the new EnsembleConnector.
This works fine for our case but has a couple of issues (other than the naming issues that already plague Ensemble).
The main issue is that it mixes up responsibilities. The split_ensemble function does some, but not all of task 1, making sure we have three viable comms. It sorts out the global_comm and spatial_comm but not the ensemble_comm, which is left to the EnsembleConnector.
It is specific to the case of already having the spatial_comm but not the ensemble_comm (what about the case where I already have the ensemble_comm but not the spatial_comm, or already have both).
This PR changes to having a ManualEnsemble class that inherits from firedrake.Ensemble but just takes three comms and checks that they look like a global/spatial/ensemble comm set (i.e. they look like a cartesian product of comms). It essentially is only doing task 2, wrapping mpi4py calls, and trusts that the three provided comms are a valid set to use.
The split_ensemble function now does all of task 1, taking in a larger Ensemble, splitting the global_command the ensemble_comm, and passing these plus the original spatial_comm to ManualEnsemble.
ManualEnsemble is simpler and more general than EnsembleConnector was (also more of a footgun, but I've tried to add enough checks), and split_ensemble now takes care of all of the logic of splitting an Ensemble, rather than just some of it.
@JDBetteridge this PR only involves the split_ensemble function and the ManualEnsemble class, the create_ensemble is just a convenience function not used here.
This PR slightly changes the way we split a large
Ensemble
into multiple smallerEnsembles
, either in theSliceJacobiPC
or in the nonlinear Gauss-Seidel iterations.firedrake.Ensemble
essentially has two responsibilities:global_comm
, aspatial_comm
, and anensemble_comm
. Currently,firedrake.Ensemble
does this by taking aglobal_comm
and splitting it into a Cartesian product of comms, providing aspatial_comm
and anensemble_comm
from this product.mpi4py
calls so that we can sendfiredrake.Functions
from onespatial_comm
to another across theensemble_comm
with a simple API and some sanity checks.Creating a split ensemble involves intercepting the logic in 1 to make sure the comms in the split ensemble relate properly to the comms in the larger ensemble. Specifically, the split ensemble needs three communicators:
global_comm
split from theglobal_comm
of the larger ensembleensemble_comm
split from theglobal_comm
orensemble_comm
of the larger ensemblespatial_comm
that is the same as thespatial_comm
from the larger ensemble so we can use the same mesh with both Ensembles.The main issue here is that we need the
spatial_comm
of the split ensemble to be the same comm as thespatial_comm
of the large ensemble, not just congruent. This means that we can't just make the smaller global_comm for the split ensemble and reuse the existingfiredrake.Ensemble.__init__
.Previously I made a new
EnsembleConnector
class (terrible name, it connects existingspatial_comms
, it doesn't connect differentEnsembles
). This class inherited fromfiredrake.Ensemble
but overrode__init__
, taking aglobal_comm
and a specificspatial_comm
, then created a newensemble_comm
by splitting the providedglobal_comm
. To go with this is asplit_ensemble
function that takes in a large ensemble, splits it'sglobal_comm
, and passes the splitglobal_comm
and thespatial_comm
to the newEnsembleConnector
. This works fine for our case but has a couple of issues (other than the naming issues that already plagueEnsemble
).split_ensemble
function does some, but not all of task 1, making sure we have three viable comms. It sorts out theglobal_comm
andspatial_comm
but not theensemble_comm
, which is left to theEnsembleConnector
.spatial_comm
but not theensemble_comm
(what about the case where I already have theensemble_comm
but not thespatial_comm
, or already have both).This PR changes to having a
ManualEnsemble
class that inherits fromfiredrake.Ensemble
but just takes three comms and checks that they look like a global/spatial/ensemble comm set (i.e. they look like a cartesian product of comms). It essentially is only doing task 2, wrappingmpi4py
calls, and trusts that the three provided comms are a valid set to use. Thesplit_ensemble
function now does all of task 1, taking in a largerEnsemble
, splitting theglobal_comm
and theensemble_comm
, and passing these plus the originalspatial_comm
toManualEnsemble
.ManualEnsemble
is simpler and more general thanEnsembleConnector
was (also more of a footgun, but I've tried to add enough checks), andsplit_ensemble
now takes care of all of the logic of splitting anEnsemble
, rather than just some of it.