Subspace reduce was implemented in the past for the case without parallelization within the grids, i.e., without process groups.
To do subspace reduce with process groups, one could "vote" within each process group on which subspaces are present. A set of process groups that shares exactly the same subspaces (while all others do not) could then have its own reduce communicator; the reduction would iterate over all reduce communicators.
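The grouping logic described above can be sketched in a few lines. This is a minimal illustration, not DisCoTec code: the function name, the representation of a process group as a set of subspace indices, and the example data are all hypothetical.

```python
# Sketch: group process groups by their exact set of subspaces.
# All groups that share the same subspace set would share one reduce
# communicator; the reduction then loops over all communicators.
# (Hypothetical data layout; DisCoTec's real classes differ.)

def build_reduce_communicators(subspaces_per_group):
    """Map each distinct subspace set to the list of process group ids
    that hold exactly that set (one reduce communicator per key)."""
    communicators = {}
    for group_id, subspaces in enumerate(subspaces_per_group):
        communicators.setdefault(frozenset(subspaces), []).append(group_id)
    return communicators

# Four process groups "vote" with their subspace sets:
groups = [{0, 1, 2}, {0, 1, 2}, {0, 3}, {0, 1, 2}]
comms = build_reduce_communicators(groups)
for subspace_set, members in sorted(comms.items(), key=lambda kv: kv[1]):
    print(sorted(subspace_set), "->", members)
```

Here groups 0, 1, and 3 would share one reduce communicator, while group 2 gets its own.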
A challenge we have already identified:
The (potential) number of communicators can become excessively large: O(process group size * 2^(number of subspaces)), where the 2^ term comes from the power set of subspaces. The process group size can be up to 2^13 to 2^14 for current scenarios, and the number of subspaces can be in the 100,000s.
-> We are not sure whether a created communicator uses memory only on the ranks it contains, or whether the information is also collected globally somewhere. How do MPI implementations handle this?
-> This could maybe be remedied by a good scenario splitting where partitions = process groups, as in https://github.com/SGpp/DisCoTec-combischeme-utilities . Then there should be many groups that share the exact same set of subspaces.
-> There could be a trade-off between sparse grid reduce and subspace reduce (if some subspaces are allocated in addition to the ones that are strictly required).
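A back-of-the-envelope check of the bound above, using the numbers from these notes. The worst-case count is far too large to represent directly, so it is compared in log2 terms; the number of process groups and the reading that each distinct subspace set needs one communicator per rank slot are assumptions for illustration.

```python
from math import log2

group_size = 2**13        # ranks per process group (current scenarios)
num_subspaces = 100_000   # number of subspaces can be in the 100,000s

# Worst-case bound from the notes: group_size * 2^(num_subspaces).
# 2^100000 cannot be printed sensibly, so compare the log2 instead:
worst_case_log2 = log2(group_size) + num_subspaces
print("log2 of worst-case communicator count:", worst_case_log2)

# In practice only subspace sets that actually occur need communicators:
# at most one distinct set per process group, so with a good scenario
# splitting the count is bounded by num_groups * group_size rather than
# by the power set. (num_groups = 64 is a made-up example value.)
num_groups = 64
print("practical upper bound:", num_groups * group_size)
```

This is why the scenario splitting above matters: the power-set term only bites if many different subspace sets actually occur.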
To save memory, we could move from sparse grid reduce to subspace reduce, cf. https://ebooks.iospress.nl/doi/10.3233/978-1-61499-381-0-564; this was implemented in the past for the case without parallelization within the grids.
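The memory difference between the two schemes can be sketched without MPI: sparse grid reduce keeps a buffer covering all subspaces on every process group, while subspace reduce only allocates the subspaces a group actually holds. All subspace sizes and group assignments below are made up for illustration.

```python
# Hypothetical subspace sizes (number of coefficients per subspace):
subspace_sizes = {0: 4096, 1: 2048, 2: 2048, 3: 1024, 4: 512}

# Sparse grid reduce: every process group allocates the full sparse
# grid buffer, i.e. the sum over *all* subspaces.
sparse_grid_buffer = sum(subspace_sizes.values())

# Subspace reduce: each group only allocates the subspaces it holds.
group_subspaces = [{0, 1, 2}, {0, 3, 4}]
per_group_buffers = [sum(subspace_sizes[s] for s in g) for g in group_subspaces]

print("sparse grid reduce buffer per group:", sparse_grid_buffer)
print("subspace reduce buffers per group:", per_group_buffers)
```

The gap between the two buffer sizes is the memory saving; allocating extra subspaces beyond the strictly required ones (the trade-off noted above) shrinks it.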