This PR adds the option to request DCCRG ranks that do not contain any cells. Such ranks should incur almost no overhead from the Vlasiator Vlasov Solver, and they still create a valid DCCRG object, which minimizes the changes required in Vlasiator.
This PR is fully backwards compatible: the DCCRG interface does not change, and no changes to existing Vlasiator versions are required if the new DCCRG-split feature is not used. All existing features should work without any changes. Without the DCCRG-split feature, the PR passes all Vlasiator testpackage tests; with the DCCRG-split feature, all tests pass except the ionosphere tests (the DCCRG-split feature does not currently support the ionosphere calculation).
To maintain backwards compatibility in this pre-release, the DCCRG-split feature is for the time being (while it is not yet in production use) activated by setting the runtime environment variable "DCCRG_PROCS" to the number of requested DCCRG ranks that contain cells, e.g.,
export DCCRG_PROCS=12
The value is used only if it is an integer greater than zero and smaller than the number of ranks in the communicator passed to DCCRG during initialization. For example, if the MPI_COMM_WORLD communicator contains 16 ranks and is passed to DCCRG while DCCRG_PROCS=12 is set, DCCRG configures Zoltan such that the load is balanced only across global ranks 4 - 15. The remaining 4 global ranks (0, 1, 2, and 3) contain no DCCRG cells but still return valid DCCRG objects.
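The acceptance rule and the 16-rank example above can be sketched as follows. This is a minimal illustration, not the code in this PR: `parse_dccrg_procs` and `rank_holds_cells` are hypothetical names, and the choice of the highest-numbered ranks as the cell-holding ones is an assumption read off the example (ranks 4 - 15 for DCCRG_PROCS=12).

```cpp
#include <cerrno>
#include <cstdlib>
#include <optional>

// Hypothetical helper (not part of DCCRG): returns the requested number of
// cell-holding ranks only if `value` is an integer n with 0 < n < comm_size.
// Any other input means the DCCRG-split feature stays inactive.
std::optional<int> parse_dccrg_procs(const char* value, int comm_size) {
    if (value == nullptr || *value == '\0') return std::nullopt;
    char* end = nullptr;
    errno = 0;
    const long n = std::strtol(value, &end, 10);
    if (errno != 0 || *end != '\0') return std::nullopt;  // not a clean integer
    if (n <= 0 || n >= comm_size) return std::nullopt;    // outside the accepted range
    return static_cast<int>(n);
}

// Assumption from the example above: the cell-holding ranks are the
// highest-numbered ranks of the communicator.
bool rank_holds_cells(int rank, int comm_size, std::optional<int> procs) {
    if (!procs) return true;  // feature inactive: every rank may hold cells
    return rank >= comm_size - *procs;
}
```

In a real run the first argument would come from `std::getenv("DCCRG_PROCS")` and `comm_size` from `MPI_Comm_size` on the communicator passed to DCCRG.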
The DCCRG implementation uses the Zoltan function "Zoltan_LB_Partition()" instead of "Zoltan_LB_Balance()", with the setting "NUM_LOCAL_PARTS = 0" for those processes that should contain no DCCRG cells. Zoltan then balances the load such that no cells are assigned to the processes that have set "NUM_LOCAL_PARTS = 0".
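As a rough sketch of the per-rank setting described above, the helper below picks the "NUM_LOCAL_PARTS" value for each rank. The helper name is hypothetical, the value "1" for cell-holding ranks is an assumption (one part per process), and the rank layout again follows the 16-rank example rather than the actual DCCRG code.

```cpp
#include <string>

// Hypothetical helper: the NUM_LOCAL_PARTS value a rank would pass to Zoltan.
// Ranks that should hold no DCCRG cells request zero parts.
// Assumption: cell-holding ranks are the `cell_ranks` highest-numbered ranks
// and each of them requests exactly one part.
std::string num_local_parts_for(int rank, int comm_size, int cell_ranks) {
    const bool holds_cells = rank >= comm_size - cell_ranks;
    return holds_cells ? "1" : "0";
}

// With Zoltan available, the value would be applied before partitioning,
// roughly like this (zz is the rank's Zoltan_Struct pointer):
//   Zoltan_Set_Param(zz, "NUM_LOCAL_PARTS",
//                    num_local_parts_for(rank, comm_size, cell_ranks).c_str());
// followed by the call to Zoltan_LB_Partition().
```

Summed over all ranks, the requested parts equal the number of cell-holding ranks, so for 16 ranks and DCCRG_PROCS=12 Zoltan would be asked for 12 parts in total.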
Furthermore, using the new DCCRG-split feature in Vlasiator requires a couple of minor changes in the Vlasiator dev branch, as introduced by this PR (the same changes likely suffice for other branches as well).
Note!
This PR completes the second stage of making the Vlasiator Vlasov and Field Solvers run on separate ranks. For example, to run the Field Solver on ranks 0 - 3 and the Vlasov Solver on ranks 4 - 15, launch Vlasiator with 16 MPI processes and the following runtime environment variables:
export DCCRG_PROCS=12
export FSGRID_PROCS=4
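The resulting rank layout can be sketched as a small function that reports which solvers a given rank runs. This is only an illustration under assumptions taken from the example: the Field Solver is assumed to occupy the lowest `fsgrid_procs` ranks and the Vlasov Solver the highest `dccrg_procs` ranks. `RankRoles` and `roles_for` are hypothetical names, not part of DCCRG, FSGrid, or Vlasiator.

```cpp
// Hypothetical description of what a single MPI rank runs.
struct RankRoles {
    bool field_solver;   // rank takes part in the Field Solver
    bool vlasov_solver;  // rank holds DCCRG cells and runs the Vlasov Solver
};

// Assumptions from the example: Field Solver ranks are 0 .. fsgrid_procs - 1,
// Vlasov Solver ranks are comm_size - dccrg_procs .. comm_size - 1.
RankRoles roles_for(int rank, int comm_size, int fsgrid_procs, int dccrg_procs) {
    return RankRoles{rank < fsgrid_procs, rank >= comm_size - dccrg_procs};
}
```

For 16 ranks with FSGRID_PROCS=4 and DCCRG_PROCS=12, `roles_for(3, 16, 4, 12)` reports a Field Solver rank and `roles_for(4, 16, 4, 12)` a Vlasov Solver rank.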
For example, below is a snapshot of the process resource usage while running Vlasiator Magnetosphere_3D_small with 16 MPI ranks, and the above environment variables set:
Note 2!
The ranks running the Vlasov and Field Solvers are not required to be disjoint; they may overlap. Empty ranks that run neither solver should also function correctly. I.e., when launching 16 MPI processes, the following settings are valid and should produce correct results: