fmihpc / dccrg

Distributed cartesian cell-refinable grid
GNU Lesser General Public License v3.0

Add DCCRG-split feature (completion of solver split stage 2/2) #30

Closed hokkanen closed 6 months ago

hokkanen commented 1 year ago

This PR adds the possibility of requesting optional DCCRG ranks that contain no cells; these ranks should add nearly no overhead to the Vlasiator Vlasov solver. Ranks that contain no cells still create a valid DCCRG object, to minimize the changes required in Vlasiator.

This PR is fully backwards compatible: the DCCRG interface does not change, and no changes to existing Vlasiator versions are required if the new DCCRG-split feature is not used. All existing features should work without modification. The PR passes all Vlasiator testpackage tests when the DCCRG-split feature is not used; with the feature enabled, all tests pass except the ionosphere test (the DCCRG-split feature does not currently support the ionosphere calculation).

However, in order to maintain backwards compatibility in this pre-release, the DCCRG-split feature is, for the time being (while not yet in production use), activated by setting the runtime environment variable "DCCRG_PROCS" to the number of requested DCCRG ranks that contain cells, e.g.:

export DCCRG_PROCS=12

The value is used only if it is an integer greater than zero and smaller than the number of ranks in the communicator passed to DCCRG during initialization. For example, if the MPI_COMM_WORLD communicator contains 16 ranks and is passed to DCCRG while DCCRG_PROCS=12 is set, then DCCRG configures Zoltan such that the load is balanced only across global ranks 4 - 15. The remaining 4 global ranks (0, 1, 2, and 3) contain no DCCRG cells but still hold valid DCCRG objects.

The DCCRG implementation uses the Zoltan function "Zoltan_LB_Partition()" instead of "Zoltan_LB_Balance()", with the setting "NUM_LOCAL_PARTS = 0" for those processes that should contain no DCCRG cells. Zoltan then balances the load such that no cells are assigned to the processes which have set "NUM_LOCAL_PARTS = 0".

Furthermore, in order to use the new DCCRG-split feature in Vlasiator, a couple of minor changes are required in the Vlasiator dev branch (the same changes likely suffice for other branches as well); these changes are introduced by this PR.

Note! This PR completes the second stage of making the Vlasiator Vlasov and field solvers run on separate ranks. For example, to run the field solver on ranks 0 - 3 and the Vlasov solver on ranks 4 - 15, one should launch Vlasiator with 16 MPI processes and the following runtime environment variables:

export DCCRG_PROCS=12
export FSGRID_PROCS=4

For example, below is a snapshot of per-process resource usage while running the Vlasiator Magnetosphere_3D_small test with 16 MPI ranks and the above environment variables set:

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
114319 jarohokk  20   0 21,182g 1,790g  25924 R 864,7 0,713   9:42.43 vlasiator                                             
114318 jarohokk  20   0 19,690g 2,109g  26396 R 687,5 0,840  10:03.40 vlasiator                                             
114317 jarohokk  20   0 20,686g 1,773g  26176 S 692,7 0,706  10:51.53 vlasiator                                             
114316 jarohokk  20   0 23,494g 1,747g  26388 R 710,2 0,696  11:00.78 vlasiator                                             
114315 jarohokk  20   0 20,728g 1,922g  26192 S 837,6 0,765   9:39.62 vlasiator                                             
114314 jarohokk  20   0 20,179g 1,753g  26196 R 608,6 0,698  10:29.74 vlasiator                                             
114313 jarohokk  20   0 19,735g 1,772g  26128 S 832,7 0,705   9:53.14 vlasiator                                             
114312 jarohokk  20   0 22,511g 1,751g  26248 R 618,5 0,697  10:15.05 vlasiator                                             
114311 jarohokk  20   0 19,370g 1,916g  25660 R 587,8 0,763   9:43.38 vlasiator                                             
114310 jarohokk  20   0 20,134g 1,843g  26628 R 657,8 0,734  10:11.78 vlasiator                                             
114309 jarohokk  20   0 16,747g 1,936g  25884 R 592,1 0,771   9:39.76 vlasiator                                             
114308 jarohokk  20   0 17,576g 1,807g  25868 R 820,8 0,719   9:17.68 vlasiator                                             
114307 jarohokk  20   0 1663588 109352  23628 R 138,9 0,042   2:15.15 vlasiator                                             
114306 jarohokk  20   0 1857844 106256  24196 R 139,3 0,040   2:18.28 vlasiator                                             
114305 jarohokk  20   0 2112432 101552  24356 R 139,3 0,039   2:16.07 vlasiator                                             
114304 jarohokk  20   0 2151620  98916  26332 R 138,3 0,038   2:15.78 vlasiator 

Note 2! It is not required that the sets of ranks running the Vlasov and field solvers be disjoint; overlap is allowed. Idle ranks that run neither solver should also function correctly. I.e., when launching 16 MPI processes, the following settings are valid and should produce correct results:

export DCCRG_PROCS=7
export FSGRID_PROCS=14

and

export DCCRG_PROCS=9
export FSGRID_PROCS=4