The atmospheric physics load balancing attempts to assign columns to chunks so that each chunk has approximately the same computational cost. All that is then needed is to assign the same number of chunks to each computational thread. The current scheme has an option to assign columns to chunks in pairs, where each pair matches day/night, northern/southern hemisphere "twins" (and often land/ocean), but beyond this it is just a wrap map of columns to chunks. This has worked pretty well for a long time. (The twin option works great for lat-lon grids, but is a wash for the cubed sphere grid, so we are usually just using the wrap map with the SE dycore.)
This wrap map scheme is unlikely to work as well when using superparameterization or elevation classes. In both of these cases, however, the cost of each column can be estimated (from the number of subcolumns or the number of elevation classes). The load balancing scheme should be generalized to use estimated column costs, when available, when constructing the load-balanced chunks, while leaving the current scheme unchanged when estimated column costs are not available.
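As a rough illustration of what such a generalization might look like, the Python sketch below keeps the wrap map when no cost estimates are supplied and otherwise uses a greedy "most expensive column first, into the cheapest chunk" heuristic. The function name `assign_columns`, its arguments, and the choice of greedy heuristic are all assumptions made for illustration, not the existing implementation. The sketch also ignores the twin-pairing option and any limit on the number of columns per chunk.

```python
import heapq


def assign_columns(ncols, nchunks, costs=None):
    """Return a list mapping each column index to a chunk index.

    Hypothetical helper for illustration only.
    costs: optional per-column cost estimates (e.g. number of subcolumns
    or elevation classes); None means no estimates are available.
    """
    if costs is None:
        # No estimates: fall back to the plain wrap map (round-robin).
        return [col % nchunks for col in range(ncols)]

    if len(costs) != ncols:
        raise ValueError("need exactly one cost estimate per column")

    # Min-heap of (accumulated cost, chunk index), so the cheapest chunk
    # is always at the top.
    heap = [(0.0, chunk) for chunk in range(nchunks)]
    heapq.heapify(heap)

    chunk_of = [0] * ncols
    # Visit columns from most to least expensive and give each one to the
    # chunk with the smallest accumulated cost so far.
    for col in sorted(range(ncols), key=lambda c: costs[c], reverse=True):
        load, chunk = heapq.heappop(heap)
        chunk_of[col] = chunk
        heapq.heappush(heap, (load + costs[col], chunk))
    return chunk_of
```

Calling `assign_columns(ncols, nchunks)` without `costs` reproduces the wrap map, which is one way to leave the current behavior untouched when no estimates exist.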
It would also be useful to capture and record the computational cost per chunk empirically, to verify the accuracy of the load balancing scheme.
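One simple way to check this empirically is to time each chunk and compare the slowest chunk to the average. The sketch below is only meant to show the idea: `time_chunks`, `imbalance`, and the `physics_fn` callback are hypothetical names, and a real model would more likely use its own timing infrastructure than Python timers.

```python
import time


def time_chunks(chunks, physics_fn):
    """Run physics_fn on each chunk; return wall-clock seconds per chunk.

    Hypothetical helper: physics_fn stands in for the per-chunk physics call.
    """
    timings = []
    for chunk in chunks:
        start = time.perf_counter()
        physics_fn(chunk)
        timings.append(time.perf_counter() - start)
    return timings


def imbalance(values):
    """Ratio of the maximum to the mean; 1.0 means perfectly balanced."""
    mean = sum(values) / len(values)
    return max(values) / mean if mean > 0 else 1.0
```

Applying `imbalance` to both the estimated chunk costs and the measured chunk timings would show how well the estimates predict the actual cost.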