Open nlslatt opened 3 years ago
This appears to be a long-standing problem. While I have not dug into this, I wonder if this is just a side-effect of HierarchicalLB and GreedyLB updating processor loads based on approximated loads (taken from the bin) instead of the load of the specific object being migrated.
From a brief glimpse at the code, it appears that the subtraction and addition of migrated loads are not symmetrical, which would lead to drift in the total and processor-average loads.
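To make the suspected mechanism concrete, here is a minimal sketch of how an asymmetric update could cause the total to drift. It is not the Charm++ code: the function names, the two-processor setup, and the "round down to the bin edge" approximation are all assumptions made for illustration. It only shows that debiting the source by a binned load while crediting the destination with the exact load changes the sum, whereas a symmetric update conserves it.

```python
def migrate_binned(loads, src, dst, obj_load, bin_width):
    """Hypothetical asymmetric update: debit an approximated (binned) load,
    credit the exact load. Returns a new list of per-processor loads."""
    # Assumed binning scheme: round the object's load down to its bin edge.
    approx = (obj_load // bin_width) * bin_width
    updated = list(loads)
    updated[src] -= approx    # subtract the bin-approximated load
    updated[dst] += obj_load  # add the object's actual load
    return updated


def migrate_exact(loads, src, dst, obj_load):
    """Symmetric update: the same value is subtracted and added."""
    updated = list(loads)
    updated[src] -= obj_load
    updated[dst] += obj_load
    return updated


before = [10.0, 2.0]  # made-up per-processor loads
drifted = migrate_binned(before, src=0, dst=1, obj_load=3.5, bin_width=1.0)
conserved = migrate_exact(before, src=0, dst=1, obj_load=3.5)

print("before   ", sum(before))
print("binned   ", sum(drifted))
print("symmetric", sum(conserved))
```

If the strategies do something along these lines, each migration would shift the reported sum by the difference between the approximated and the actual object load, which would match the symptom of pre-LB and post-LB sums disagreeing.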
With #1583, we can ditch code in the strategies themselves that prints out updated statistics, since the framework now handles that, and does so using the full object data it would use to compute statistics pre-balancing.
@PhilMiller Can we close this now?
We may need to actually clean affected code out of the strategies, and verify the conclusion.
Describe the bug
HierarchicalLB and GreedyLB do not have the same processor load sum or average when comparing before and after LB. For example, compare the sum on the P_l lines that have num_stats=1 (pre-LB) and num_stats=2 (post-LB) below. The sum should never be different after LB than it was before LB on the same phase.
The above was generated by running:
on my Mac with gcc, but the problem was first observed on an Intel build so does not appear to be compiler-specific.