DARMA-tasking / vt

DARMA/vt => Virtual Transport

HierarchicalLB and GreedyLB appear to gain load #1522

Open nlslatt opened 3 years ago

nlslatt commented 3 years ago

Describe the bug

HierarchicalLB and GreedyLB report different processor load sums and averages before and after LB. For example, compare the sum on the P_l lines with num_stats=1 (pre-LB, sum=213.33) and num_stats=2 (post-LB, sum=242.00) below. Load balancing only moves objects between processors, so within the same phase the sum after LB should never differ from the sum before LB.

vt: [0] (t) lb: BaseLB: Statistic=P_l:  max=106.03, min=0.30, sum=213.33, avg=26.67, var=2080.34, stdev=45.61, nproc=8, cardinality=8 skewness=0.95, kurtosis=-1.21, npr=8, imb=2.98, num_stats=1
vt: [0] (t) lb: BaseLB: Statistic=O_l:  max=0.03, min=0.00, sum=0.21, avg=0.01, var=0.00, stdev=0.01, nproc=32, cardinality=32 skewness=1.10, kurtosis=-0.80, npr=32, imb=3.11, num_stats=2
vt: [0] (t) HierarchicalLB: loadStats: load=105.30, total=213.33, avg=26.67, I=2.98,should_lb=true, auto=true, threshold=0.800000011920929
vt: [0] (t) lb: BaseLB: Statistic=P_l:  max=30.40, min=30.00, sum=242.00, avg=30.25, var=0.02, stdev=0.15, nproc=8, cardinality=8 skewness=-0.77, kurtosis=-1.31, npr=8, imb=0.00, num_stats=2
vt: [0] (t) lb: BaseLB: Statistic=O_l:  max=0.03, min=0.00, sum=0.21, avg=0.01, var=0.00, stdev=0.01, nproc=32, cardinality=32 skewness=1.10, kurtosis=-0.80, npr=32, imb=3.11, num_stats=2
vt: [0] (t) lb: BaseLB::finalize: LB total time=0.009775000000000006, total migration count=7
vt: [0] (t) lb: LBManager::finishedLB, phase=1

The above was generated by running:

mpirun -np 8 ./examples/collection/lb_iter 32 1 2 --vt_lb_name=HierarchicalLB --vt_lb_interval=1 --vt_lb --vt_no_color

on my Mac with gcc, but the problem was first observed on an Intel build, so it does not appear to be compiler-specific.
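The invariant from the bug description can be checked directly against the logged summaries. This is a minimal sketch, not vt code: it assumes imb is computed as max/avg - 1, which is inferred from the logged numbers (106.03/26.67 - 1 ≈ 2.98 pre-LB, 30.40/30.25 - 1 ≈ 0.00 post-LB) rather than taken from vt's source. `summarize`, `imbalance`, and `load_conserved` are hypothetical helpers, and the 0.01 tolerance reflects the two-decimal rounding in the log.

```python
from dataclasses import dataclass


@dataclass
class LoadSummary:
    """A subset of the fields printed on the P_l statistic lines."""
    total: float
    avg: float
    max_load: float
    min_load: float
    imb: float


def imbalance(max_load: float, avg_load: float) -> float:
    # Assumed definition (inferred from the log, not from vt source): imb = max/avg - 1.
    return max_load / avg_load - 1.0


def summarize(loads: list[float]) -> LoadSummary:
    """Summarize per-processor loads the way the P_l lines report them."""
    total = sum(loads)
    avg = total / len(loads)
    hi = max(loads)
    return LoadSummary(total=total, avg=avg, max_load=hi,
                       min_load=min(loads), imb=imbalance(hi, avg))


def load_conserved(pre_total: float, post_total: float, tol: float = 1e-2) -> bool:
    """Migration only moves objects, so the phase total should not change."""
    return abs(post_total - pre_total) <= tol


if __name__ == "__main__":
    # Values copied from the num_stats=1 and num_stats=2 P_l lines in the log.
    print(round(imbalance(106.03, 26.67), 2))  # 2.98, matches the pre-LB imb
    print(round(imbalance(30.40, 30.25), 2))   # 0.0, matches the post-LB imb=0.00
    print(load_conserved(213.33, 242.00))      # False: the total grew by 28.67
```

Running the same check on every LB phase would flag any strategy whose reported totals drift.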

nlslatt commented 3 years ago

This appears to be a long-standing problem. While I have not dug into this, I wonder if this is just a side-effect of HierarchicalLB and GreedyLB updating processor loads based on approximated loads (taken from the bin) instead of the load of the specific object being migrated.
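The bin-approximation hypothesis can be illustrated with a minimal sketch. It assumes, hypothetically, that each object's load is replaced by the upper edge of a fixed-width bin when processor loads are recomputed; vt's actual binning in GreedyLB and HierarchicalLB may differ, and `binned_load` and `total_after_binning` are illustrative names only.

```python
import math


def binned_load(load: float, bin_width: float) -> float:
    # Hypothetical binning: round each object load up to its bin's upper edge.
    return math.ceil(load / bin_width) * bin_width


def total_after_binning(object_loads: list[float], bin_width: float) -> float:
    """Total load if per-processor loads are rebuilt from bin values."""
    return sum(binned_load(x, bin_width) for x in object_loads)


if __name__ == "__main__":
    objects = [0.3, 1.2, 2.5]
    print(round(sum(objects), 6))             # 4.0: the measured total
    print(total_after_binning(objects, 1.0))  # 6.0: the approximation adds load
```

Under this assumption the approximated total can only grow, which would be consistent with the strategies appearing to gain load.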

nlslatt commented 3 years ago


From a brief glimpse at the code, it appears that the subtraction and addition of migrated loads are not symmetrical, which would lead to drift in the total and processor-average loads.
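A minimal sketch of that asymmetry, assuming (hypothetically) that the source processor is debited by the object's measured load while the destination is credited with an approximated one. `migrate` and its parameters are illustrative, not vt code.

```python
def migrate(loads: list[float], src: int, dst: int,
            actual: float, approx: float, symmetric: bool) -> None:
    """Move one object from rank src to rank dst, updating loads in place."""
    loads[src] -= actual
    # Symmetric: add back exactly what was removed.
    # Asymmetric (hypothetical): credit the destination with an approximated value.
    loads[dst] += actual if symmetric else approx


if __name__ == "__main__":
    for symmetric in (True, False):
        loads = [10.0, 0.0]
        for _ in range(3):  # three migrated objects of equal measured load
            migrate(loads, 0, 1, actual=2.0, approx=3.0, symmetric=symmetric)
        print(symmetric, loads, sum(loads))
```

With three objects of load 2.0 approximated as 3.0, the symmetric run keeps the total at 10.0 while the asymmetric run ends at 13.0. Each migration adds `approx - actual` to the total, so the drift grows with the migration count.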

PhilMiller commented 2 years ago

With #1583, we can remove the code in the strategies that prints updated statistics, since the framework now computes them itself, using the same full object data it uses to compute the pre-balancing statistics.
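A minimal sketch of why recomputing from object data avoids the drift, assuming each object has a single measured load and the LB result is a map from objects to ranks. Per-rank loads are then just a regrouping of the same object loads, so no placement can change the total. `rank_loads` and `placement` are hypothetical names, not vt's API.

```python
def rank_loads(object_loads: dict[str, float],
               placement: dict[str, int], num_ranks: int) -> list[float]:
    """Rebuild per-rank loads from per-object loads and their assigned ranks."""
    loads = [0.0] * num_ranks
    for obj, load in object_loads.items():
        loads[placement[obj]] += load
    return loads


if __name__ == "__main__":
    objects = {"a": 1.0, "b": 2.0, "c": 3.0}
    before = rank_loads(objects, {"a": 0, "b": 0, "c": 1}, 2)
    after = rank_loads(objects, {"a": 0, "b": 1, "c": 1}, 2)
    print(before, sum(before))  # [3.0, 3.0] 6.0
    print(after, sum(after))    # [1.0, 5.0] 6.0
```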

lifflander commented 2 years ago

@PhilMiller Can we close this now?

PhilMiller commented 2 years ago

We may still need to remove the affected code from the strategies and verify the conclusion.