DARMA-tasking / vt

DARMA/vt => Virtual Transport

BaseLB process-level stats have inconsistent sum and average loads before and after LB runs #1072

Closed PhilMiller closed 3 years ago

PhilMiller commented 4 years ago

From @nlslatt

Calling VT LB 
vt: [0] lb: BaseLB: Statistic=P_l:  max=3813.06, min=21.96, sum=63329.47, avg=494.76, var=118312.81, stdev=343.97, nproc=128, cardinality=128 skewness=7.03, kurtosis=64.40, npr=128, imb=6.71, num_stats=1
vt: [0] lb: BaseLB: Statistic=O_l:  max=0.67, min=0.00, sum=63.33, avg=0.04, var=0.00, stdev=0.05, nproc=1536, cardinality=1536 skewness=5.42, kurtosis=48.92, npr=1536, imb=15.31, num_stats=2
vt: [0] HierarchicalLB: loadStats: load=517.06, total=63329.47, avg=494.76, I=6.71,should_lb=true, auto=true, threshold=0.800000011920929
vt: [0] lb: BaseLB: Statistic=P_l:  max=921.62, min=23.35, sum=64228.31, avg=501.78, var=9136.33, stdev=95.58, nproc=128, cardinality=128 skewness=-0.59, kurtosis=11.75, npr=128, imb=0.84, num_stats=2
vt: [0] lb: BaseLB: Statistic=O_l:  max=0.67, min=0.00, sum=63.33, avg=0.04, var=0.00, stdev=0.05, nproc=1536, cardinality=1536 skewness=5.42, kurtosis=48.92, npr=1536, imb=15.31, num_stats=2
vt: [0] lb: BaseLB::finalize: LB total time=0.5440969467163086, total migration count=0
vt: [0] lb: LBManager::releaseNow: finished LB, phase=1, invocations=1
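
The inconsistency named in the title can be checked directly from the two P_l lines in the log above. The following is a minimal sketch, not vt code: the dictionaries hold values copied from the log, and the formulas avg = sum / nproc and imb = max / avg - 1 are assumptions that happen to reproduce the printed values to two decimals. The helper names `recompute` and `sum_drift` are hypothetical.

```python
# Values copied from the pre-LB and post-LB "Statistic=P_l" log lines.
before = {"max": 3813.06, "sum": 63329.47, "avg": 494.76, "imb": 6.71, "nproc": 128}
after = {"max": 921.62, "sum": 64228.31, "avg": 501.78, "imb": 0.84, "nproc": 128}


def recompute(stats):
    """Recompute (avg, imb) from max, sum and nproc.

    Assumed formulas (not taken from the vt source):
    avg = sum / nproc and imb = max / avg - 1.
    """
    avg = stats["sum"] / stats["nproc"]
    imb = stats["max"] / avg - 1.0
    return round(avg, 2), round(imb, 2)


def sum_drift(pre, post):
    """Return the absolute and relative change in total processor load."""
    diff = post["sum"] - pre["sum"]
    return diff, diff / pre["sum"]


if __name__ == "__main__":
    print(recompute(before))  # (494.76, 6.71)
    print(recompute(after))   # (501.78, 0.84)
    diff, rel = sum_drift(before, after)
    print(f"{diff:.2f} ({rel:.2%})")  # 898.84 (1.42%)
```

Each line is internally consistent under these assumed formulas, but the total load grows by about 1.4% across the LB run even though the log reports a migration count of 0.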

PhilMiller commented 4 years ago

The difference in the computed processor stats may be attributable to the loads of migrated objects not being accounted for in theNodeStats data structures.

lifflander commented 3 years ago

Please post some way to reproduce this @nlslatt @PhilMiller

nlslatt commented 3 years ago

@lifflander I didn't realize that this issue existed when I created #1522 recently. #1522 has the details you're looking for. We can close this as a duplicate. The problem is not in BaseLB itself; HierarchicalLB and GreedyLB feed estimates to BaseLB for the post-LB stats, which results in a discrepancy.