DARMA-tasking / vt

DARMA/vt => Virtual Transport
Other
35 stars 9 forks source link

`test_load_balancer_keep_last_elm` test fails occasionally #1395

Closed cz4rs closed 3 years ago

cz4rs commented 3 years ago

Describe the bug vt:*/TestLoadBalancer.test_load_balancer_keep_last_elm/*_proc_2 test fails randomly in CI.

This can be reproduced in local using: ctest -R TestLoadBalancer.test_load_balancer_keep_last_elm --output-on-failure --repeat-until-fail 100

Assertion:

vt: [1] ------------------------------------------------------------------------------------------------------------------------
vt: [1] ------------------------------------------- Runtime Error: System Aborting! --------------------------------------------
vt: [1] ------------------------------------------------ Fatal Error on Node 1 -------------------------------------------------
vt: [1] ------------------------------------------------------------------------------------------------------------------------
vt: [1] 
vt: [1]              Reason: Must have object
vt: [1]    Assertion failed: (theNodeStats()->hasObjectToMigrate(obj_id))
vt: [1]                Node: 1
vt: [1]           Num Nodes: 4
vt: [1]                File: ../src/vt/vrt/collection/balance/baselb/baselb.cc
vt: [1]                Line: 228
vt: [1]            Function: transferMigrations

Backtrace:

vt: [1] ------------------------------------------------------------------------------------------------------------------------
vt: [1] -------------------------------------------- Dump Stack Backtrace on Node 1 --------------------------------------------
vt: [1] ------------------------------------------------------------------------------------------------------------------------
vt: [1] 0   18  0x1c0ed5e     vt::debug::stack::dumpStack[abi:cxx11](int) + 46
vt: [1] 1   18  0x18fd6db     vt::runtime::Runtime::output(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, bool, bool, bool) + 1771
vt: [1] 2   18  0x1850980     vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::output(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, bool, bool, bool, bool) + 288
vt: [1] 3   18  0x1850377     vt::output(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, bool, bool, bool, bool) + 199
vt: [1] 4   18  0xd42a50      _ZN2vt5debug6assert9assertOutIJEEENSt9enable_ifIXeqsr3std10tuple_sizeISt5tupleIJDpT_EEEE5valueLi0EEvE4typeEbNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSF_SH_iSH_iOS7_ + 352
vt: [1] 5   18  0x1d19d61     vt::vrt::collection::lb::BaseLB::transferMigrations(vt::vrt::collection::lb::TransferMsg<std::vector<std::tuple<vt::vrt::collection::balance::ElementIDStruct, short>, std::allocator<std::tuple<vt::vrt::collection::balance::ElementIDStruct, short> > > >*) + 561
vt: [1] 6   18  0x1a8011b     vt::objgroup::dispatch::Dispatch<vt::vrt::collection::lb::BaseLB>::run(long, vt::messaging::BaseMsg*) + 571
vt: [1] 7   18  0x1a9b5cc     vt::objgroup::ObjGroupManager::dispatch(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >, long) + 380
vt: [1] 8   18  0x1a9c02e     vt::objgroup::dispatchObjGroup(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >, long) + 78
vt: [1] 9   18  0x1bb469d     ./collection_extended() [0x1bb469d] + 0
vt: [1] 10  18  0x1bb4383     vt::runnable::RunnableNew::run() + 915
vt: [1] 11  18  0x1e20ddf     vt::sched::BaseUnit::execute() + 31
vt: [1] 12  18  0x1bc6828     vt::sched::Scheduler::runWorkUnit(vt::sched::PriorityUnit&) + 584
vt: [1] 13  18  0x1bc6ec0     vt::sched::Scheduler::runSchedulerOnceImpl(bool) + 512
vt: [1] 14  18  0x1bc739a     vt::sched::Scheduler::runSchedulerWhile(std::function<bool ()>) + 90
vt: [1] 15  18  0x1bc7865     vt::runSchedulerThrough(unsigned long) + 101
vt: [1] 16  18  0x1a6ddfd     vt::vrt::collection::balance::LBManager::runLB(vt::objgroup::proxy::Proxy<vt::vrt::collection::lb::BaseLB>, unsigned long) + 653
vt: [1] 17  18  0x1a6e3c1     vt::vrt::collection::balance::LBManager::startLB(unsigned long, vt::vrt::collection::balance::LBType) + 1361
vt: [1] 18  18  0x18e1a31     vt::phase::PhaseManager::runHooks(vt::phase::PhaseHook) + 737
vt: [1] 19  18  0x18e1f1d     vt::phase::PhaseManager::nextPhaseCollective() + 973
vt: [1] 20  18  0x154b004     vt::tests::unit::TestLoadBalancer::runTest() + 948
vt: [1] 21  18  0x1844254     void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 68
vt: [1] 22  18  0x18210c8     testing::Test::Run() + 488
vt: [1] 23  18  0x1822372     testing::TestInfo::Run() + 546
vt: [1] 24  18  0x1822ac7     testing::TestSuite::Run() + 247
vt: [1] 25  18  0x1830a17     testing::internal::UnitTestImpl::RunAllTests() + 1351
vt: [1] 26  18  0x1844fa4     bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 68
vt: [1] 27  18  0x183049c     testing::UnitTest::Run() + 76
vt: [1] 28  18  0x7fd9049d80b3 __libc_start_main + 243
vt: [1] 29  18  0xd3783e      _start + 46
full output

``` Test #280: vt:*/TestLoadBalancer.test_load_balancer_keep_last_elm/*_proc_4 ...***Failed Required regular expression not found. Regex=[PASSED ] 0.25 sec Note: Google Test filter = */TestLoadBalancer.test_load_balancer_keep_last_elm/* [==========] Running 5 tests from 1 test suite. [----------] Global test environment set-up. [----------] 5 tests from LoadBalancerExplode/TestLoadBalancer [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/0 Note: Google Test filter = */TestLoadBalancer.test_load_balancer_keep_last_elm/* [==========] Running 5 tests from 1 test suite. [----------] Global test environment set-up. [----------] 5 tests from LoadBalancerExplode/TestLoadBalancer [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/0 Note: Google Test filter = */TestLoadBalancer.test_load_balancer_keep_last_elm/* [==========] Running 5 tests from 1 test suite. [----------] Global test environment set-up. [----------] 5 tests from LoadBalancerExplode/TestLoadBalancer [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/0 Note: Google Test filter = */TestLoadBalancer.test_load_balancer_keep_last_elm/* [==========] Running 5 tests from 1 test suite. [----------] Global test environment set-up. [----------] 5 tests from LoadBalancerExplode/TestLoadBalancer [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/0 vt: Runtime Initializing: interop=true: mode: single-thread per rank vt: Program: collection_extended (./collection_extended) vt: Running on: 4 nodes vt: Machine Hostname: rysy vt: MPI Version: 3.1 vt: MPI Max tag: 268435455 vt: Build SHA: 463bafe208e88c23b88c6de3516e4934356f04c0 vt: Build Ref: refs/heads/develop vt: Description: heads/develop-0-g463bafe208 vt: Compile-time Features Enabled: vt: Tracing Projections vt: Load Balancing for Collections vt: Message priorities vt: Memory Pooling vt: MPI access guards vt: Performance analysis diagnostics vt: Runtime Configuration: vt: Option: flag --vt_sched_num_progress on: Running MPI progress 2 times each invocation vt: Option: flag --vt_sched_progress_han on: Running MPI progress function at least every 0 handler(s) executed vt: Option: flag --vt_trace on: Tracing enabled vt: Default: Trace file "/home/cz4rs/nga/vt/build/tests/collection_extended_trace/collection_extended.0.log.gz", use --vt_trace_file to disable vt: Default: Trace directory "/home/cz4rs/nga/vt/build/tests/collection_extended_trace/", use --vt_trace_dir to disable vt: Default: Flushing traces at end of run, use --vt_trace_flush_size to disable vt: Option: flag --vt_print_no_progress on: Printing warnings when progress is stalls vt: Default: Printing verbose epoch graphs when hang detected, use --vt_epoch_graph_terse to disable vt: Option: flag --vt_epoch_graph_on_hang on: Epoch graph output enabled if hang detected vt: Default: Termination hang detection enabled by default, use --vt_no_detect_hang to disable vt: Option: flag --vt_hang_detect on: Printing stall warning every 1024 tree traversals vt: Default: SIGINT signal handling enabled by default, use --vt_no_SIGINT to disable vt: Default: SIGSEGV signal handling enabled by default, use --vt_no_SIGSEGV to disable vt: Default: std::terminate signal handling enabled by default, use --vt_no_terminate to disable vt: Default: Color output enabled, use --vt_no_color to disable vt: Default: Stack dumps enabled by default, use --vt_no_stack to disable vt: Option: flag --vt_memory_reporters on: Memory usage checker precedence: mstats,machinfo,selfstat,selfstatm,sbrk,mallinfo,getrusage,ps vt: Working memory reporters: selfstatm,mallinfo,getrusage vt: Initial memory usage: selfstatm=25.9453 mallinfo=0.991104 getrusage=25.9453 (MiB) vt: Default: Memory usage printing disabled, use --vt_print_memory_each_phase to enable vt: Default: Diagnostics are disabled by default, use --vt_diag_enable to enable vt: Option: flag --vt_max_mpi_send_size on: Splitting messages greater than 1 GiB vt: Option: flag --vt_debug_level on: Debug print mode set to terse vt: Pass-through Arguments: vt: None. All arguments handled. Testing lb RandomLB vt: [0] (t) lb: LBManager::finishedLB, phase=0 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.45, min=0.09, sum=1.07, avg=0.27, var=0.02, stdev=0.14, nproc=4, cardinality=4 skewness=0.00, kurtosis=-2.07, npr=4, imb=0.68, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.13, npr=64, imb=1.06, num_stats=2 vt: [0] (t) lb: RandomLB: runLB: randomize_seed=false, seed=123456789 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.13, npr=64, imb=1.06, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.45, min=0.09, sum=1.07, avg=0.27, var=0.02, stdev=0.14, nproc=4, cardinality=4 skewness=0.00, kurtosis=-2.07, npr=4, imb=0.68, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0006918907165527344, total migration count=45 vt: [0] (t) lb: LBManager::finishedLB, phase=1 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.30, min=0.24, sum=1.05, avg=0.26, var=0.00, stdev=0.02, nproc=4, cardinality=4 skewness=0.27, kurtosis=-2.03, npr=4, imb=0.13, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.18, npr=64, imb=1.01, num_stats=2 vt: [0] (t) lb: RandomLB: runLB: randomize_seed=false, seed=123456789 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.18, npr=64, imb=1.01, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0004367828369140625, total migration count=47 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.30, min=0.24, sum=1.05, avg=0.26, var=0.00, stdev=0.02, nproc=4, cardinality=4 skewness=0.27, kurtosis=-2.03, npr=4, imb=0.13, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.33, min=0.23, sum=1.05, avg=0.26, var=0.00, stdev=0.04, nproc=4, cardinality=4 skewness=0.52, kurtosis=-1.88, npr=4, imb=0.26, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.05, kurtosis=-1.28, npr=64, imb=0.86, num_stats=2 vt: [0] (t) lb: RandomLB: runLB: randomize_seed=false, seed=123456789 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.05, kurtosis=-1.28, npr=64, imb=0.86, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0004305839538574219, total migration count=43 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.33, min=0.23, sum=1.05, avg=0.26, var=0.00, stdev=0.04, nproc=4, cardinality=4 skewness=0.52, kurtosis=-1.88, npr=4, imb=0.26, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=3 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.31, min=0.22, sum=1.06, avg=0.27, var=0.00, stdev=0.03, nproc=4, cardinality=4 skewness=-0.14, kurtosis=-1.88, npr=4, imb=0.15, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.31, npr=64, imb=0.98, num_stats=2 vt: [0] (t) lb: RandomLB: runLB: randomize_seed=false, seed=123456789 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.31, npr=64, imb=0.98, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0004246234893798828, total migration count=48 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.31, min=0.22, sum=1.06, avg=0.27, var=0.00, stdev=0.03, nproc=4, cardinality=4 skewness=-0.14, kurtosis=-1.88, npr=4, imb=0.15, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=4 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.30, min=0.22, sum=1.06, avg=0.26, var=0.00, stdev=0.03, nproc=4, cardinality=4 skewness=-0.13, kurtosis=-2.21, npr=4, imb=0.15, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.00, kurtosis=-1.25, npr=64, imb=0.88, num_stats=2 vt: [0] (t) lb: RandomLB: runLB: randomize_seed=false, seed=123456789 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.00, kurtosis=-1.25, npr=64, imb=0.88, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.30, min=0.22, sum=1.06, avg=0.26, var=0.00, stdev=0.03, nproc=4, cardinality=4 skewness=-0.13, kurtosis=-2.21, npr=4, imb=0.15, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00038814544677734375, total migration count=39 vt: [0] (t) lb: LBManager::finishedLB, phase=5 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.38, min=0.13, sum=1.05, avg=0.26, var=0.01, stdev=0.09, nproc=4, cardinality=4 skewness=-0.20, kurtosis=-1.96, npr=4, imb=0.44, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.02, kurtosis=-1.23, npr=64, imb=0.93, num_stats=2 vt: [0] (t) lb: RandomLB: runLB: randomize_seed=false, seed=123456789 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.02, kurtosis=-1.23, npr=64, imb=0.93, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.38, min=0.13, sum=1.05, avg=0.26, var=0.01, stdev=0.09, nproc=4, cardinality=4 skewness=-0.20, kurtosis=-1.96, npr=4, imb=0.44, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0004589557647705078, total migration count=46 vt: [0] (t) lb: LBManager::finishedLB, phase=6 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.32, min=0.15, sum=1.06, avg=0.27, var=0.00, stdev=0.07, nproc=4, cardinality=4 skewness=-0.56, kurtosis=-1.84, npr=4, imb=0.22, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.04, kurtosis=-1.24, npr=64, imb=0.93, num_stats=2 vt: [0] (t) lb: RandomLB: runLB: randomize_seed=false, seed=123456789 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.04, kurtosis=-1.24, npr=64, imb=0.93, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.32, min=0.15, sum=1.06, avg=0.27, var=0.00, stdev=0.07, nproc=4, cardinality=4 skewness=-0.56, kurtosis=-1.84, npr=4, imb=0.22, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0004355907440185547, total migration count=47 vt: [0] (t) lb: LBManager::finishedLB, phase=7 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.33, min=0.23, sum=1.04, avg=0.26, var=0.00, stdev=0.04, nproc=4, cardinality=4 skewness=0.54, kurtosis=-1.86, npr=4, imb=0.26, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.23, npr=64, imb=0.90, num_stats=2 vt: [0] (t) lb: RandomLB: runLB: randomize_seed=false, seed=123456789 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0003669261932373047, total migration count=43 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.23, npr=64, imb=0.90, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.33, min=0.23, sum=1.04, avg=0.26, var=0.00, stdev=0.04, nproc=4, cardinality=4 skewness=0.54, kurtosis=-1.86, npr=4, imb=0.26, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=8 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.29, min=0.23, sum=1.05, avg=0.26, var=0.00, stdev=0.03, nproc=4, cardinality=4 skewness=-0.11, kurtosis=-2.29, npr=4, imb=0.10, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.28, npr=64, imb=0.92, num_stats=2 vt: [0] (t) lb: RandomLB: runLB: randomize_seed=false, seed=123456789 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0003845691680908203, total migration count=46 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.28, npr=64, imb=0.92, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.29, min=0.23, sum=1.05, avg=0.26, var=0.00, stdev=0.03, nproc=4, cardinality=4 skewness=-0.11, kurtosis=-2.29, npr=4, imb=0.10, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=9 vt: [-1] (t) trace: writeTracesFile: to_write=3819, already_written=0 vt: Total collective epochs processed: 96 vt: Total work units processed: 6930 vt: Runtime Finalizing [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/0 (33 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/1 [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/0 (33 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/1 [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/0 (33 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/1 [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/0 (33 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/1 vt: Runtime Initializing: interop=true: mode: single-thread per rank vt: Program: collection_extended (./collection_extended) vt: Running on: 4 nodes vt: Machine Hostname: rysy vt: MPI Version: 3.1 vt: MPI Max tag: 268435455 vt: Build SHA: 463bafe208e88c23b88c6de3516e4934356f04c0 vt: Build Ref: refs/heads/develop vt: Description: heads/develop-0-g463bafe208 vt: Compile-time Features Enabled: vt: Tracing Projections vt: Load Balancing for Collections vt: Message priorities vt: Memory Pooling vt: MPI access guards vt: Performance analysis diagnostics vt: Runtime Configuration: vt: Option: flag --vt_sched_num_progress on: Running MPI progress 2 times each invocation vt: Option: flag --vt_sched_progress_han on: Running MPI progress function at least every 0 handler(s) executed vt: Option: flag --vt_trace on: Tracing enabled vt: Default: Trace file "/home/cz4rs/nga/vt/build/tests/collection_extended_trace/collection_extended.0.log.gz", use --vt_trace_file to disable vt: Default: Trace directory "/home/cz4rs/nga/vt/build/tests/collection_extended_trace/", use --vt_trace_dir to disable vt: Default: Flushing traces at end of run, use --vt_trace_flush_size to disable vt: Option: flag --vt_print_no_progress on: Printing warnings when progress is stalls vt: Default: Printing verbose epoch graphs when hang detected, use --vt_epoch_graph_terse to disable vt: Option: flag --vt_epoch_graph_on_hang on: Epoch graph output enabled if hang detected vt: Default: Termination hang detection enabled by default, use --vt_no_detect_hang to disable vt: Option: flag --vt_hang_detect on: Printing stall warning every 1024 tree traversals vt: Default: SIGINT signal handling enabled by default, use --vt_no_SIGINT to disable vt: Default: SIGSEGV signal handling enabled by default, use --vt_no_SIGSEGV to disable vt: Default: std::terminate signal handling enabled by default, use --vt_no_terminate to disable vt: Default: Color output enabled, use --vt_no_color to disable vt: Default: Stack dumps enabled by default, use --vt_no_stack to disable vt: Option: flag --vt_memory_reporters on: Memory usage checker precedence: mstats,machinfo,selfstat,selfstatm,sbrk,mallinfo,getrusage,ps vt: Working memory reporters: selfstatm,mallinfo,getrusage vt: Initial memory usage: selfstatm=29.8867 mallinfo=1.06377 getrusage=29.9922 (MiB) vt: Default: Memory usage printing disabled, use --vt_print_memory_each_phase to enable vt: Default: Diagnostics are disabled by default, use --vt_diag_enable to enable vt: Option: flag --vt_max_mpi_send_size on: Splitting messages greater than 1 GiB vt: Option: flag --vt_debug_level on: Debug print mode set to terse vt: Pass-through Arguments: vt: None. All arguments handled. Testing lb RotateLB vt: [0] (t) lb: LBManager::finishedLB, phase=0 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.45, min=0.08, sum=1.07, avg=0.27, var=0.02, stdev=0.14, nproc=4, cardinality=4 skewness=-0.00, kurtosis=-2.07, npr=4, imb=0.68, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.11, kurtosis=-0.95, npr=64, imb=1.38, num_stats=2 vt: [0] (t) lb: RotateLB: runLB: next_node=1 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.45, min=0.08, sum=1.07, avg=0.27, var=0.02, stdev=0.14, nproc=4, cardinality=4 skewness=-0.00, kurtosis=-2.07, npr=4, imb=0.68, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.11, kurtosis=-0.95, npr=64, imb=1.38, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0005774497985839844, total migration count=64 vt: [0] (t) lb: LBManager::finishedLB, phase=1 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.42, min=0.09, sum=1.05, avg=0.26, var=0.02, stdev=0.12, nproc=4, cardinality=4 skewness=-0.09, kurtosis=-2.10, npr=4, imb=0.61, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.01, kurtosis=-1.22, npr=64, imb=1.01, num_stats=2 vt: [0] (t) lb: RotateLB: runLB: next_node=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.01, kurtosis=-1.22, npr=64, imb=1.01, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00047850608825683594, total migration count=64 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.42, min=0.09, sum=1.05, avg=0.26, var=0.02, stdev=0.12, nproc=4, cardinality=4 skewness=-0.09, kurtosis=-2.10, npr=4, imb=0.61, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.42, min=0.09, sum=1.04, avg=0.26, var=0.01, stdev=0.12, nproc=4, cardinality=4 skewness=-0.05, kurtosis=-2.04, npr=4, imb=0.62, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.00, kurtosis=-1.24, npr=64, imb=0.93, num_stats=2 vt: [0] (t) lb: RotateLB: runLB: next_node=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.00, kurtosis=-1.24, npr=64, imb=0.93, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00039076805114746094, total migration count=64 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.42, min=0.09, sum=1.04, avg=0.26, var=0.01, stdev=0.12, nproc=4, cardinality=4 skewness=-0.05, kurtosis=-2.04, npr=4, imb=0.62, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=3 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.40, min=0.10, sum=1.06, avg=0.26, var=0.01, stdev=0.11, nproc=4, cardinality=4 skewness=-0.15, kurtosis=-2.07, npr=4, imb=0.52, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.04, kurtosis=-1.25, npr=64, imb=0.90, num_stats=2 vt: [0] (t) lb: RotateLB: runLB: next_node=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.04, kurtosis=-1.25, npr=64, imb=0.90, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00039005279541015625, total migration count=64 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.40, min=0.10, sum=1.06, avg=0.26, var=0.01, stdev=0.11, nproc=4, cardinality=4 skewness=-0.15, kurtosis=-2.07, npr=4, imb=0.52, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=4 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.38, min=0.11, sum=1.04, avg=0.26, var=0.01, stdev=0.10, nproc=4, cardinality=4 skewness=-0.25, kurtosis=-2.08, npr=4, imb=0.44, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.03, kurtosis=-1.26, npr=64, imb=0.95, num_stats=2 vt: [0] (t) lb: RotateLB: runLB: next_node=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.03, kurtosis=-1.26, npr=64, imb=0.95, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.38, min=0.11, sum=1.04, avg=0.26, var=0.01, stdev=0.10, nproc=4, cardinality=4 skewness=-0.25, kurtosis=-2.08, npr=4, imb=0.44, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00046062469482421875, total migration count=64 vt: [0] (t) lb: LBManager::finishedLB, phase=5 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.35, min=0.11, sum=1.04, avg=0.26, var=0.01, stdev=0.09, nproc=4, cardinality=4 skewness=-0.35, kurtosis=-2.03, npr=4, imb=0.37, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.04, kurtosis=-1.26, npr=64, imb=0.97, num_stats=2 vt: [0] (t) lb: RotateLB: runLB: next_node=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.04, kurtosis=-1.26, npr=64, imb=0.97, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0004584789276123047, total migration count=64 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.35, min=0.11, sum=1.04, avg=0.26, var=0.01, stdev=0.09, nproc=4, cardinality=4 skewness=-0.35, kurtosis=-2.03, npr=4, imb=0.37, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=6 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.36, min=0.12, sum=1.06, avg=0.27, var=0.01, stdev=0.09, nproc=4, cardinality=4 skewness=-0.38, kurtosis=-1.95, npr=4, imb=0.37, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.00, kurtosis=-1.23, npr=64, imb=1.01, num_stats=2 vt: [0] (t) lb: RotateLB: runLB: next_node=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.00, kurtosis=-1.23, npr=64, imb=1.01, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00037741661071777344, total migration count=64 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.36, min=0.12, sum=1.06, avg=0.27, var=0.01, stdev=0.09, nproc=4, cardinality=4 skewness=-0.38, kurtosis=-1.95, npr=4, imb=0.37, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=7 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.34, min=0.14, sum=1.07, avg=0.27, var=0.01, stdev=0.08, nproc=4, cardinality=4 skewness=-0.57, kurtosis=-1.82, npr=4, imb=0.28, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.04, kurtosis=-1.26, npr=64, imb=0.89, num_stats=2 vt: [0] (t) lb: RotateLB: runLB: next_node=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.04, kurtosis=-1.26, npr=64, imb=0.89, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00043892860412597656, total migration count=64 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.34, min=0.14, sum=1.07, avg=0.27, var=0.01, stdev=0.08, nproc=4, cardinality=4 skewness=-0.57, kurtosis=-1.82, npr=4, imb=0.28, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=8 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.33, min=0.14, sum=1.08, avg=0.27, var=0.01, stdev=0.08, nproc=4, cardinality=4 skewness=-0.65, kurtosis=-1.76, npr=4, imb=0.24, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.03, kurtosis=-1.31, npr=64, imb=0.90, num_stats=2 vt: [0] (t) lb: RotateLB: runLB: next_node=1 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0004057884216308594, total migration count=64 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.33, min=0.14, sum=1.08, avg=0.27, var=0.01, stdev=0.08, nproc=4, cardinality=4 skewness=-0.65, kurtosis=-1.76, npr=4, imb=0.24, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.03, kurtosis=-1.31, npr=64, imb=0.90, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=9 vt: [-1] (t) trace: writeTracesFile: to_write=3807, already_written=0 vt: Total collective epochs processed: 96 vt: Total work units processed: 7888 vt: Runtime Finalizing [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/1 (21 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/2 [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/1 (21 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/2 [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/1 (21 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/2 [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/1 (21 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/2 vt: Runtime Initializing: interop=true: mode: single-thread per rank vt: Program: collection_extended (./collection_extended) vt: Running on: 4 nodes vt: Machine Hostname: rysy vt: MPI Version: 3.1 vt: MPI Max tag: 268435455 vt: Build SHA: 463bafe208e88c23b88c6de3516e4934356f04c0 vt: Build Ref: refs/heads/develop vt: Description: heads/develop-0-g463bafe208 vt: Compile-time Features Enabled: vt: Tracing Projections vt: Load Balancing for Collections vt: Message priorities vt: Memory Pooling vt: MPI access guards vt: Performance analysis diagnostics vt: Runtime Configuration: vt: Option: flag --vt_sched_num_progress on: Running MPI progress 2 times each invocation vt: Option: flag --vt_sched_progress_han on: Running MPI progress function at least every 0 handler(s) executed vt: Option: flag --vt_trace on: Tracing enabled vt: Default: Trace file "/home/cz4rs/nga/vt/build/tests/collection_extended_trace/collection_extended.0.log.gz", use --vt_trace_file to disable vt: Default: Trace directory "/home/cz4rs/nga/vt/build/tests/collection_extended_trace/", use --vt_trace_dir to disable vt: Default: Flushing traces at end of run, use --vt_trace_flush_size to disable vt: Option: flag --vt_print_no_progress on: Printing warnings when progress is stalls vt: Default: Printing verbose epoch graphs when hang detected, use --vt_epoch_graph_terse to disable vt: Option: flag --vt_epoch_graph_on_hang on: Epoch graph output enabled if hang detected vt: Default: Termination hang detection enabled by default, use --vt_no_detect_hang to disable vt: Option: flag --vt_hang_detect on: Printing stall warning every 1024 tree traversals vt: Default: SIGINT signal handling enabled by default, use --vt_no_SIGINT to disable vt: Default: SIGSEGV signal handling enabled by default, use --vt_no_SIGSEGV to disable vt: Default: std::terminate signal handling enabled by default, use --vt_no_terminate to disable vt: Default: Color output enabled, use --vt_no_color to disable vt: Default: Stack dumps enabled by default, use --vt_no_stack to disable vt: Option: flag --vt_memory_reporters on: Memory usage checker precedence: mstats,machinfo,selfstat,selfstatm,sbrk,mallinfo,getrusage,ps vt: Working memory reporters: selfstatm,mallinfo,getrusage vt: Initial memory usage: selfstatm=30.1328 mallinfo=1.06505 getrusage=30.2422 (MiB) vt: Default: Memory usage printing disabled, use --vt_print_memory_each_phase to enable vt: Default: Diagnostics are disabled by default, use --vt_diag_enable to enable vt: Option: flag --vt_max_mpi_send_size on: Splitting messages greater than 1 GiB vt: Option: flag --vt_debug_level on: Debug print mode set to terse vt: Pass-through Arguments: vt: None. All arguments handled. Testing lb HierarchicalLB vt: [0] (t) lb: LBManager::finishedLB, phase=0 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.46, min=0.08, sum=1.08, avg=0.27, var=0.02, stdev=0.14, nproc=4, cardinality=4 skewness=-0.01, kurtosis=-2.07, npr=4, imb=0.70, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.01, kurtosis=-1.20, npr=64, imb=1.07, num_stats=2 vt: [0] (t) HierarchicalLB: loadStats: load=0.08, total=1.08, avg=0.27, I=0.70,should_lb=true, auto=true, threshold=0.800000011920929 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=40.21, min=30.21, sum=150.70, avg=37.67, var=18.57, stdev=4.31, nproc=4, cardinality=4 skewness=-0.75, kurtosis=-1.69, npr=4, imb=0.07, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.01, kurtosis=-1.20, npr=64, imb=1.07, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00038051605224609375, total migration count=12 vt: [0] (t) lb: LBManager::finishedLB, phase=1 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.31, min=0.17, sum=1.06, avg=0.27, var=0.00, stdev=0.05, nproc=4, cardinality=4 skewness=-0.71, kurtosis=-1.71, npr=4, imb=0.16, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.02, kurtosis=-1.24, npr=64, imb=0.95, num_stats=2 vt: [0] (t) HierarchicalLB: loadStats: load=0.17, total=1.06, avg=0.27, I=0.16,should_lb=true, auto=true, threshold=0.843168050258021 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=30.22, min=30.17, sum=120.81, avg=30.20, var=0.00, stdev=0.02, nproc=4, cardinality=4 skewness=-0.41, kurtosis=-1.99, npr=4, imb=0.00, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0003561973571777344, total migration count=8 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.02, kurtosis=-1.24, npr=64, imb=0.95, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.31, min=0.23, sum=1.06, avg=0.27, var=0.00, stdev=0.03, nproc=4, cardinality=4 skewness=0.12, kurtosis=-2.21, npr=4, imb=0.15, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.01, kurtosis=-1.25, npr=64, imb=0.94, num_stats=2 vt: [0] (t) HierarchicalLB: loadStats: load=0.23, total=1.06, avg=0.27, I=0.15,should_lb=true, auto=true, threshold=0.8504714863044454 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=30.22, min=20.22, sum=110.86, avg=27.72, var=18.71, stdev=4.33, nproc=4, cardinality=4 skewness=-0.75, kurtosis=-1.69, npr=4, imb=0.09, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.01, kurtosis=-1.25, npr=64, imb=0.94, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00038814544677734375, total migration count=6 vt: [0] (t) lb: LBManager::finishedLB, phase=3 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.30, min=0.25, sum=1.07, avg=0.27, var=0.00, stdev=0.02, nproc=4, cardinality=4 skewness=0.45, kurtosis=-1.82, npr=4, imb=0.11, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.00, kurtosis=-1.28, npr=64, imb=0.90, num_stats=2 vt: [0] (t) HierarchicalLB: loadStats: load=0.26, total=1.07, avg=0.27, I=0.11,should_lb=true, auto=true, threshold=0.8850111856823266 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.00, kurtosis=-1.28, npr=64, imb=0.90, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=30.22, min=20.23, sum=90.92, avg=22.73, var=18.70, stdev=4.32, nproc=4, cardinality=4 skewness=0.75, kurtosis=-1.69, npr=4, imb=0.33, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0003452301025390625, total migration count=6 vt: [0] (t) lb: LBManager::finishedLB, phase=4 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.30, min=0.25, sum=1.07, avg=0.27, var=0.00, stdev=0.02, nproc=4, cardinality=4 skewness=0.58, kurtosis=-1.78, npr=4, imb=0.11, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.00, kurtosis=-1.19, npr=64, imb=1.03, num_stats=2 vt: [0] (t) HierarchicalLB: loadStats: load=0.26, total=1.07, avg=0.27, I=0.11,should_lb=true, auto=true, threshold=0.891674127126231 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=30.22, min=20.23, sum=100.89, avg=25.22, var=24.92, stdev=4.99, nproc=4, cardinality=4 skewness=0.00, kurtosis=-2.44, npr=4, imb=0.20, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.00, kurtosis=-1.19, npr=64, imb=1.03, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0003666877746582031, total migration count=7 vt: [0] (t) lb: LBManager::finishedLB, phase=5 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.28, min=0.24, sum=1.06, avg=0.27, var=0.00, stdev=0.01, nproc=4, cardinality=4 skewness=-0.38, kurtosis=-2.03, npr=4, imb=0.05, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.02, kurtosis=-1.24, npr=64, imb=0.94, num_stats=2 vt: [0] (t) HierarchicalLB: loadStats: load=0.24, total=1.06, avg=0.27, I=0.05,should_lb=false, auto=true, threshold=0.9510552312528064 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.28, min=0.24, sum=1.06, avg=0.27, var=0.00, stdev=0.01, nproc=4, cardinality=4 skewness=-0.38, kurtosis=-2.03, npr=4, imb=0.05, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.02, kurtosis=-1.24, npr=64, imb=0.94, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0002532005310058594, total migration count=0 vt: [0] (t) lb: LBManager::finishedLB, phase=6 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.29, min=0.25, sum=1.09, avg=0.27, var=0.00, stdev=0.02, nproc=4, cardinality=4 skewness=-0.32, kurtosis=-2.03, npr=4, imb=0.06, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.19, kurtosis=-0.83, npr=64, imb=1.47, num_stats=2 vt: [0] (t) HierarchicalLB: loadStats: load=0.25, total=1.09, avg=0.27, I=0.06,should_lb=true, auto=true, threshold=0.9371303395399782 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=20.24, min=10.25, sum=70.96, avg=17.74, var=18.71, stdev=4.32, nproc=4, cardinality=4 skewness=-0.75, kurtosis=-1.69, npr=4, imb=0.14, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.19, kurtosis=-0.83, npr=64, imb=1.47, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00031065940856933594, total migration count=5 vt: [0] (t) lb: LBManager::finishedLB, phase=7 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.29, min=0.26, sum=1.07, avg=0.27, var=0.00, stdev=0.01, nproc=4, cardinality=4 skewness=0.59, kurtosis=-1.78, npr=4, imb=0.07, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.02, kurtosis=-1.24, npr=64, imb=0.95, num_stats=2 vt: [0] (t) HierarchicalLB: loadStats: load=0.26, total=1.07, avg=0.27, I=0.07,should_lb=true, auto=true, threshold=0.9295144327590066 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=20.24, min=20.23, sum=80.95, avg=20.24, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=-0.25, kurtosis=-1.86, npr=4, imb=0.00, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.02, kurtosis=-1.24, npr=64, imb=0.95, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0003299713134765625, total migration count=4 vt: [0] (t) lb: LBManager::finishedLB, phase=8 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.07, avg=0.27, var=0.00, stdev=0.01, nproc=4, cardinality=4 skewness=-0.60, kurtosis=-1.77, npr=4, imb=0.02, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.04, kurtosis=-1.26, npr=64, imb=0.95, num_stats=2 vt: [0] (t) HierarchicalLB: loadStats: load=0.26, total=1.07, avg=0.27, I=0.02,should_lb=false, auto=true, threshold=0.9750445632798574 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.07, avg=0.27, var=0.00, stdev=0.01, nproc=4, cardinality=4 skewness=-0.60, kurtosis=-1.77, npr=4, imb=0.02, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.04, kurtosis=-1.26, npr=64, imb=0.95, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00025272369384765625, total migration count=0 vt: [0] (t) lb: LBManager::finishedLB, phase=9 vt: [-1] (t) trace: writeTracesFile: to_write=4008, already_written=0 vt: Total collective epochs processed: 96 vt: Total work units processed: 5063 vt: Runtime Finalizing [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/2 (19 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/3 [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/2 (19 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/3 [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/2 (19 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/3 [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/2 (19 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/3 vt: Runtime Initializing: interop=true: mode: single-thread per rank vt: Program: collection_extended (./collection_extended) vt: Running on: 4 nodes vt: Machine Hostname: rysy vt: MPI Version: 3.1 vt: MPI Max tag: 268435455 vt: Build SHA: 463bafe208e88c23b88c6de3516e4934356f04c0 vt: Build Ref: refs/heads/develop vt: Description: heads/develop-0-g463bafe208 vt: Compile-time Features Enabled: vt: Tracing Projections vt: Load Balancing for Collections vt: Message priorities vt: Memory Pooling vt: MPI access guards vt: Performance analysis diagnostics vt: Runtime Configuration: vt: Option: flag --vt_sched_num_progress on: Running MPI progress 2 times each invocation vt: Option: flag --vt_sched_progress_han on: Running MPI progress function at least every 0 handler(s) executed vt: Option: flag --vt_trace on: Tracing enabled vt: Default: Trace file "/home/cz4rs/nga/vt/build/tests/collection_extended_trace/collection_extended.0.log.gz", use --vt_trace_file to disable vt: Default: Trace directory "/home/cz4rs/nga/vt/build/tests/collection_extended_trace/", use --vt_trace_dir to disable vt: Default: Flushing traces at end of run, use --vt_trace_flush_size to disable vt: Option: flag --vt_print_no_progress on: Printing warnings when progress is stalls vt: Default: Printing verbose epoch graphs when hang detected, use --vt_epoch_graph_terse to disable vt: Option: flag --vt_epoch_graph_on_hang on: Epoch graph output enabled if hang detected vt: Default: Termination hang detection enabled by default, use --vt_no_detect_hang to disable vt: Option: flag --vt_hang_detect on: Printing stall warning every 1024 tree traversals vt: Default: SIGINT signal handling enabled by default, use --vt_no_SIGINT to disable vt: Default: SIGSEGV signal handling enabled by default, use --vt_no_SIGSEGV to disable vt: Default: std::terminate signal handling enabled by default, use --vt_no_terminate to disable vt: Default: Color output enabled, use --vt_no_color to disable vt: Default: Stack dumps enabled by default, use --vt_no_stack to disable vt: Option: flag --vt_memory_reporters on: Memory usage checker precedence: mstats,machinfo,selfstat,selfstatm,sbrk,mallinfo,getrusage,ps vt: Working memory reporters: selfstatm,mallinfo,getrusage vt: Initial memory usage: selfstatm=30.1836 mallinfo=1.06911 getrusage=30.4062 (MiB) vt: Default: Memory usage printing disabled, use --vt_print_memory_each_phase to enable vt: Default: Diagnostics are disabled by default, use --vt_diag_enable to enable vt: Option: flag --vt_max_mpi_send_size on: Splitting messages greater than 1 GiB vt: Option: flag --vt_debug_level on: Debug print mode set to terse vt: Pass-through Arguments: vt: None. All arguments handled. Testing lb GossipLB Using lb_args f=3 k=2 trials=1 inform=1 ordering=0 rollback=0 targetpole=0 vt: [0] (t) lb: LBManager::finishedLB, phase=0 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.46, min=0.08, sum=1.08, avg=0.27, var=0.02, stdev=0.14, nproc=4, cardinality=4 skewness=0.01, kurtosis=-2.06, npr=4, imb=0.71, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.06, kurtosis=-1.12, npr=64, imb=1.11, num_stats=2 vt: [0] (t) GossipLB: GossipLB::doLBStages: trial=0 final imb=0.0066 vt: [0] (t) GossipLB: GossipLB::doLBStages: chose trial=0 with imb=0.0066 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.27, sum=1.08, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=0.18, kurtosis=-1.91, npr=4, imb=0.01, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.06, kurtosis=-1.12, npr=64, imb=1.11, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.001035451889038086, total migration count=16 vt: [0] (t) lb: LBManager::finishedLB, phase=1 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.06, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=0.29, kurtosis=-2.10, npr=4, imb=0.02, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.24, npr=64, imb=1.00, num_stats=2 vt: [0] (t) GossipLB: GossipLB::doLBStages: trial=0 final imb=0.0151 vt: [0] (t) GossipLB: GossipLB::doLBStages: chose trial=0 with imb=0.0151 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.06, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=0.63, kurtosis=-1.75, npr=4, imb=0.02, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.24, npr=64, imb=1.00, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0009288787841796875, total migration count=2 vt: [0] (t) lb: LBManager::finishedLB, phase=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.06, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=-0.47, kurtosis=-1.81, npr=4, imb=0.01, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.01, kurtosis=-1.28, npr=64, imb=1.01, num_stats=2 vt: [0] (t) GossipLB: GossipLB::doLBStages: trial=0 final imb=0.0076 vt: [0] (t) GossipLB: GossipLB::doLBStages: chose trial=0 with imb=0.0076 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.06, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=0.12, kurtosis=-2.11, npr=4, imb=0.01, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.01, kurtosis=-1.28, npr=64, imb=1.01, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0009069442749023438, total migration count=2 vt: [0] (t) lb: LBManager::finishedLB, phase=3 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.07, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=0.19, kurtosis=-2.20, npr=4, imb=0.01, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.02, kurtosis=-1.26, npr=64, imb=0.97, num_stats=2 vt: [0] (t) GossipLB: GossipLB::doLBStages: trial=0 final imb=0.0107 vt: [0] (t) GossipLB: GossipLB::doLBStages: chose trial=0 with imb=0.0107 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.02, kurtosis=-1.26, npr=64, imb=0.97, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.07, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=0.44, kurtosis=-1.96, npr=4, imb=0.01, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0008845329284667969, total migration count=1 vt: [0] (t) lb: LBManager::finishedLB, phase=4 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.06, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=-0.11, kurtosis=-1.91, npr=4, imb=0.01, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.03, kurtosis=-1.27, npr=64, imb=1.02, num_stats=2 vt: [0] (t) GossipLB: GossipLB::doLBStages: trial=0 final imb=0.0027 vt: [0] (t) GossipLB: GossipLB::doLBStages: chose trial=0 with imb=0.0027 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.03, kurtosis=-1.27, npr=64, imb=1.02, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.27, sum=1.06, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=0.45, kurtosis=-1.88, npr=4, imb=0.00, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0008609294891357422, total migration count=2 vt: [0] (t) lb: LBManager::finishedLB, phase=5 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.27, sum=1.07, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=0.45, kurtosis=-1.85, npr=4, imb=0.01, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.14, kurtosis=-0.98, npr=64, imb=1.41, num_stats=2 vt: [0] (t) GossipLB: GossipLB::doLBStages: trial=0 final imb=0.0034 vt: [0] (t) GossipLB: GossipLB::doLBStages: chose trial=0 with imb=0.0034 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.27, sum=1.07, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=0.54, kurtosis=-1.82, npr=4, imb=0.00, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.14, kurtosis=-0.98, npr=64, imb=1.41, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0008540153503417969, total migration count=1 vt: [0] (t) lb: LBManager::finishedLB, phase=6 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.07, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=-0.57, kurtosis=-1.79, npr=4, imb=0.01, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.00, kurtosis=-1.24, npr=64, imb=1.03, num_stats=2 vt: [0] (t) GossipLB: GossipLB::doLBStages: trial=0 final imb=0.0025 vt: [0] (t) GossipLB: GossipLB::doLBStages: chose trial=0 with imb=0.0025 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.07, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=-0.65, kurtosis=-1.77, npr=4, imb=0.00, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.00, kurtosis=-1.24, npr=64, imb=1.03, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0008959770202636719, total migration count=3 vt: [0] (t) lb: LBManager::finishedLB, phase=7 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.07, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=-0.41, kurtosis=-1.99, npr=4, imb=0.01, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.02, kurtosis=-1.24, npr=64, imb=0.99, num_stats=2 vt: [0] (t) GossipLB: GossipLB::doLBStages: trial=0 final imb=0.0031 vt: [0] (t) GossipLB: GossipLB::doLBStages: chose trial=0 with imb=0.0031 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0009024143218994141, total migration count=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.02, kurtosis=-1.24, npr=64, imb=0.99, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.27, sum=1.07, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=0.67, kurtosis=-1.75, npr=4, imb=0.00, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=8 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.26, sum=1.06, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=-0.52, kurtosis=-1.79, npr=4, imb=0.00, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.03, kurtosis=-1.26, npr=64, imb=0.92, num_stats=2 vt: [0] (t) GossipLB: GossipLB::doLBStages: trial=0 final imb=0.0049 vt: [0] (t) GossipLB: GossipLB::doLBStages: chose trial=0 with imb=0.0049 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.27, min=0.27, sum=1.06, avg=0.27, var=0.00, stdev=0.00, nproc=4, cardinality=4 skewness=0.71, kurtosis=-1.71, npr=4, imb=0.00, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.03, kurtosis=-1.26, npr=64, imb=0.92, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0008869171142578125, total migration count=2 vt: [0] (t) lb: LBManager::finishedLB, phase=9 vt: [-1] (t) trace: writeTracesFile: to_write=6367, already_written=0 vt: Total collective epochs processed: 177 vt: Total work units processed: 8625 vt: Runtime Finalizing [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/3 (26 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/4 vt: Runtime Initializing: interop=true: mode: single-thread per rank vt: Program: collection_extended (./collection_extended) vt: Running on: 4 nodes vt: Machine Hostname: rysy vt: MPI Version: 3.1 vt: MPI Max tag: 268435455 vt: Build SHA: 463bafe208e88c23b88c6de3516e4934356f04c0 vt: Build Ref: refs/heads/develop vt: Description: heads/develop-0-g463bafe208 vt: Compile-time Features Enabled: vt: Tracing Projections vt: Load Balancing for Collections vt: Message priorities vt: Memory Pooling vt: MPI access guards vt: Performance analysis diagnostics vt: Runtime Configuration: vt: Option: flag --vt_sched_num_progress on: Running MPI progress 2 times each invocation vt: Option: flag --vt_sched_progress_han on: Running MPI progress function at least every 0 handler(s) executed vt: Option: flag --vt_trace on: Tracing enabled vt: Default: Trace file "/home/cz4rs/nga/vt/build/tests/collection_extended_trace/collection_extended.0.log.gz", use --vt_trace_file to disable vt: Default: Trace directory "/home/cz4rs/nga/vt/build/tests/collection_extended_trace/", use --vt_trace_dir to disable vt: Default: Flushing traces at end of run, use --vt_trace_flush_size to disable vt: Option: flag --vt_print_no_progress on: Printing warnings when progress is stalls vt: Default: Printing verbose epoch graphs when hang detected, use --vt_epoch_graph_terse to disable vt: Option: flag --vt_epoch_graph_on_hang on: Epoch graph output enabled if hang detected vt: Default: Termination hang detection enabled by default, use --vt_no_detect_hang to disable vt: Option: flag --vt_hang_detect on: Printing stall warning every 1024 tree traversals vt: Default: SIGINT signal handling enabled by default, use --vt_no_SIGINT to disable vt: Default: SIGSEGV signal handling enabled by default, use --vt_no_SIGSEGV to disable vt: Default: std::terminate signal handling enabled by default, use --vt_no_terminate to disable vt: Default: Color output enabled, use --vt_no_color to disable vt: Default: Stack dumps enabled by default, use --vt_no_stack to disable vt: Option: flag --vt_memory_reporters on: Memory usage checker precedence: mstats,machinfo,selfstat,selfstatm,sbrk,mallinfo,getrusage,ps vt: Working memory reporters: selfstatm,mallinfo,getrusage vt: Initial memory usage: selfstatm=30.6484 mallinfo=1.07095 getrusage=30.918 (MiB) vt: Default: Memory usage printing disabled, use --vt_print_memory_each_phase to enable vt: Default: Diagnostics are disabled by default, use --vt_diag_enable to enable vt: Option: flag --vt_max_mpi_send_size on: Splitting messages greater than 1 GiB vt: Option: flag --vt_debug_level on: Debug print mode set to terse vt: Pass-through Arguments: vt: None. All arguments handled. Testing lb GreedyLB vt: [0] (t) lb: LBManager::finishedLB, phase=0 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.46, min=0.08, sum=1.07, avg=0.27, var=0.02, stdev=0.14, nproc=4, cardinality=4 skewness=-0.01, kurtosis=-2.07, npr=4, imb=0.70, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.01, kurtosis=-1.19, npr=64, imb=1.10, num_stats=2 vt: [0] (t) lb: loadStats: load=0.08, total=1.07, avg=0.27, I=0.70,should_lb=true, auto=true, threshold=0.3025994223505888 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00037860870361328125, total migration count=28 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.01, kurtosis=-1.19, npr=64, imb=1.10, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.08, min=0.06, sum=0.29, avg=0.07, var=0.00, stdev=0.01, nproc=4, cardinality=4 skewness=-0.71, kurtosis=-1.72, npr=4, imb=0.10, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=1 [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/3 (26 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/4 [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/3 (26 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/4 [ OK ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/3 (26 ms) [ RUN ] LoadBalancerExplode/TestLoadBalancer.test_load_balancer_keep_last_elm/4 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.28, min=0.25, sum=1.06, avg=0.27, var=0.00, stdev=0.01, nproc=4, cardinality=4 skewness=0.28, kurtosis=-1.87, npr=4, imb=0.07, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.02, kurtosis=-1.22, npr=64, imb=1.00, num_stats=2 vt: [0] (t) lb: loadStats: load=0.28, total=1.06, avg=0.27, I=0.07,should_lb=true, auto=true, threshold=0.9298561151079137 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.02, kurtosis=-1.22, npr=64, imb=1.00, num_stats=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.24, min=0.23, sum=0.95, avg=0.24, var=0.00, stdev=0.01, nproc=4, cardinality=4 skewness=-0.19, kurtosis=-2.13, npr=4, imb=0.03, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.0003139972686767578, total migration count=5 vt: [0] (t) lb: LBManager::finishedLB, phase=2 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.29, min=0.25, sum=1.06, avg=0.27, var=0.00, stdev=0.02, nproc=4, cardinality=4 skewness=0.11, kurtosis=-1.95, npr=4, imb=0.08, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.03, kurtosis=-1.25, npr=64, imb=0.96, num_stats=2 vt: [0] (t) lb: loadStats: load=0.25, total=1.06, avg=0.27, I=0.08,should_lb=true, auto=true, threshold=0.9166666666666667 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.03, kurtosis=-1.25, npr=64, imb=0.96, num_stats=2 vt: [0] (t) lb: BaseLB::finalize: LB total time=0.00029921531677246094, total migration count=8 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.24, min=0.22, sum=0.94, avg=0.23, var=0.00, stdev=0.01, nproc=4, cardinality=4 skewness=-0.56, kurtosis=-1.81, npr=4, imb=0.02, num_stats=2 vt: [0] (t) lb: LBManager::finishedLB, phase=3 vt: [0] (t) lb: BaseLB: Statistic=P_l: max=0.29, min=0.25, sum=1.07, avg=0.27, var=0.00, stdev=0.01, nproc=4, cardinality=4 skewness=0.01, kurtosis=-1.88, npr=4, imb=0.07, num_stats=1 vt: [0] (t) lb: BaseLB: Statistic=O_l: max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=-0.01, kurtosis=-1.25, npr=64, imb=0.96, num_stats=2 vt: [0] (t) lb: loadStats: load=0.25, total=1.07, avg=0.27, I=0.07,should_lb=true, auto=true, threshold=0.9260665624301987 vt: [1] ------------------------------------------------------------------------------------------------------------------------ vt: [1] ------------------------------------------- Runtime Error: System Aborting! -------------------------------------------- vt: [1] ------------------------------------------------ Fatal Error on Node 1 ------------------------------------------------- vt: [1] ------------------------------------------------------------------------------------------------------------------------ vt: [1] vt: [1] Reason: Must have object vt: [1] Assertion failed: (theNodeStats()->hasObjectToMigrate(obj_id)) vt: [1] Node: 1 vt: [1] Num Nodes: 4 vt: [1] File: ../src/vt/vrt/collection/balance/baselb/baselb.cc vt: [1] Line: 228 vt: [1] Function: transferMigrations vt: [1] Code: 1 vt: [1] Build SHA: 463bafe208e88c23b88c6de3516e4934356f04c0 vt: [1] Build Ref: refs/heads/develop vt: [1] Description: heads/develop-0-g463bafe208 vt: [1] GIT Repo: clean vt: [1] Hostname: rysy vt: [1] vt: [2] ------------------------------------------------------------------------------------------------------------------------ vt: [2] ------------------------------------------- Runtime Error: System Aborting! -------------------------------------------- vt: [2] ------------------------------------------------ Fatal Error on Node 2 ------------------------------------------------- vt: [2] ------------------------------------------------------------------------------------------------------------------------ vt: [2] vt: [2] Reason: Must have object vt: [2] Assertion failed: (theNodeStats()->hasObjectToMigrate(obj_id)) vt: [2] Node: 2 vt: [2] Num Nodes: 4 vt: [2] File: ../src/vt/vrt/collection/balance/baselb/baselb.cc vt: [2] Line: 228 vt: [2] Function: transferMigrations vt: [2] Code: 1 vt: [2] Build SHA: 463bafe208e88c23b88c6de3516e4934356f04c0 vt: [2] Build Ref: refs/heads/develop vt: [2] Description: heads/develop-0-g463bafe208 vt: [2] GIT Repo: clean vt: [2] Hostname: rysy vt: [2] vt: [3] ------------------------------------------------------------------------------------------------------------------------ vt: [3] ------------------------------------------- Runtime Error: System Aborting! -------------------------------------------- vt: [3] ------------------------------------------------ Fatal Error on Node 3 ------------------------------------------------- vt: [3] ------------------------------------------------------------------------------------------------------------------------ vt: [3] vt: [3] Reason: Must have object vt: [3] Assertion failed: (theNodeStats()->hasObjectToMigrate(obj_id)) vt: [3] Node: 3 vt: [3] Num Nodes: 4 vt: [3] File: ../src/vt/vrt/collection/balance/baselb/baselb.cc vt: [3] Line: 228 vt: [3] Function: transferMigrations vt: [3] Code: 1 vt: [3] Build SHA: 463bafe208e88c23b88c6de3516e4934356f04c0 vt: [3] Build Ref: refs/heads/develop vt: [3] Description: heads/develop-0-g463bafe208 vt: [3] GIT Repo: clean vt: [3] Hostname: rysy vt: [3] vt: [2] ------------------------------------------------------------------------------------------------------------------------ vt: [2] -------------------------------------------- Dump Stack Backtrace on Node 2 -------------------------------------------- vt: [2] ------------------------------------------------------------------------------------------------------------------------ vt: [2] 0 18 0x1c0ed5e vt::debug::stack::dumpStack[abi:cxx11](int) + 46 vt: [2] 1 18 0x18fd6db vt::runtime::Runtime::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool) + 1771 vt: [2] 2 18 0x1850980 vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool, bool) + 288 vt: [2] 3 18 0x1850377 vt::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool, bool) + 199 vt: [2] 4 18 0xd42a50 _ZN2vt5debug6assert9assertOutIJEEENSt9enable_ifIXeqsr3std10tuple_sizeISt5tupleIJDpT_EEEE5valueLi0EEvE4typeEbNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSF_SH_iSH_iOS7_ + 352 vt: [2] 5 18 0x1d19d61 vt::vrt::collection::lb::BaseLB::transferMigrations(vt::vrt::collection::lb::TransferMsg, std::allocator > > >*) + 561 vt: [2] 6 18 0x1a8011b vt::objgroup::dispatch::Dispatch::run(long, vt::messaging::BaseMsg*) + 571 vt: [2] 7 18 0x1a9b5cc vt::objgroup::ObjGroupManager::dispatch(vt::messaging::MsgSharedPtr >, long) + 380 vt: [2] 8 18 0x1a9c02e vt::objgroup::dispatchObjGroup(vt::messaging::MsgSharedPtr >, long) + 78 vt: [2] 9 18 0x1bb469d ./collection_extended() [0x1bb469d] + 0 vt: [2] 10 18 0x1bb4383 vt::runnable::RunnableNew::run() + 915 vt: [2] 11 18 0x1e20ddf vt::sched::BaseUnit::execute() + 31 vt: [2] 12 18 0x1bc6828 vt::sched::Scheduler::runWorkUnit(vt::sched::PriorityUnit&) + 584 vt: [2] 13 18 0x1bc6ec0 vt::sched::Scheduler::runSchedulerOnceImpl(bool) + 512 vt: [2] 14 18 0x1bc739a vt::sched::Scheduler::runSchedulerWhile(std::function) + 90 vt: [2] 15 18 0x1bc7865 vt::runSchedulerThrough(unsigned long) + 101 vt: [2] 16 18 0x1a6ddfd vt::vrt::collection::balance::LBManager::runLB(vt::objgroup::proxy::Proxy, unsigned long) + 653 vt: [2] 17 18 0x1a6e3c1 vt::vrt::collection::balance::LBManager::startLB(unsigned long, vt::vrt::collection::balance::LBType) + 1361 vt: [2] 18 18 0x18e1a31 vt::phase::PhaseManager::runHooks(vt::phase::PhaseHook) + 737 vt: [2] 19 18 0x18e1f1d vt::phase::PhaseManager::nextPhaseCollective() + 973 vt: [2] 20 18 0x154b004 vt::tests::unit::TestLoadBalancer::runTest() + 948 vt: [2] 21 18 0x1844254 void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) + 68 vt: [2] 22 18 0x18210c8 testing::Test::Run() + 488 vt: [2] 23 18 0x1822372 testing::TestInfo::Run() + 546 vt: [2] 24 18 0x1822ac7 testing::TestSuite::Run() + 247 vt: [2] 25 18 0x1830a17 testing::internal::UnitTestImpl::RunAllTests() + 1351 vt: [2] 26 18 0x1844fa4 bool testing::internal::HandleExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 68 vt: [2] 27 18 0x183049c testing::UnitTest::Run() + 76 vt: [2] 28 18 0x7f43a42620b3 __libc_start_main + 243 vt: [2] 29 18 0xd3783e _start + 46 vt: [3] ------------------------------------------------------------------------------------------------------------------------ vt: [3] -------------------------------------------- Dump Stack Backtrace on Node 3 -------------------------------------------- vt: [3] ------------------------------------------------------------------------------------------------------------------------ vt: [3] 0 18 0x1c0ed5e vt::debug::stack::dumpStack[abi:cxx11](int) + 46 vt: [3] 1 18 0x18fd6db vt::runtime::Runtime::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool) + 1771 vt: [3] 2 18 0x1850980 vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool, bool) + 288 vt: [3] 3 18 0x1850377 vt::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool, bool) + 199 vt: [3] 4 18 0xd42a50 _ZN2vt5debug6assert9assertOutIJEEENSt9enable_ifIXeqsr3std10tuple_sizeISt5tupleIJDpT_EEEE5valueLi0EEvE4typeEbNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSF_SH_iSH_iOS7_ + 352 vt: [3] 5 18 0x1d19d61 vt::vrt::collection::lb::BaseLB::transferMigrations(vt::vrt::collection::lb::TransferMsg, std::allocator > > >*) + 561 vt: [3] 6 18 0x1a8011b vt::objgroup::dispatch::Dispatch::run(long, vt::messaging::BaseMsg*) + 571 vt: [3] 7 18 0x1a9b5cc vt::objgroup::ObjGroupManager::dispatch(vt::messaging::MsgSharedPtr >, long) + 380 vt: [3] 8 18 0x1a9c02e vt::objgroup::dispatchObjGroup(vt::messaging::MsgSharedPtr >, long) + 78 vt: [3] 9 18 0x1bb469d ./collection_extended() [0x1bb469d] + 0 vt: [3] 10 18 0x1bb4383 vt::runnable::RunnableNew::run() + 915 vt: [3] 11 18 0x1e20ddf vt::sched::BaseUnit::execute() + 31 vt: [3] 12 18 0x1bc6828 vt::sched::Scheduler::runWorkUnit(vt::sched::PriorityUnit&) + 584 vt: [3] 13 18 0x1bc6ec0 vt::sched::Scheduler::runSchedulerOnceImpl(bool) + 512 vt: [3] 14 18 0x1bc739a vt::sched::Scheduler::runSchedulerWhile(std::function) + 90 vt: [3] 15 18 0x1bc7865 vt::runSchedulerThrough(unsigned long) + 101 vt: [3] 16 18 0x1a6ddfd vt::vrt::collection::balance::LBManager::runLB(vt::objgroup::proxy::Proxy, unsigned long) + 653 vt: [3] 17 18 0x1a6e3c1 vt::vrt::collection::balance::LBManager::startLB(unsigned long, vt::vrt::collection::balance::LBType) + 1361 vt: [3] 18 18 0x18e1a31 vt::phase::PhaseManager::runHooks(vt::phase::PhaseHook) + 737 vt: [3] 19 18 0x18e1f1d vt::phase::PhaseManager::nextPhaseCollective() + 973 vt: [3] 20 18 0x154b004 vt::tests::unit::TestLoadBalancer::runTest() + 948 vt: [3] 21 18 0x1844254 void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) + 68 vt: [3] 22 18 0x18210c8 testing::Test::Run() + 488 vt: [3] 23 18 0x1822372 testing::TestInfo::Run() + 546 vt: [3] 24 18 0x1822ac7 testing::TestSuite::Run() + 247 vt: [3] 25 18 0x1830a17 testing::internal::UnitTestImpl::RunAllTests() + 1351 vt: [3] 26 18 0x1844fa4 bool testing::internal::HandleExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 68 vt: [3] 27 18 0x183049c testing::UnitTest::Run() + 76 vt: [3] 28 18 0x7fe0fe7210b3 __libc_start_main + 243 vt: [3] 29 18 0xd3783e _start + 46 vt: [1] ------------------------------------------------------------------------------------------------------------------------ vt: [1] -------------------------------------------- Dump Stack Backtrace on Node 1 -------------------------------------------- vt: [1] ------------------------------------------------------------------------------------------------------------------------ vt: [1] 0 18 0x1c0ed5e vt::debug::stack::dumpStack[abi:cxx11](int) + 46 vt: [1] 1 18 0x18fd6db vt::runtime::Runtime::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool) + 1771 vt: [1] 2 18 0x1850980 vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool, bool) + 288 vt: [1] 3 18 0x1850377 vt::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool, bool) + 199 vt: [1] 4 18 0xd42a50 _ZN2vt5debug6assert9assertOutIJEEENSt9enable_ifIXeqsr3std10tuple_sizeISt5tupleIJDpT_EEEE5valueLi0EEvE4typeEbNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSF_SH_iSH_iOS7_ + 352 vt: [1] 5 18 0x1d19d61 vt::vrt::collection::lb::BaseLB::transferMigrations(vt::vrt::collection::lb::TransferMsg, std::allocator > > >*) + 561 vt: [1] 6 18 0x1a8011b vt::objgroup::dispatch::Dispatch::run(long, vt::messaging::BaseMsg*) + 571 vt: [1] 7 18 0x1a9b5cc vt::objgroup::ObjGroupManager::dispatch(vt::messaging::MsgSharedPtr >, long) + 380 vt: [1] 8 18 0x1a9c02e vt::objgroup::dispatchObjGroup(vt::messaging::MsgSharedPtr >, long) + 78 vt: [1] 9 18 0x1bb469d ./collection_extended() [0x1bb469d] + 0 vt: [1] 10 18 0x1bb4383 vt::runnable::RunnableNew::run() + 915 vt: [1] 11 18 0x1e20ddf vt::sched::BaseUnit::execute() + 31 vt: [1] 12 18 0x1bc6828 vt::sched::Scheduler::runWorkUnit(vt::sched::PriorityUnit&) + 584 vt: [1] 13 18 0x1bc6ec0 vt::sched::Scheduler::runSchedulerOnceImpl(bool) + 512 vt: [1] 14 18 0x1bc739a vt::sched::Scheduler::runSchedulerWhile(std::function) + 90 vt: [1] 15 18 0x1bc7865 vt::runSchedulerThrough(unsigned long) + 101 vt: [1] 16 18 0x1a6ddfd vt::vrt::collection::balance::LBManager::runLB(vt::objgroup::proxy::Proxy, unsigned long) + 653 vt: [1] 17 18 0x1a6e3c1 vt::vrt::collection::balance::LBManager::startLB(unsigned long, vt::vrt::collection::balance::LBType) + 1361 vt: [1] 18 18 0x18e1a31 vt::phase::PhaseManager::runHooks(vt::phase::PhaseHook) + 737 vt: [1] 19 18 0x18e1f1d vt::phase::PhaseManager::nextPhaseCollective() + 973 vt: [1] 20 18 0x154b004 vt::tests::unit::TestLoadBalancer::runTest() + 948 vt: [1] 21 18 0x1844254 void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) + 68 vt: [1] 22 18 0x18210c8 testing::Test::Run() + 488 vt: [1] 23 18 0x1822372 testing::TestInfo::Run() + 546 vt: [1] 24 18 0x1822ac7 testing::TestSuite::Run() + 247 vt: [1] 25 18 0x1830a17 testing::internal::UnitTestImpl::RunAllTests() + 1351 vt: [1] 26 18 0x1844fa4 bool testing::internal::HandleExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 68 vt: [1] 27 18 0x183049c testing::UnitTest::Run() + 76 vt: [1] 28 18 0x7fd9049d80b3 __libc_start_main + 243 vt: [1] 29 18 0xd3783e _start + 46 vt: [2] ------------------------------------------------------------------------------------------------------------------------ vt: [2] ------------------------------------------- Runtime Error: System Aborting! -------------------------------------------- vt: [2] ------------------------------------------------ Fatal Error on Node 2 ------------------------------------------------- vt: [2] ------------------------------------------------------------------------------------------------------------------------ vt: [2] vt: [2] Message: Assertion Failed vt: [2] vt: [3] ------------------------------------------------------------------------------------------------------------------------ vt: [3] ------------------------------------------- Runtime Error: System Aborting! -------------------------------------------- vt: [3] ------------------------------------------------ Fatal Error on Node 3 ------------------------------------------------- vt: [3] ------------------------------------------------------------------------------------------------------------------------ vt: [3] vt: [3] Message: Assertion Failed vt: [3] vt: [1] ------------------------------------------------------------------------------------------------------------------------ vt: [1] ------------------------------------------- Runtime Error: System Aborting! -------------------------------------------- vt: [1] ------------------------------------------------ Fatal Error on Node 1 ------------------------------------------------- vt: [1] ------------------------------------------------------------------------------------------------------------------------ vt: [1] vt: [1] Message: Assertion Failed vt: [1] vt: [2] ------------------------------------------------------------------------------------------------------------------------ vt: [2] -------------------------------------------- Dump Stack Backtrace on Node 2 -------------------------------------------- vt: [2] ------------------------------------------------------------------------------------------------------------------------ vt: [2] 0 18 0x1c0ed5e vt::debug::stack::dumpStack[abi:cxx11](int) + 46 vt: [2] 1 18 0x18fd6db vt::runtime::Runtime::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool) + 1771 vt: [2] 2 18 0x18fcf84 vt::runtime::Runtime::abort(std::__cxx11::basic_string, std::allocator >, int) + 180 vt: [2] 3 18 0x1850815 vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::abort(std::__cxx11::basic_string, std::allocator >, int) + 213 vt: [2] 4 18 0x1850264 vt::abort(std::__cxx11::basic_string, std::allocator >, int) + 148 vt: [2] 5 18 0x18509f3 vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool, bool) + 403 vt: [2] 6 18 0x1850377 vt::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool, bool) + 199 vt: [2] 7 18 0xd42a50 _ZN2vt5debug6assert9assertOutIJEEENSt9enable_ifIXeqsr3std10tuple_sizeISt5tupleIJDpT_EEEE5valueLi0EEvE4typeEbNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSF_SH_iSH_iOS7_ + 352 vt: [2] 8 18 0x1d19d61 vt::vrt::collection::lb::BaseLB::transferMigrations(vt::vrt::collection::lb::TransferMsg, std::allocator > > >*) + 561 vt: [2] 9 18 0x1a8011b vt::objgroup::dispatch::Dispatch::run(long, vt::messaging::BaseMsg*) + 571 vt: [2] 10 18 0x1a9b5cc vt::objgroup::ObjGroupManager::dispatch(vt::messaging::MsgSharedPtr >, long) + 380 vt: [2] 11 18 0x1a9c02e vt::objgroup::dispatchObjGroup(vt::messaging::MsgSharedPtr >, long) + 78 vt: [2] 12 18 0x1bb469d ./collection_extended() [0x1bb469d] + 0 vt: [2] 13 18 0x1bb4383 vt::runnable::RunnableNew::run() + 915 vt: [2] 14 18 0x1e20ddf vt::sched::BaseUnit::execute() + 31 vt: [2] 15 18 0x1bc6828 vt::sched::Scheduler::runWorkUnit(vt::sched::PriorityUnit&) + 584 vt: [2] 16 18 0x1bc6ec0 vt::sched::Scheduler::runSchedulerOnceImpl(bool) + 512 vt: [2] 17 18 0x1bc739a vt::sched::Scheduler::runSchedulerWhile(std::function) + 90 vt: [2] 18 18 0x1bc7865 vt::runSchedulerThrough(unsigned long) + 101 vt: [2] 19 18 0x1a6ddfd vt::vrt::collection::balance::LBManager::runLB(vt::objgroup::proxy::Proxy, unsigned long) + 653 vt: [2] 20 18 0x1a6e3c1 vt::vrt::collection::balance::LBManager::startLB(unsigned long, vt::vrt::collection::balance::LBType) + 1361 vt: [2] 21 18 0x18e1a31 vt::phase::PhaseManager::runHooks(vt::phase::PhaseHook) + 737 vt: [2] 22 18 0x18e1f1d vt::phase::PhaseManager::nextPhaseCollective() + 973 vt: [2] 23 18 0x154b004 vt::tests::unit::TestLoadBalancer::runTest() + 948 vt: [2] 24 18 0x1844254 void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) + 68 vt: [2] 25 18 0x18210c8 testing::Test::Run() + 488 vt: [2] 26 18 0x1822372 testing::TestInfo::Run() + 546 vt: [2] 27 18 0x1822ac7 testing::TestSuite::Run() + 247 vt: [2] 28 18 0x1830a17 testing::internal::UnitTestImpl::RunAllTests() + 1351 vt: [2] 29 18 0x1844fa4 bool testing::internal::HandleExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 68 vt: [2] 30 18 0x183049c testing::UnitTest::Run() + 76 vt: [2] 31 18 0x7f43a42620b3 __libc_start_main + 243 vt: [2] 32 18 0xd3783e _start + 46 vt: [3] ------------------------------------------------------------------------------------------------------------------------ vt: [3] -------------------------------------------- Dump Stack Backtrace on Node 3 -------------------------------------------- vt: [3] ------------------------------------------------------------------------------------------------------------------------ vt: [3] 0 18 0x1c0ed5e vt::debug::stack::dumpStack[abi:cxx11](int) + 46 vt: [3] 1 18 0x18fd6db vt::runtime::Runtime::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool) + 1771 vt: [3] 2 18 0x18fcf84 vt::runtime::Runtime::abort(std::__cxx11::basic_string, std::allocator >, int) + 180 vt: [3] 3 18 0x1850815 vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::abort(std::__cxx11::basic_string, std::allocator >, int) + 213 vt: [3] 4 18 0x1850264 vt::abort(std::__cxx11::basic_string, std::allocator >, int) + 148 vt: [3] 5 18 0x18509f3 vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool, bool) + 403 vt: [3] 6 18 0x1850377 vt::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool, bool) + 199 vt: [3] 7 18 0xd42a50 _ZN2vt5debug6assert9assertOutIJEEENSt9enable_ifIXeqsr3std10tuple_sizeISt5tupleIJDpT_EEEE5valueLi0EEvE4typeEbNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSF_SH_iSH_iOS7_ + 352 vt: [3] 8 18 0x1d19d61 vt::vrt::collection::lb::BaseLB::transferMigrations(vt::vrt::collection::lb::TransferMsg, std::allocator > > >*) + 561 vt: [3] 9 18 0x1a8011b vt::objgroup::dispatch::Dispatch::run(long, vt::messaging::BaseMsg*) + 571 vt: [3] 10 18 0x1a9b5cc vt::objgroup::ObjGroupManager::dispatch(vt::messaging::MsgSharedPtr >, long) + 380 vt: [3] 11 18 0x1a9c02e vt::objgroup::dispatchObjGroup(vt::messaging::MsgSharedPtr >, long) + 78 vt: [3] 12 18 0x1bb469d ./collection_extended() [0x1bb469d] + 0 vt: [3] 13 18 0x1bb4383 vt::runnable::RunnableNew::run() + 915 vt: [3] 14 18 0x1e20ddf vt::sched::BaseUnit::execute() + 31 vt: [3] 15 18 0x1bc6828 vt::sched::Scheduler::runWorkUnit(vt::sched::PriorityUnit&) + 584 vt: [3] 16 18 0x1bc6ec0 vt::sched::Scheduler::runSchedulerOnceImpl(bool) + 512 vt: [3] 17 18 0x1bc739a vt::sched::Scheduler::runSchedulerWhile(std::function) + 90 vt: [3] 18 18 0x1bc7865 vt::runSchedulerThrough(unsigned long) + 101 vt: [3] 19 18 0x1a6ddfd vt::vrt::collection::balance::LBManager::runLB(vt::objgroup::proxy::Proxy, unsigned long) + 653 vt: [3] 20 18 0x1a6e3c1 vt::vrt::collection::balance::LBManager::startLB(unsigned long, vt::vrt::collection::balance::LBType) + 1361 vt: [3] 21 18 0x18e1a31 vt::phase::PhaseManager::runHooks(vt::phase::PhaseHook) + 737 vt: [3] 22 18 0x18e1f1d vt::phase::PhaseManager::nextPhaseCollective() + 973 vt: [3] 23 18 0x154b004 vt::tests::unit::TestLoadBalancer::runTest() + 948 vt: [3] 24 18 0x1844254 void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) + 68 vt: [3] 25 18 0x18210c8 testing::Test::Run() + 488 vt: [3] 26 18 0x1822372 testing::TestInfo::Run() + 546 vt: [3] 27 18 0x1822ac7 testing::TestSuite::Run() + 247 vt: [3] 28 18 0x1830a17 testing::internal::UnitTestImpl::RunAllTests() + 1351 vt: [3] 29 18 0x1844fa4 bool testing::internal::HandleExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 68 vt: [3] 30 18 0x183049c testing::UnitTest::Run() + 76 vt: [3] 31 18 0x7fe0fe7210b3 __libc_start_main + 243 vt: [3] 32 18 0xd3783e _start + 46 vt: [1] ------------------------------------------------------------------------------------------------------------------------ vt: [1] -------------------------------------------- Dump Stack Backtrace on Node 1 -------------------------------------------- vt: [1] ------------------------------------------------------------------------------------------------------------------------ vt: [1] 0 18 0x1c0ed5e vt::debug::stack::dumpStack[abi:cxx11](int) + 46 vt: [1] 1 18 0x18fd6db vt::runtime::Runtime::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool) + 1771 vt: [1] 2 18 0x18fcf84 vt::runtime::Runtime::abort(std::__cxx11::basic_string, std::allocator >, int) + 180 vt: [1] 3 18 0x1850815 vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::abort(std::__cxx11::basic_string, std::allocator >, int) + 213 vt: [1] 4 18 0x1850264 vt::abort(std::__cxx11::basic_string, std::allocator >, int) + 148 vt: [1] 5 18 0x18509f3 vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool, bool) + 403 vt: [1] 6 18 0x1850377 vt::output(std::__cxx11::basic_string, std::allocator >, int, bool, bool, bool, bool) + 199 vt: [1] 7 18 0xd42a50 _ZN2vt5debug6assert9assertOutIJEEENSt9enable_ifIXeqsr3std10tuple_sizeISt5tupleIJDpT_EEEE5valueLi0EEvE4typeEbNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSF_SH_iSH_iOS7_ + 352 vt: [1] 8 18 0x1d19d61 vt::vrt::collection::lb::BaseLB::transferMigrations(vt::vrt::collection::lb::TransferMsg, std::allocator > > >*) + 561 vt: [1] 9 18 0x1a8011b vt::objgroup::dispatch::Dispatch::run(long, vt::messaging::BaseMsg*) + 571 vt: [1] 10 18 0x1a9b5cc vt::objgroup::ObjGroupManager::dispatch(vt::messaging::MsgSharedPtr >, long) + 380 vt: [1] 11 18 0x1a9c02e vt::objgroup::dispatchObjGroup(vt::messaging::MsgSharedPtr >, long) + 78 vt: [1] 12 18 0x1bb469d ./collection_extended() [0x1bb469d] + 0 vt: [1] 13 18 0x1bb4383 vt::runnable::RunnableNew::run() + 915 vt: [1] 14 18 0x1e20ddf vt::sched::BaseUnit::execute() + 31 vt: [1] 15 18 0x1bc6828 vt::sched::Scheduler::runWorkUnit(vt::sched::PriorityUnit&) + 584 vt: [1] 16 18 0x1bc6ec0 vt::sched::Scheduler::runSchedulerOnceImpl(bool) + 512 vt: [1] 17 18 0x1bc739a vt::sched::Scheduler::runSchedulerWhile(std::function) + 90 vt: [1] 18 18 0x1bc7865 vt::runSchedulerThrough(unsigned long) + 101 vt: [1] 19 18 0x1a6ddfd vt::vrt::collection::balance::LBManager::runLB(vt::objgroup::proxy::Proxy, unsigned long) + 653 vt: [1] 20 18 0x1a6e3c1 vt::vrt::collection::balance::LBManager::startLB(unsigned long, vt::vrt::collection::balance::LBType) + 1361 vt: [1] 21 18 0x18e1a31 vt::phase::PhaseManager::runHooks(vt::phase::PhaseHook) + 737 vt: [1] 22 18 0x18e1f1d vt::phase::PhaseManager::nextPhaseCollective() + 973 vt: [1] 23 18 0x154b004 vt::tests::unit::TestLoadBalancer::runTest() + 948 vt: [1] 24 18 0x1844254 void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) + 68 vt: [1] 25 18 0x18210c8 testing::Test::Run() + 488 vt: [1] 26 18 0x1822372 testing::TestInfo::Run() + 546 vt: [1] 27 18 0x1822ac7 testing::TestSuite::Run() + 247 vt: [1] 28 18 0x1830a17 testing::internal::UnitTestImpl::RunAllTests() + 1351 vt: [1] 29 18 0x1844fa4 bool testing::internal::HandleExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 68 vt: [1] 30 18 0x183049c testing::UnitTest::Run() + 76 vt: [1] 31 18 0x7fd9049d80b3 __libc_start_main + 243 vt: [1] 32 18 0xd3783e _start + 46 =================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = PID 18380 RUNNING AT rysy = EXIT CODE: 133 = CLEANING UP REMAINING PROCESSES = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES =================================================================================== YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Trace/breakpoint trap (signal 5) This typically refers to a problem with your application. Please see the FAQ page for debugging suggestions Start 298: vt:*/TestLoadBalancer.test_load_balancer_keep_last_elm/*_proc_8 Test #298: vt:*/TestLoadBalancer.test_load_balancer_keep_last_elm/*_proc_8 ... Passed 1.50 sec ```

nlslatt commented 3 years ago

I have concerns about this chunk of code:

  theCollective()->scatter<GreedyLBTypes::ObjIDType,recvObjsHan>(
    max_bytes*load.size(),max_bytes,nullptr,[&](NodeType node, void* ptr){
      auto ptr_out = reinterpret_cast<GreedyLBTypes::ObjIDType*>(ptr);
      auto const& proc = node_transfer[node];
      auto const& rec_size = proc.size();
      ptr_out->id = rec_size;
      for (size_t i = 0; i < rec_size; i++) {
        *(ptr_out + i + 1) = proc[i];
      }
    }
  );

I'm concerned that I may have borked GreedyLB when I removed temporary IDs from the load balancers. Can @lifflander or @PhilMiller take a look and let me know how they think this was supposed to work?

nlslatt commented 3 years ago

I have concerns about this chunk of code:

  theCollective()->scatter<GreedyLBTypes::ObjIDType,recvObjsHan>(
    max_bytes*load.size(),max_bytes,nullptr,[&](NodeType node, void* ptr){
      auto ptr_out = reinterpret_cast<GreedyLBTypes::ObjIDType*>(ptr);
      auto const& proc = node_transfer[node];
      auto const& rec_size = proc.size();
      ptr_out->id = rec_size;
      for (size_t i = 0; i < rec_size; i++) {
        *(ptr_out + i + 1) = proc[i];
      }
    }
  );

I'm concerned that I may have borked GreedyLB when I removed temporary IDs from the load balancers. Can @lifflander or @PhilMiller take a look and let me know how they think this was supposed to work?

@lifflander and I discussed this and decided this chunk of code is correct.

nlslatt commented 3 years ago

This is a lot more difficult to reproduce on my Mac than I expected given how frequently I feel like it shows up on CI. It shows up for me on Mac best with 4 ranks. Maybe it would show up more frequently in Docker. You can run only this GreedyLB test by running something like below, but I recommend redirecting the output to file.

mpiexec -n 4 ./tests/collection_extended --gtest_filter=*/TestLoadBalancer.test_load_balancer_keep_last_elm/4 --gtest_repeat=1000
lifflander commented 3 years ago

@nlslatt I actually just commented out the other LBs to get it to reproduce easily, but I should have thought of that.

lifflander commented 3 years ago

So I've enabled address sanitizer and it comes out clean even when it fails.

However, I was able to print the data and it indeed looks corrupted:

vt: [0] (t) lb: LBManager::finishedLB, phase=2
vt: [0] (t) lb: BaseLB: Statistic=P_l:  max=3.82, min=3.35, sum=7.17, avg=3.59, var=0.05, stdev=0.23, nproc=2, cardinality=2 skewness=0.00, kurtosis=-2.75, npr=2, imb=0.06, num_stats=1
vt: [0] (t) lb: BaseLB: Statistic=O_l:  max=0.00, min=0.00, sum=0.01, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=2.71, kurtosis=7.53, npr=64, imb=6.42, num_stats=2
vt: [0] (t) lb: loadStats: load=3.35, total=7.17, avg=3.59, I=0.06,should_lb=true, auto=true, threshold=0.9350528485009639
vt: [1] (t) lb: migrateObjectTo, obj_id=25770721283, home=-24827, from=1, to=25687, found=false
vt: [1] (t) lb: transferMigrations: obj_id=(25770721283,-24827,1), to_node=25687
vt: [0] (t) lb: BaseLB: Statistic=P_l:  max=3.35, min=2.99, sum=6.34, avg=3.17, var=0.03, stdev=0.18, nproc=2, cardinality=2 skewness=0.00, kurtosis=-2.75, npr=2, imb=0.06, num_stats=2