coin-or / CHiPPS-ALPS

This is the Abstract Library for Parallel Search (ALPS), the abstract base layer of the COIN-OR High Performance Parallel Search framework.
Eclipse Public License 1.0

Abc test assertion error #7

Open aykutbulut opened 5 years ago

aykutbulut commented 5 years ago

I get the following assertion error when I run the Abc example (in the test directory) in parallel.

unitTest: /home/aykut/research/conic/software/disco/CHiPPS-ALPS/Alps/src/AlpsSubTree.cpp:1261: AlpsReturnStatus AlpsSubTree::exploreUnitWork(bool, int, double, AlpsExitStatus&, int&, int&, int&, int&, int&, bool&): Assertion `nodePool_->getNumKnowledges() + diveNodePool_->getNumKnowledges() == numNodesCandidate + numNodesPartial' failed.

The complete output follows.

laptop:test$ pwd
laptop:test$ mpirun -n 2 ./unitTest -param ../examples/Abc/abc.par
Reading in ALPS parameters ...
Reading in ABC parameters ...
==  Welcome to the Abstract Library for Parallel Search (ALPS) 
==  Copyright 2000-2018 Lehigh University and others 
==  All Rights Reserved. 
==  Distributed under the Eclipse Public License 1.0 
==  Version: Trunk (unstable) 
==  Build Date: Jan 12 2019
Alps0030I Data file: flugpl.mps
Coin0008I FLUGPL read with 0 errors
Problem = flugpl
Log file = flugpl.log
Alps0035I Using 1 hub
Alps0106I The memory size of a node is about 1392 bytes
Alps0165I Starting spiral initialization
Alps0148I Master[0] is creating nodes (2) for its hubs during rampup
Abc0010I Process[0]: after 0 nodes, 0 on tree, 1e+80 best solution, best possible 1e+75
Alps0166I Completed spiral initialization
Alps0076I Master[0] initially balances the hubs every 0.3000 seconds
Abc0010I Process[1]: after 0 nodes, 2 on tree, 1e+80 best solution, best possible 1167185.7
unitTest: /home/aykut/research/conic/software/disco/CHiPPS-ALPS/Alps/src/AlpsSubTree.cpp:1261: AlpsReturnStatus AlpsSubTree::exploreUnitWork(bool, int, double, AlpsExitStatus&, int&, int&, int&, int&, int&, bool&): Assertion `nodePool_->getNumKnowledges() + diveNodePool_->getNumKnowledges() == numNodesCandidate + numNodesPartial' failed.
[laptop:18273] *** Process received signal ***
[laptop:18273] Signal: Aborted (6)
[laptop:18273] Signal code:  (-6)
[laptop:18273] [ 0] /lib64/[0x7fadd7c04030]
[laptop:18273] [ 1] /lib64/[0x7fadd7a6353f]
[laptop:18273] [ 2] /lib64/[0x7fadd7a4d895]
[laptop:18273] [ 3] /lib64/[0x7fadd7a4d769]
[laptop:18273] [ 4] /lib64/[0x7fadd7a5b9f6]
[laptop:18273] [ 5] ./unitTest[0x457525]
[laptop:18273] [ 6] ./unitTest[0x44621f]
[laptop:18273] [ 7] ./unitTest[0x438596]
[laptop:18273] [ 8] ./unitTest[0x441e80]
[laptop:18273] [ 9] ./unitTest[0x441dfe]
[laptop:18273] [10] ./unitTest[0x410c7b]
[laptop:18273] [11] /lib64/[0x7fadd7a4f413]
[laptop:18273] [12] ./unitTest[0x40d83e]
[laptop:18273] *** End of error message ***
Alps0094I Node 23: left 10, msg(s 1, r 1), inter(0, 0.30000), npt 0.00644, unit 30, no sol, 0 sec.
mpirun noticed that process rank 1 with PID 0 on node laptop exited on signal 6 (Aborted).