domschrei / mallob

Malleable Load Balancer. Massively Parallel Logic Backend. Award-winning SAT solving for the cloud.
GNU Lesser General Public License v3.0

Floating point exception on single instance #2

Closed: bratelefant closed this issue 4 years ago

bratelefant commented 4 years ago

Hi Dom, and congrats on your great SAT Race 2020 results ;)

I'm trying to test mallob in mono mode on a single node with a single process, using mpirun -np 1 build/mallob -mono=../myhardsatproblem.cnf, and I get the error message below. I'm not sure this is a real issue; it may just be a missing hint on how to run mallob.

--------------------------------------------------------------------------
[[47230,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: v2201911108090102223

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
c 0.124 0 Program options: T=0, appmode=fork, ba=4, bm=ed, c=0, cbbs=1500, cbdf=1.0, cfhl=0, cpuh-per-instance=0, g=0.0, icpr=0.8, jc=0, l=1.0, lbc=0, log=., mcl=0, md=0, mono=/home/cwahle/mark8.cnf, p=0.01, r=bisec, rto=0, s=1.0, s2f, satsolver=1, sleep=100, t=2, td=0.01, time-per-instance=0, v=2
[v2201911108090102223:10355] *** Process received signal ***
[v2201911108090102223:10355] Signal: Floating point exception (8)
[v2201911108090102223:10355] Signal code: Integer divide-by-zero (1)
[v2201911108090102223:10355] Failing at address: 0x56243c544d05
[v2201911108090102223:10355] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fa8b3d33890]
[v2201911108090102223:10355] [ 1] build/mallob(_ZN19EventDrivenBalancer13getChildRanksEb+0xd9)[0x56243c544d05]
[v2201911108090102223:10355] [ 2] build/mallob(_ZN19EventDrivenBalancerC1ERP19ompi_communicator_tR10Parameters+0x142)[0x56243c5439fe]
[v2201911108090102223:10355] [ 3] build/mallob(_ZN11JobDatabaseC1ER10ParametersRP19ompi_communicator_t+0x3b6)[0x56243c51a2d0]
[v2201911108090102223:10355] [ 4] build/mallob(_ZN6WorkerC1EP19ompi_communicator_tR10ParametersRKSt3setIiSt4lessIiESaIiEE+0x8a)[0x56243c4f471c]
[v2201911108090102223:10355] [ 5] build/mallob(_Z19doWorkerNodeProgramRP19ompi_communicator_tR10ParametersRKSt3setIiSt4lessIiESaIiEE+0x57)[0x56243c4eea5c]
[v2201911108090102223:10355] [ 6] build/mallob(main+0x75b)[0x56243c4ef222]
[v2201911108090102223:10355] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fa8b28feb97]
[v2201911108090102223:10355] [ 8] build/mallob(_start+0x2a)[0x56243c4ee71a]
[v2201911108090102223:10355] *** End of error message ***

bratelefant commented 4 years ago

OK, solved by setting -np 2 instead of -np 1.

domschrei commented 4 years ago

Hey there, thank you! Yes, this is a bug: I never thought about testing mallob-mono on just a single node ;) It should be fixed now. Also keep in mind that you can use MPI oversubscription to launch multiple logical nodes of mallob even when running on a single machine, e.g., a notebook or a desktop PC.

Cheers, Dominik
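
For anyone landing here later, a minimal sketch of the oversubscription suggestion, assuming Open MPI's mpirun (the log above is from Open MPI). The process count of 4 is an arbitrary choice for illustration, and the instance path is the one from the original report. The --oversubscribe flag lets mpirun start more ranks than it detects slots for, and the --mca setting is the one the warning banner itself recommends for silencing the OpenFabrics notice.

```shell
# Assumptions: Open MPI's mpirun, mallob built at build/mallob.
NP=4                                   # number of logical mallob nodes (arbitrary)
CNF=../myhardsatproblem.cnf            # instance path from the original report

# --oversubscribe: allow more MPI ranks than available slots on this machine.
# --mca btl_base_warn_component_unused 0: suppress the openib warning banner.
CMD="mpirun --mca btl_base_warn_component_unused 0 -np $NP --oversubscribe build/mallob -mono=$CNF"
echo "$CMD"

# Launch only if both mpirun and the mallob binary are actually present.
if command -v mpirun >/dev/null 2>&1 && [ -x build/mallob ]; then
  $CMD
fi
```

With any -np value of 2 or more, this also sidesteps the single-process crash on builds that predate the fix.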