bayesiancook / pbmpi

phylobayes mpi
GNU General Public License v2.0

error: negative time when running on computing cluster #14

Status: Closed (emilhaegglund closed 3 years ago)

emilhaegglund commented 4 years ago

Hi, I managed to compile pbmpi with gcc v9.2.0 and openmpi v3.1.3 without warnings on a computing cluster, but when I try to run it, it fails with the following error message:

mpirun -np 4 ./pb_mpi -d concatenated.phy chain1

model:
stick-breaking Dirichlet process mixture (cat)
exchangeabilities estimated from data (gtr)
discrete gamma distribution of rates across sites (4 categories)

read data from file : concatenated.phy
number of taxa  : 68
number of sites : 19037
number of states: 20

chain name : chain1
error : negative time : 0       0       0
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[47863,1],0]
  Exit code:    1
--------------------------------------------------------------------------

I can run it on my local computer. Any ideas?

Cheers, Emil

orliac commented 4 years ago

We observe a similar issue with GCC 8.3.0 and MPICH 3.3.2.

bayesiancook commented 4 years ago

hi,

sorry for the slow reply.

this should work now. apparently, the older version of the stopwatch (which measures running time per cycle) no longer worked correctly on some systems or with some compiler versions.
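For anyone hitting the same error on an older checkout: one way to build a per-cycle stopwatch is on top of `std::chrono::steady_clock`, which is monotonic, so a measured interval cannot come out negative. The following is only a minimal sketch under that assumption, not the actual pbmpi code; the class name `Stopwatch` and all of its members are hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stopwatch for illustration only; the names here are
// assumptions and are not taken from the pbmpi sources.
class Stopwatch {
    using Clock = std::chrono::steady_clock;  // monotonic clock

public:
    // Clear the accumulated time and cancel any running measurement.
    void Reset() {
        total_seconds_ = 0.0;
        running_ = false;
    }

    // Begin timing an interval.
    void Start() {
        start_ = Clock::now();
        running_ = true;
    }

    // End the current interval and add its duration to the total.
    void Stop() {
        if (!running_) return;  // Stop() without a prior Start() is a no-op
        std::chrono::duration<double> elapsed = Clock::now() - start_;
        // steady_clock never goes backwards, so this should always hold.
        assert(elapsed.count() >= 0.0);
        total_seconds_ += elapsed.count();
        running_ = false;
    }

    // Accumulated time in seconds over all completed intervals.
    double GetTime() const { return total_seconds_; }

private:
    Clock::time_point start_{};
    double total_seconds_ = 0.0;
    bool running_ = false;
};
```

In a sampler loop, one would call `Start()` before a cycle and `Stop()` after it, then read `GetTime()` for the accumulated running time. A wall clock such as `std::chrono::system_clock` can be adjusted while the job runs, which is why the sketch uses the steady clock instead.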

orliac commented 4 years ago

Thank you. I could also fix the issue by returning TotalTime from Chrono::GetTime(). Anyway, we’ll check this out and let you know if anything goes wrong.

Cheers, Etienne