NBISweden / MrBayes

MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. For documentation and downloading the program, please see the home page:
http://NBISweden.github.io/MrBayes/
GNU General Public License v3.0

append=yes empties all files except the checkpoint and .nex files when running under Open MPI #274

Closed ChrLambert closed 2 years ago

ChrLambert commented 2 years ago

What is the current observed behaviour?

Adding append=yes to the mcmc line (and also to others) of the NEXUS file causes the run to abort under Open MPI 4.1.2 and 4.1.4 whenever -np is greater than 1. After the abort, every file except the NEXUS input and the checkpoint file is empty and write-protected.

  Error appending to previous run
  Error in command "Mcmc"
  The error occurred when reading char. 72196390-72196389 on line 92
     in the file 'concatenation.nex'

Returning execution to command line ...

Error in command "Execute"
Will exit with signal 1 (error) because quitonerror is set to yes
If you want control to be returned to the command line on error,
use 'mb -i <filename>' (i is for interactive) or use 'set quitonerror=no'

--------------------------------------------------------------------------

Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.

--------------------------------------------------------------------------

Error appending to previous run
Error in command "Mcmc"
  The error occurred when reading char. 72196390-72196389 on line 92
     in the file 'concatenation.nex'

 Returning execution to command line ...

 Error in command "Execute"
 Will exit with signal 1 (error) because quitonerror is set to yes
 If you want control to be returned to the command line on error,
 use 'mb -i <filename>' (i is for interactive) or use 'set quitonerror=no'

  Error appending to previous run
  Error in command "Mcmc"
  The error occurred when reading char. 72196390-72196389 on line 92
     in the file 'concatenation.nex'

  Returning execution to command line ...

 Error in command "Execute"
 Will exit with signal 1 (error) because quitonerror is set to yes
 If you want control to be returned to the command line on error,
 use 'mb -i <filename>' (i is for interactive) or use 'set quitonerror=no'

 --------------------------------------------------------------------------
 mpirun detected that one or more processes exited with non-zero status, thus causing
 the job to be terminated. The first process to do so was:

 Process name: [[22136,1],1]
 Exit code:    1

What is the expected/wanted behaviour?

The run should pick up the previous run and append to it in parallel. Instead, the run can only be continued on a single core.

How may we reproduce this bug?

Steps to reproduce the bug:

  1. Run any dataset under mpirun with -np greater than 1
  2. Abort the run, e.g. with CTRL+C
  3. Run the same file again with append=yes set in the mcmc line
  4. The program aborts with "Error appending to previous run"
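
The resume attempt in step 3 can be sketched as a small Python helper. This is only an illustration: it assumes an MPI-enabled `mb` binary and `mpirun` are on the PATH, that the file is passed as `mb <filename>`, and that the mcmc line inside the NEXUS file already contains append=yes. The file name comes from the log above; the function names `resume_command` and `resume` are hypothetical.

```python
import shlex
import subprocess


def resume_command(nexus_file="concatenation.nex", n_procs=4):
    """Build the mpirun invocation for resuming a run (step 3).

    Assumes the mcmc line in nexus_file already sets append=yes.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return ["mpirun", "-np", str(n_procs), "mb", nexus_file]


def resume(nexus_file="concatenation.nex", n_procs=4, dry_run=True):
    """Return the command as a string (dry run) or run it and return its exit code."""
    cmd = resume_command(nexus_file, n_procs)
    if dry_run:
        return shlex.join(cmd)
    # check=False: a non-zero exit code is the symptom being reproduced,
    # so it is returned to the caller instead of raising.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    print("Would run:", resume())
```

With `dry_run=False` and `n_procs` greater than 1, an exit code of 1 would match the failure reported here.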

Would you be able to compile and run MrBayes to test fixes to this bug?

What is the environment that you run MrBayes in?

Other information that may be of use to us in resolving this issue

Tested two different Open MPI versions (4.1.2 and 4.1.4). Tried calling only 1 processor core via mpirun (worked, but obviously not parallelized). Tried mpirun -np 4 mb and executing the file from interactive mode (worked, but was not parallelized). Tried executing the file from mb alone (worked, but not parallelized). Tested for a user rights issue by assigning full rights to the folder containing the files with chmod; that did not help either.
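
Because a failed append leaves the output files empty and write-protected, it may be worth copying them aside and checking their state before each resume attempt. A minimal Python sketch, assuming the run's output files sit next to the NEXUS file and share its name as a prefix (for example `concatenation.nex.ckp`); the helper names are hypothetical and not part of MrBayes:

```python
import glob
import os
import shutil
from pathlib import Path


def _run_files(nexus_path):
    """List files named '<nexus file name>.<something>' in the same folder."""
    nexus = Path(nexus_path)
    pattern = glob.escape(nexus.name) + ".*"
    return [p for p in sorted(nexus.parent.glob(pattern)) if p.is_file()]


def inspect_run_files(nexus_path):
    """Report, per output file, whether it is empty and whether it is writable."""
    return {
        p.name: {
            "empty": p.stat().st_size == 0,
            "writable": os.access(p, os.W_OK),
        }
        for p in _run_files(nexus_path)
    }


def backup_run_files(nexus_path, backup_dir):
    """Copy the output files into backup_dir and return the copied file names."""
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for p in _run_files(nexus_path):
        shutil.copy2(p, dest / p.name)
        copied.append(p.name)
    return copied
```

Calling `backup_run_files` before the resume attempt and `inspect_run_files` after it would show which files were emptied or locked, and leaves a copy to restore from.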

ChrLambert commented 2 years ago

Solved the problem by reinstalling with brew, which picked up BEAGLE and the correct MPI version. Running the command mb without sudo then works as intended, so the issue can be closed.