bayesiancook / pbmpi

phylobayes mpi
GNU General Public License v2.0

corrupted size vs. prev_size error when using -mutsel #23

Open berkalpay opened 3 years ago


When I run the command mpirun -n 15 pb_mpi -d ../aligned_RNA_seqs_postprocessed.phylip -cat -gtr -mutsel run02, I get the following error:

--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   compute-a-16-46
  Local device: mlx4_0
--------------------------------------------------------------------------

model:
stick-breaking Dirichlet process mixture (cat)

read data from file : ../aligned_RNA_seqs_postprocessed.phylip
number of taxa  : 1139
number of sites : 711
number of states: 4

chain name : run02
run started

[compute-a-16-46.o2.rc.hms.harvard.edu:11425] 14 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[compute-a-16-46.o2.rc.hms.harvard.edu:11425] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
*** Error in `pb_mpi': corrupted size vs. prev_size: 0x0000000004e03120 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f7c4)[0x7fca5756c7c4]
/lib64/libc.so.6(+0x82fd4)[0x7fca5756ffd4]
/lib64/libc.so.6(__libc_malloc+0x4c)[0x7fca57572adc]
/n/app/gcc/6.2.0/lib64/libstdc++.so.6(_Znwm+0x18)[0x7fca5807ecd8]
pb_mpi[0x4de838]
pb_mpi[0x49c180]
pb_mpi[0x4e66a5]
pb_mpi[0x488cb9]
pb_mpi[0x404d9b]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fca5750f505]
pb_mpi[0x421927]

The backtrace is followed by a memory map.
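The frames inside pb_mpi itself (for example pb_mpi[0x4de838]) are printed as bare addresses with no function name. A minimal sketch for turning them into an addr2line invocation is below. It assumes the pb_mpi binary is non-PIE (the low 0x4xxxxx addresses suggest so), so the printed addresses can be passed to addr2line unchanged, and that it was compiled with -g so that file and line information is available. The helper names pb_mpi_frames and addr2line_command are hypothetical, not part of pbmpi.

```python
import re

# Frames from the pb_mpi executable appear as "pb_mpi[0x...]" with no symbol
# name, unlike the libc/libstdc++ frames, which carry "(symbol+offset)".
FRAME_RE = re.compile(r"^pb_mpi\[(0x[0-9a-fA-F]+)\]$")


def pb_mpi_frames(backtrace: str) -> list[str]:
    """Return the raw addresses of the unresolved pb_mpi frames, in order."""
    addrs = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            addrs.append(match.group(1))
    return addrs


def addr2line_command(addrs: list[str], binary: str = "pb_mpi") -> list[str]:
    """Build an addr2line argument list for the given addresses.

    -e names the executable to read debug info from, -f also prints the
    function name, and -C demangles C++ symbols. Assumes a non-PIE binary
    built with -g; otherwise the output will be "??".
    """
    return ["addr2line", "-f", "-C", "-e", binary, *addrs]


if __name__ == "__main__":
    sample = """pb_mpi[0x4de838]
pb_mpi[0x49c180]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fca5750f505]
pb_mpi[0x421927]"""
    print(" ".join(addr2line_command(pb_mpi_frames(sample))))
```

Running the resulting command from the directory containing the same pb_mpi build that crashed should map each frame to a function and, with debug info, a source line.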

The error occurs after the 0th iteration has been written to the .trace file but before the first MCMC iteration. Strangely, it is intermittent: it occurs on most, but not all, runs of the command. It also occurs with a variety of values for -n.
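Because the crash is intermittent, it may help to put a number on "most, but not all, runs". The sketch below reruns a command a fixed number of times and reports the fraction of runs that exit with a nonzero status. It assumes mpirun returns a nonzero exit status when one of its ranks aborts, which is the usual Open MPI behavior but worth confirming on the cluster. The function name failure_rate is hypothetical.

```python
import subprocess


def failure_rate(cmd: list[str], runs: int = 10) -> float:
    """Run cmd `runs` times and return the fraction of runs that failed.

    A run counts as failed if the process exits with a nonzero status.
    Output is discarded; only the exit status is inspected.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures / runs


if __name__ == "__main__":
    # The command from this report; each run restarts the chain named run02.
    cmd = ["mpirun", "-n", "15", "pb_mpi",
           "-d", "../aligned_RNA_seqs_postprocessed.phylip",
           "-cat", "-gtr", "-mutsel", "run02"]
    print(f"failure rate: {failure_rate(cmd, runs=10):.0%}")
```

Note that pb_mpi runs until stopped, so with this command a "successful" run never exits on its own. In practice, you would wrap the command in a time limit (for example, subprocess.run's timeout argument) and treat a timeout as having survived past the first iteration.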