Open hans-vg opened 1 week ago
What you provided is the STDERR that results from the MPI manager dying; the causal error will be further back in the output.
—Carson
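To find that causal error, one approach (a minimal sketch, assuming the combined run output was saved to a file — `maker_run.log` is a hypothetical name, substitute your own) is to grep for the earliest line matching common error keywords:

```shell
# Sketch: locate the earliest error line in a saved MAKER/MPI log.
# -n prints the line number, -m 1 stops at the first match,
# -i makes the match case-insensitive.
grep -n -m 1 -iE 'error|fatal|failed|died' maker_run.log
```

Lines above the first match in the log are where the real cause of the crash is most likely recorded.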
On Oct 29, 2024, at 10:57 AM, Hans @.***> wrote:
Recently, I updated MAKER from v2 to v3 for a new annotation project. I compiled MAKER v3 using the same MPICH module I had used for MAKER v2.
module load mpich/ge/gcc/64/3.3.2
However, now when I run MAKER in MPI mode, it crashes after 3-20 hours of processing.
Any suggestions on how to troubleshoot or get MPI to run would be greatly appreciated.
Thank you, -Hans
Below are some example errors:
FATAL: Thread terminated, causing all processes to fail --> rank=69, hostname=cpu-54
@. HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
@. HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
@. main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
@. HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
@. HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
@. main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
srun: error: cpu-53: task 0: Exited with exit code 7
srun: error: cpu-55: task 2: Exited with exit code 7
@. HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:75): one of the processes terminated badly; aborting
@. HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:22): launcher returned error waiting for completion
@. HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:215): launcher returned error waiting for completion
@. main (ui/mpich/mpiexec.c:336): process manager error waiting for completion

FATAL: Thread terminated, causing all processes to fail --> rank=94, hostname=cpu-54
@. HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
@. HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
@. main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
@. HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
@. HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
@. main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
srun: error: cpu-55: task 2: Exited with exit code 7
srun: error: cpu-53: task 0: Exited with exit code 7
@. HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:75): one of the processes terminated badly; aborting
@. HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:22): launcher returned error waiting for completion
@. HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:215): launcher returned error waiting for completion
@. main (ui/mpich/mpiexec.c:336): process manager error waiting for completion

deleted:1 hits
Calling FastaDB::new at /data/gpfs/assoc/inbre/projects/software_installs/maker-Version_3.01.04/bin/../lib/FastaSeq.pm line 139.
Calling out to BioPerl get_PrimarySeq_stream at /data/gpfs/assoc/inbre/projects/software_installs/maker-Version_3.01.04/bin/../lib/GI.pm line 2287.
collecting tblastx reports
flattening altEST clusters
Fatal error in PMPI_Send: Unknown error class, error stack:
PMPI_Send(159).............: MPI_Send(buf=0x555559942d30, count=4, MPI_CHAR, dest=71, tag=1111, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1845): Communication error with rank 71: Connection refused