3: MPI_ERR_TRUNCATE: message truncated
3: MPI_ERR_TRUNCATE: message truncated
in: /var/tmp/assman_g/relion-5.0-beta/src/src/mpi.cpp, line 495
ERROR:
Encountered an MPI-related error, see above. Now exiting...
terminate called after throwing an instance of 'RelionError'
relion_refine_mpi:27674 terminated with signal 6 at PC=2b6ae17d1387 SP=7ffde7459eb8. Backtrace:
/usr/lib64/libc.so.6(gsignal+0x37)[0x2b6ae17d1387]
/usr/lib64/libc.so.6(abort+0x148)[0x2b6ae17d2a78]
/opt/psi/Programming/gcc/10.4.0/lib64/libstdc++.so.6(+0x995ec)[0x2b6ae0d0e5ec]
/opt/psi/Programming/gcc/10.4.0/lib64/libstdc++.so.6(+0xa4806)[0x2b6ae0d19806]
/opt/psi/Programming/gcc/10.4.0/lib64/libstdc++.so.6(+0xa4871)[0x2b6ae0d19871]
/opt/psi/Programming/gcc/10.4.0/lib64/libstdc++.so.6(+0xa4b04)[0x2b6ae0d19b04]
/opt/psi/EM/relion/5.0-beta/bin/relion_refine_mpi[0x44db76]
srun: error: merlin-g-009: task 3: Exited with exit code 1
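For triage, the relevant pieces of the log above (the MPI error class, the source location and the failing srun task) can be pulled out with a small script. This is only a sketch and not part of RELION: the function name `summarise_mpi_error` and the regular expressions are my own assumptions, derived from the format of this single log, and the leading number on the `MPI_ERR_` lines is assumed to be the rank or task index.

```python
import re

# Patterns are assumptions based on the one log quoted in this report.
RANK_ERR = re.compile(r"^(\d+):\s*(MPI_ERR_\w+):\s*(.*)$")
LOCATION = re.compile(r"^in:\s*(\S+),\s*line\s+(\d+)")
SRUN_FAIL = re.compile(
    r"^srun: error: (\S+): task (\d+): Exited with exit code (\d+)"
)


def summarise_mpi_error(log_text):
    """Collect MPI error lines, the reported source location and failed srun tasks."""
    summary = {"errors": set(), "location": None, "failed_tasks": []}
    for line in log_text.splitlines():
        line = line.strip()
        if m := RANK_ERR.match(line):
            # (leading number, error class, message); a set drops repeated lines
            summary["errors"].add((int(m.group(1)), m.group(2), m.group(3)))
        elif m := LOCATION.match(line):
            summary["location"] = (m.group(1), int(m.group(2)))
        elif m := SRUN_FAIL.match(line):
            # (node name, task index, exit code)
            summary["failed_tasks"].append(
                (m.group(1), int(m.group(2)), int(m.group(3)))
            )
    return summary


# Excerpt copied from the log in this report.
LOG = """\
3: MPI_ERR_TRUNCATE: message truncated
3: MPI_ERR_TRUNCATE: message truncated
in: /var/tmp/assman_g/relion-5.0-beta/src/src/mpi.cpp, line 495
srun: error: merlin-g-009: task 3: Exited with exit code 1
"""

summary = summarise_mpi_error(LOG)
print(summary)
```

Running the same function over the logs of several failed attempts should make it easier to see whether the same task and node are always involved.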
Dear All,
Describe your problem
A user of our HPC cluster runs into the following problem (I have reproduced the error): MultiBody refinement runs until about iteration 15 and then fails with an MPI_ERR_TRUNCATE error.
As already described in Issue #669, I tried the suggested workaround of setting "Combine iterations through disc?" to Yes in the Compute tab, but it did not help: the job failed at iteration 16 and caused the node to crash.
Environment:
Job options:
Full command:
``srun `which relion_refine_mpi` --continue Refine3D/job028/run_it016_optimiser.star --o MultiBody/job049/run --solvent_correct_fsc --multibody_masks multibody.star --blush --oversampling 1 --healpix_order 3 --auto_local_healpix_order 3 --offset_range 3 --offset_step 1.5 --reconstruct_subtracted_bodies --scratch_dir /scratch --pool 3 --pad 2 --j 4 --gpu "" --pipeline_control MultiBody/job049/``

`` `which relion_flex_analyse` --PCA_orient --model MultiBody/job049/run_model.star --data MultiBody/job049/run_data.star --bodies multibody.star --o MultiBody/job049/analyse --do_maps --k 3 --pipeline_control MultiBody/job049/ ``
Error message: