3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0
453 stars, 202 forks

Error message: -- mpirun noticed that process rank 1 with pid exited on signal 11 (segmentation fault). #858

Song-wenfei closed this issue 2 years ago

Song-wenfei commented 2 years ago

Dear Relion users,

I am using RELION 3.1.1 for sub-tomogram averaging, with 5 MPI processes, 4 threads, 4 GPUs, and the additional argument --free_gpu_memory 1000. However, the job crashes at every iteration, so I have to keep continuing it over and over. I have tried with parallel disc I/O set to both Yes and No: with Yes, the job runs faster but crashes on every run; with No, it is slower, and after 6 runs it also starts to crash on every run.
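To make the resource layout of this setup concrete, here is a minimal Python sketch. It rests on assumptions worth checking against the RELION documentation: that in relion_refine_mpi rank 0 acts as a leader doing no GPU work, that the remaining worker ranks are spread round-robin over the visible GPUs, and that the --gpu argument separates per-rank device lists with colons. The helpers plan_layout and gpu_flag are hypothetical, not part of RELION.

```python
"""Sketch of an MPI-rank-to-GPU layout, under the assumptions stated above."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Layout:
    worker_to_gpu: dict[int, int]    # worker MPI rank -> GPU index
    workers_per_gpu: dict[int, int]  # GPU index -> number of workers sharing it
    requested_cpu_threads: int       # ranks x threads, what a scheduler would reserve


def plan_layout(n_mpi: int, n_threads: int, n_gpus: int) -> Layout:
    """Assume rank 0 is a non-GPU leader and workers go round-robin over GPUs."""
    if n_mpi < 2 or n_gpus < 1 or n_threads < 1:
        raise ValueError("need >= 2 MPI ranks, >= 1 thread and >= 1 GPU")
    worker_to_gpu = {rank: (rank - 1) % n_gpus for rank in range(1, n_mpi)}
    workers_per_gpu = {gpu: 0 for gpu in range(n_gpus)}
    for gpu in worker_to_gpu.values():
        workers_per_gpu[gpu] += 1
    return Layout(worker_to_gpu, workers_per_gpu, n_mpi * n_threads)


def gpu_flag(layout: Layout) -> str:
    """Colon-separated device list, one entry per worker rank (assumed syntax)."""
    return ":".join(str(layout.worker_to_gpu[r]) for r in sorted(layout.worker_to_gpu))


if __name__ == "__main__":
    layout = plan_layout(n_mpi=5, n_threads=4, n_gpus=4)  # the setup in the question
    print(layout.workers_per_gpu)        # {0: 1, 1: 1, 2: 1, 3: 1}
    print(layout.requested_cpu_threads)  # 20
    print(gpu_flag(layout))              # 0:1:2:3
```

Under these assumptions, 5 ranks on 4 GPUs gives one worker per card; adding workers beyond the GPU count makes them share device memory, and every rank is a separate process with its own host-memory footprint.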

Do you have any suggestions? The detailed error message is below.

Thanks in advance!

[visu002:30037] Process received signal
[visu002:30037] Signal: Segmentation fault (11)
[visu002:30037] Signal code: Invalid permissions (2)
[visu002:30037] Failing at address: 0x2aaac24e8000
[visu002:30037] [ 0] /usr/lib64/libpthread.so.0(+0xf5e0)[0x2aaab98425e0]
[visu002:30037] [ 1] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(_ZN11MlOptimiser30calculateExpectedAngularErrorsEll+0x1333)[0x5fa103]
[visu002:30037] [ 2] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi11expectationEv+0x2764)[0x470f54]
[visu002:30037] [ 3] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0xc1)[0x47e5f1]
[visu002:30037] [ 4] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(main+0x5f)[0x43ab3f]
[visu002:30037] [ 5] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaab9a70c05]
[visu002:30037] [ 6] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi[0x43e84f]
[visu002:30037] End of error message

[visu002:30260] Process received signal
[visu002:30260] Signal: Segmentation fault (11)
[visu002:30260] Signal code: Invalid permissions (2)
[visu002:30260] Failing at address: 0x2aaabc9ff000
[visu002:30260] [ 0] /usr/lib64/libpthread.so.0(+0xf5e0)[0x2aaab98425e0]
[visu002:30260] [ 1] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(_ZN11MlOptimiser30calculateExpectedAngularErrorsEll+0x1333)[0x5fa103]
[visu002:30260] [ 2] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi11expectationEv+0x2764)[0x470f54]
[visu002:30260] [ 3] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0xc1)[0x47e5f1]
[visu002:30260] [ 4] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(main+0x5f)[0x43ab3f]
[visu002:30260] [ 5] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaab9a70c05]
[visu002:30260] [ 6] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi[0x43e84f]
[visu002:30260] End of error message
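Both crashing ranks (pids 30037 and 30260) fail in the same frame. The Python sketch below pulls that frame out of Open MPI-style backtrace text and demangles it. The demangler is a deliberately tiny, hypothetical helper that only understands plain nested names with builtin parameter types (enough for the RELION symbols here, such as _ZN11MlOptimiser30calculateExpectedAngularErrorsEll); for anything more complex, use c++filt.

```python
"""Summarise an Open MPI-style segfault backtrace (hypothetical helper sketch)."""
from __future__ import annotations

import re

# One frame: "[host:pid] [ n] /path/obj(symbol+0xoff)[0xaddr]"; the (...) part is optional.
FRAME_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\] \[\s*(?P<frame>\d+)\] "
    r"(?P<obj>[^(\[\s]+)(?:\((?P<sym>[^+)]*)\+?(?P<off>0x[0-9a-fA-F]+)?\))?"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]"
)

# Itanium C++ ABI codes for a few builtin parameter types.
BUILTINS = {"v": "void", "b": "bool", "c": "char", "i": "int", "j": "unsigned int",
            "l": "long", "m": "unsigned long", "f": "float", "d": "double"}


def demangle_simple(sym: str) -> str:
    """Demangle only plain (nested) names with builtin parameters; else return sym."""
    if not sym.startswith("_Z"):
        return sym
    i, nested = 2, sym[2:3] == "N"
    if nested:
        i += 1
    parts = []
    while i < len(sym) and sym[i].isdigit():
        j = i
        while j < len(sym) and sym[j].isdigit():
            j += 1
        n = int(sym[i:j])  # length prefix of the next identifier
        if j + n > len(sym):
            return sym
        parts.append(sym[j:j + n])
        i = j + n
        if not nested:
            break
    if not parts:
        return sym
    if nested:
        if sym[i:i + 1] != "E":  # nested names end with 'E'
            return sym
        i += 1
    params = []
    for code in sym[i:]:
        if code not in BUILTINS:
            return sym  # templates, pointers, references etc. are out of scope
        params.append(BUILTINS[code])
    if params == ["void"]:
        params = []
    return "::".join(parts) + "(" + ", ".join(params) + ")"


def crash_sites(log: str) -> dict[int, str]:
    """For each pid, the first symbolised frame after the signal handler (frame 0)."""
    sites: dict[int, str] = {}
    for m in FRAME_RE.finditer(log):
        pid, frame, sym = int(m["pid"]), int(m["frame"]), m["sym"] or ""
        if frame == 0 or not sym or pid in sites:
            continue
        sites[pid] = f"{demangle_simple(sym)} +{m['off'] or '?'}"
    return sites


if __name__ == "__main__":
    excerpt = (
        "[visu002:30037] [ 0] /usr/lib64/libpthread.so.0(+0xf5e0)[0x2aaab98425e0]\n"
        "[visu002:30037] [ 1] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi"
        "(_ZN11MlOptimiser30calculateExpectedAngularErrorsEll+0x1333)[0x5fa103]\n"
    )
    print(crash_sites(excerpt))
```

On this excerpt it reports MlOptimiser::calculateExpectedAngularErrors(long, long) +0x1333 for pid 30037; running it over the full log gives the same frame for pid 30260.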

Best regards, Wenfei

scheres commented 2 years ago

You're probably running out of memory, either on the GPU or on the CPU. Please use the CCP-EM mailing list for questions about errors, and this issue list only for bug reports.
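One way to test the out-of-memory explanation is to watch GPU and host memory on the node while relion_refine_mpi runs. The following is a minimal Python sketch assuming a Linux node with nvidia-smi on the PATH; it uses nvidia-smi's --query-gpu CSV output and the MemAvailable line of /proc/meminfo. The helper names and the sample numbers in the test are hypothetical.

```python
"""Snapshot GPU and host memory headroom on a Linux node (hypothetical helper)."""
from __future__ import annotations

import subprocess
from pathlib import Path

# nvidia-smi query; 'nounits' makes the memory columns plain MiB integers.
NVIDIA_SMI = ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
              "--format=csv,noheader,nounits"]


def parse_gpu_csv(text: str) -> list[tuple[int, int, int]]:
    """Parse 'index, used, total' lines (MiB) into (index, used, total) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        idx, used, total = (int(field) for field in line.split(","))
        gpus.append((idx, used, total))
    return gpus


def free_mib_per_gpu(gpus: list[tuple[int, int, int]]) -> dict[int, int]:
    """Free device memory in MiB, keyed by GPU index."""
    return {idx: total - used for idx, used, total in gpus}


def mem_available_gib(meminfo_text: str) -> float:
    """Extract MemAvailable (reported in kB) from /proc/meminfo text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / 1024**2
    raise ValueError("MemAvailable not found")


if __name__ == "__main__":
    try:
        out = subprocess.run(NVIDIA_SMI, capture_output=True, text=True, check=True).stdout
        for idx, free in free_mib_per_gpu(parse_gpu_csv(out)).items():
            print(f"GPU {idx}: {free} MiB free")
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print("nvidia-smi unavailable:", err)
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        print(f"Host MemAvailable: {mem_available_gib(meminfo.read_text()):.1f} GiB")
```

Running this periodically (for example from a loop or watch) during an iteration shows whether host MemAvailable collapses before the crash; each of the 5 MPI ranks is a separate process drawing on the same node's RAM.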