Dear Relion users,
I am using Relion 3.1.1 for sub-tomogram averaging, with 5 MPI processes, 4 threads and 4 GPUs, plus the additional argument --free_gpu_memory 1000. However, the job crashes at every iteration, so I have to keep continuing it over and over. I have tried parallel disc I/O set to both yes and no. With yes, the run is faster but crashes on every run; with no, it takes longer, and after 6 runs it starts crashing on every run as well.
Do you have any suggestions? The detailed error message is below.
Thanks in advance!
[visu002:30037] Process received signal
[visu002:30037] Signal: Segmentation fault (11)
[visu002:30037] Signal code: Invalid permissions (2)
[visu002:30037] Failing at address: 0x2aaac24e8000
[visu002:30037] [ 0] /usr/lib64/libpthread.so.0(+0xf5e0)[0x2aaab98425e0]
[visu002:30037] [ 1] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(_ZN11MlOptimiser30calculateExpectedAngularErrorsEll+0x1333)[0x5fa103]
[visu002:30037] [ 2] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi11expectationEv+0x2764)[0x470f54]
[visu002:30037] [ 3] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0xc1)[0x47e5f1]
[visu002:30037] [ 4] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(main+0x5f)[0x43ab3f]
[visu002:30037] [ 5] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaab9a70c05]
[visu002:30037] [ 6] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi[0x43e84f]
[visu002:30037] End of error message
[visu002:30260] Process received signal
[visu002:30260] Signal: Segmentation fault (11)
[visu002:30260] Signal code: Invalid permissions (2)
[visu002:30260] Failing at address: 0x2aaabc9ff000
[visu002:30260] [ 0] /usr/lib64/libpthread.so.0(+0xf5e0)[0x2aaab98425e0]
[visu002:30260] [ 1] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(_ZN11MlOptimiser30calculateExpectedAngularErrorsEll+0x1333)[0x5fa103]
[visu002:30260] [ 2] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi11expectationEv+0x2764)[0x470f54]
[visu002:30260] [ 3] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0xc1)[0x47e5f1]
[visu002:30260] [ 4] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi(main+0x5f)[0x43ab3f]
[visu002:30260] [ 5] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaab9a70c05]
[visu002:30260] [ 6] /cm/shared/apps/relion/3.1.1/bin/relion_refine_mpi[0x43e84f]
[visu002:30260] End of error message
Best regards, Wenfei