3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

Particle Polishing and CTF Refinement Failure #909

Closed mapaulson closed 2 years ago

mapaulson commented 2 years ago

I'm seeing an odd issue with both particle polishing and CTF refinement. The error is most clearly described using particle polishing. If I try and run a job using MPI, I receive a segmentation fault error.

Thinking this might be an MPI-related error, I tried to run the job again using a single MPI process. In this case the job does not terminate with an error, but halts indefinitely at the "Performing Loop ..." step. Checking the running processes shows that no Relion jobs are running.

I see identical behaviour with CTF refinement - i.e. anything that appears to require a loop over the micrographs.

AFAIR, this was working properly in earlier v 4.0 builds, but unfortunately I cannot pinpoint at what version the error started to occur. Nothing else in OS or MPI setup has changed on our system.

Error message:

WARNING: You did not specify --angpix_ref. The pixel size in the image header of Refine3D/job097/run_half1_class001_unfil.mrc, 2.04 A/px, is used.
WARNING: You did not specify --angpix_ref. The pixel size in the image header of Refine3D/job097/run_half1_class001_unfil.mrc, 2.04 A/px, is used.
WARNING: You did not specify --angpix_ref. The pixel size in the image header of Refine3D/job097/run_half1_class001_unfil.mrc, 2.04 A/px, is used.
WARNING: You did not specify --angpix_ref. The pixel size in the image header of Refine3D/job097/run_half1_class001_unfil.mrc, 2.04 A/px, is used.
[napier:50253] *** Process received signal ***
[napier:50253] Signal: Segmentation fault (11)
[napier:50253] Signal code: Address not mapped (1)
[napier:50253] Failing at address: 0xfffffffffffffff8
[napier:50253] [ 0] /lib64/libpthread.so.0(+0xf630)[0x7f937b30b630]
[napier:50253] [ 1] /software/relion-4.0/bin/relion_motion_refine_mpi(_ZN16ObservationModel11getMtfImageEii+0x227)[0x535b47]
[napier:50253] [ 2] /software/relion-4.0/bin/relion_motion_refine_mpi(_ZN16ObservationModel18predictObservationER9ProjectorRK13MetaDataTablelR13MultidimArrayI8tComplexIdEEdbbbbb+0x845)[0x539425]
[napier:50253] [ 3] /software/relion-4.0/bin/relion_motion_refine_mpi[0x570bb1]
[napier:50253] [ 4] /lib64/libgomp.so.1(+0x16405)[0x7f937b744405]
[napier:50253] [ 5] /lib64/libpthread.so.0(+0x7ea5)[0x7f937b303ea5]
[napier:50253] [ 6] /lib64/libc.so.6(clone+0x6d)[0x7f937b02cb0d]
[napier:50253] *** End of error message ***
[napier:50250] *** Process received signal ***
[napier:50250] Signal: Segmentation fault (11)
[napier:50250] Signal code: Address not mapped (1)
[napier:50250] Failing at address: 0xfffffffffffffff8
[napier:50250] [ 0] /lib64/libpthread.so.0(+0xf630)[0x7f3386b02630]
[napier:50250] [ 1] /software/relion-4.0/bin/relion_motion_refine_mpi(_ZN16ObservationModel11getMtfImageEii+0x227)[0x535b47]
[napier:50250] [ 2] /software/relion-4.0/bin/relion_motion_refine_mpi(_ZN16ObservationModel18predictObservationER9ProjectorRK13MetaDataTablelR13MultidimArrayI8tComplexIdEEdbbbbb+0x845)[0x539425]
[napier:50250] [ 3] /software/relion-4.0/bin/relion_motion_refine_mpi[0x570bb1]
[napier:50250] [ 4] /software/relion-4.0/bin/relion_motion_refine_mpi(_ZN12ReferenceMap10predictAllERK13MetaDataTableR16ObservationModelNS_7HalfSetEibbbbb+0x127)[0x571187]
[napier:50250] [ 5] /software/relion-4.0/bin/relion_motion_refine_mpi(_ZN15MotionEstimator14prepMicrographERK13MetaDataTableRSt6vectorI21ParFourierTransformerSaIS4_EERKS3_I5ImageIdESaIS9_EEiRS3_IS3_IS8_I8tComplexIdEESaISG_EESaISI_EERS3_ISB_SaISB_EERS3_IN6gravis8t2VectorIdEESaISR_EERS3_IST_SaIST_EESU_+0x337)[0x494c57]
[napier:50250] [ 6] /software/relion-4.0/bin/relion_motion_refine_mpi(_ZN15MotionEstimator7processERKSt6vectorI13MetaDataTableSaIS1_EEllb+0x730)[0x4968d0]
[napier:50250] [ 7] /software/relion-4.0/bin/relion_motion_refine_mpi(_ZN16MotionRefinerMpi16runWithFccUpdateEv+0x244)[0x4b4f24]
[napier:50250] [ 8] /software/relion-4.0/bin/relion_motion_refine_mpi(main+0x45)[0x434a55]
[napier:50250] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f3386747555]
[napier:50250] [10] /software/relion-4.0/bin/relion_motion_refine_mpi[0x4358af]
[napier:50250] *** End of error message ***

biochem-fan commented 2 years ago

Please make sure your input STAR file is consistent.

Do some particles refer to a non-existent optics group? Are the MTF files in place?
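
The two checks suggested above can be scripted. What follows is only a rough sketch, not part of RELION: it assumes a RELION 3.1-style particle STAR file with `data_optics` and `data_particles` blocks, the `_rlnOpticsGroup` and `_rlnMtfFileName` labels, and MTF paths given relative to the project directory. The function names `read_star_loops` and `check_particles_star` are made up for illustration, and the parser is deliberately simplistic (it splits on whitespace and ignores quoted values).

```python
import os


def read_star_loops(path):
    """Parse the loop_ tables of a STAR file into {block_name: [row_dict, ...]}.

    Simplified on purpose: values are split on whitespace, so quoted values
    that contain spaces are not handled.
    """
    blocks = {}
    name, labels, in_loop = None, [], False
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            if line.startswith("data_"):
                # A new data block starts; its name follows the data_ prefix.
                name, labels, in_loop = line[len("data_"):], [], False
                blocks[name] = []
            elif name is None:
                continue  # content before any data_ header is ignored
            elif line == "loop_":
                labels, in_loop = [], True
            elif line.startswith("_"):
                if in_loop:
                    labels.append(line.split()[0])  # drop the trailing "#N"
            elif in_loop and labels:
                blocks[name].append(dict(zip(labels, line.split())))
    return blocks


def check_particles_star(path, project_dir="."):
    """Return a list of human-readable problems (empty if none were found)."""
    blocks = read_star_loops(path)
    optics = blocks.get("optics", [])
    particles = blocks.get("particles", [])
    problems = []

    # Check 1: every optics group used by a particle must be defined.
    known = {row.get("_rlnOpticsGroup") for row in optics}
    used = {row.get("_rlnOpticsGroup") for row in particles}
    for group in sorted(used - known, key=str):
        problems.append(
            f"particles refer to optics group {group}, "
            "which is not defined in data_optics"
        )

    # Check 2: every MTF file named in the optics table must exist on disk.
    for row in optics:
        mtf = row.get("_rlnMtfFileName")
        if mtf and not os.path.isfile(os.path.join(project_dir, mtf)):
            problems.append(f"MTF file not found: {mtf}")
    return problems
```

Calling `check_particles_star` on the particle STAR file used as job input, from the project directory, prints nothing suspicious if the returned list is empty. Note that this only confirms the MTF file exists, not that its contents are well formed.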

mapaulson commented 2 years ago

Ah, found the problem. The MTF file was malformed: its first line, "data_", was missing. I specified the file during the Import step and assumed that, because Post-processing was working, the file was being parsed correctly. However, the Post-processing log shows the file was not actually being read. When I explicitly specified the file in the Post-processing step, that job failed too (as, presumably, did polishing etc.). Fixing the file solved all the problems.
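
For anyone hitting the same crash, a quick sanity check of the MTF file before running the job may save time. The sketch below is an assumption-laden illustration, not RELION's own parser: it assumes the MTF STAR file is a single block opened by a `data_` header, followed by `loop_`, the labels `_rlnResolutionInversePixel` and `_rlnMtfValue`, and numeric rows. The function name `check_mtf_star` is hypothetical.

```python
REQUIRED_LABELS = ("_rlnResolutionInversePixel", "_rlnMtfValue")


def check_mtf_star(path):
    """Return a list of problems found in an MTF STAR file (empty if none)."""
    with open(path) as fh:
        lines = [ln.strip() for ln in fh]
    # Ignore blank lines and comments.
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if not lines:
        return ["file is empty"]

    problems = []
    # The first meaningful line must open a data block.
    if not lines[0].startswith("data_"):
        problems.append(
            f"first line is {lines[0]!r}, expected a 'data_' block header"
        )
    if "loop_" not in lines:
        problems.append("no 'loop_' line found")

    # Both column labels must be declared.
    labels = {ln.split()[0] for ln in lines if ln.startswith("_")}
    for required in REQUIRED_LABELS:
        if required not in labels:
            problems.append(f"missing label {required}")

    # Everything that is not a header, loop_ or label should be numeric data.
    n_rows = 0
    for ln in lines:
        if ln.startswith(("_", "data_")) or ln == "loop_":
            continue
        try:
            [float(v) for v in ln.split()]
            n_rows += 1
        except ValueError:
            problems.append(f"non-numeric data row: {ln!r}")
    if n_rows == 0:
        problems.append("no numeric data rows found")
    return problems
```

A file missing its opening header, as in this issue, would be reported with a single "expected a 'data_' block header" message.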