3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

Relion/5.0 tomography Reconstruction modality runs in an endless loop #1149

Open schmitbp opened 2 months ago

schmitbp commented 2 months ago

Hello, I've been attempting to use RELION's tomography suite for subtomogram averaging, but I have consistently run into difficulties with the Reconstruction modality. I've been following Relion/5.0's subtomogram averaging tutorial and have been able to run all prior steps (seemingly successfully). However, when I run the reconstruction step (after alignment using AreTomo), the program runs in an endless loop (for more than 18 hours) and has to be killed to free up the CPUs. It looks as if the reconstruction itself worked: I can navigate to the Relion/ReconstructTomograms folder and find the half1.mrc/half2.mrc reconstructions output by this part of the pipeline. However, if I check the CPU load, it is as if the program never stopped running, and the job never moves to the completed tab. Nothing is written to run.err, and run.out just shows:

My Input .star file (output from the alignment part of the pipeline) looks like this:

# Created by the starfile Python package (version 0.4.12) at 14:54:33 on 19/06/2024

data_global

loop_
_rlnTomoName #1
_rlnVoltage #2
_rlnSphericalAberration #3
_rlnAmplitudeContrast #4
_rlnMicrographOriginalPixelSize #5
_rlnTomoHand #6
_rlnMtfFileName #7
_rlnTomoTiltSeriesPixelSize #8
_rlnTomoTiltSeriesStarFile #9
sample378-tiltseries 300.000000 2.700000 0.100000 3.362000 -1.000000 mtf_k3_standard_300kV_FL2.star 6.724000 AlignTiltSeries/job043/tilt_series/sample378-tiltseries.star
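As a quick sanity check on a STAR file like the one above, the following is a minimal sketch in plain Python that reads one loop_ table and reports any row whose number of values does not match the number of column labels. The function names `parse_star_loop` and `bad_rows` are made up for this illustration and are not part of RELION or the starfile package; the parser deliberately ignores quoted values and multi-line fields, so it is only suitable for simple tables like this one.

```python
# Minimal sketch: check that every row of a STAR loop_ table has one value
# per column label. Helper names are hypothetical, not a RELION/starfile API.

STAR_TEXT = """
# Created by the starfile Python package (version 0.4.12) at 14:54:33 on 19/06/2024

data_global

loop_
_rlnTomoName #1
_rlnVoltage #2
_rlnSphericalAberration #3
_rlnAmplitudeContrast #4
_rlnMicrographOriginalPixelSize #5
_rlnTomoHand #6
_rlnMtfFileName #7
_rlnTomoTiltSeriesPixelSize #8
_rlnTomoTiltSeriesStarFile #9
sample378-tiltseries 300.000000 2.700000 0.100000 3.362000 -1.000000 mtf_k3_standard_300kV_FL2.star 6.724000 AlignTiltSeries/job043/tilt_series/sample378-tiltseries.star
"""


def parse_star_loop(text, block_name):
    """Return (labels, rows) for the loop_ table in data_<block_name>.

    Simplification: values are split on whitespace, so quoted strings
    containing spaces are not handled.
    """
    labels, rows = [], []
    in_block = False
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("data_"):
            in_block = line == "data_" + block_name
            in_loop = False
            continue
        if not in_block:
            continue
        if line == "loop_":
            in_loop = True
        elif in_loop and line.startswith("_"):
            # "_rlnVoltage #2" -> "rlnVoltage"
            labels.append(line.split()[0].lstrip("_"))
        elif in_loop:
            rows.append(line.split())
    return labels, rows


def bad_rows(labels, rows):
    """Indices of rows whose value count differs from the label count."""
    return [i for i, row in enumerate(rows) if len(row) != len(labels)]


labels, rows = parse_star_loop(STAR_TEXT, "global")
print(len(labels), "columns,", len(rows), "row(s), malformed rows:", bad_rows(labels, rows))
```

For the table posted here, the nine labels line up with the nine values in the single data row, so the file is at least structurally consistent.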

We are running RELION on our lab's server with 64 CPUs and 6 NVIDIA A40 GPUs. The version we're running is Relion/5.0-beta-gpu. I have loaded the following modules:

1) chpc/1.0 (S)
2) gcc/8.5.0
3) intel-oneapi-mpi/2021.1.1
4) cuda/12.2.0 (g)
5) intel-oneapi-mkl/2022.0.2
6) miniconda3/relion5
7) relion/5.0-beta-gpu
8) ctffind/4.1.14
9) aretomo2/1.1.2

If relevant, I did check "Generate Tomograms for Denoising". Has anyone else run into this problem? Thank you; I'm happy to provide more information if needed.

rdrighetto commented 2 months ago

Is it consistently happening for you, or is it random? I run into this issue randomly on our cluster, so I assume it is just a glitch in network communication, but I'm not sure.

schmitbp commented 2 months ago

It's a consistent problem: any time I start a reconstruction, it goes into the endless loop.
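One way to narrow down a job that produces output but never moves to the completed tab is to check whether it ever wrote an exit marker, and when its files were last touched. The sketch below assumes the usual RELION convention of placing a `RELION_JOB_EXIT_SUCCESS`, `RELION_JOB_EXIT_FAILURE`, or `RELION_JOB_EXIT_ABORTED` file in the job directory. The helper names `job_status` and `newest_files` are invented for this example, and it only inspects files on disk; it does not talk to RELION or the scheduler.

```python
# Diagnostic sketch (hypothetical helpers, not a RELION API): look for an
# exit marker in a job directory and list the most recently modified files.
import time
from pathlib import Path

# Marker files RELION is assumed to write when a job ends.
EXIT_MARKERS = (
    "RELION_JOB_EXIT_SUCCESS",
    "RELION_JOB_EXIT_FAILURE",
    "RELION_JOB_EXIT_ABORTED",
)


def job_status(job_dir):
    """Return the name of the first exit marker found, else 'NO_EXIT_MARKER'."""
    job_dir = Path(job_dir)
    for marker in EXIT_MARKERS:
        if (job_dir / marker).exists():
            return marker
    return "NO_EXIT_MARKER"


def newest_files(job_dir, n=5):
    """Return up to n (relative_path, age_in_seconds) pairs, newest first."""
    job_dir = Path(job_dir)
    files = [p for p in job_dir.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    now = time.time()
    return [
        (str(p.relative_to(job_dir)), int(now - p.stat().st_mtime))
        for p in files[:n]
    ]
```

Calling `job_status(...)` and `newest_files(...)` on the reconstruction job's directory (the exact path depends on the project and job number) would show whether the half-map files stopped changing hours ago while no exit marker was ever written, which would point to the process hanging at shutdown and not during the reconstruction itself.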