3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

Class3D with local searches crashes #596

Closed: Piotrzrek closed this issue 3 years ago.

Piotrzrek commented 4 years ago

Dear Relion developers,

I am using RELION 3.1 beta (commit a0af57). 3D classification with global searches runs without issues. However, when I switch to local searches to separate different conformations, the run crashes: the GPU workstation reboots without any error message. If I then continue from the previous iteration, or from two iterations earlier, the run either continues or crashes again at a different iteration. Sometimes, after a crash, I find that the optimiser.star file of the previous iteration is empty (and RELION still starts the new iteration). This never happened to me with 3.0.8. My box size is 160 px, and the particle stack is on the SSD. Our workstation has 128 GB of RAM and 4x GTX 1080 Ti.
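Because a crash can leave the last optimiser.star empty, it helps to pick the most recent non-empty one before restarting. Below is a minimal Python sketch, assuming RELION's usual per-iteration naming `<rootname>_itNNN_optimiser.star` (e.g. `run_it012_optimiser.star`); `latest_valid_optimiser` is a hypothetical helper, not part of RELION. It only skips zero-byte files, so a partially written file would still need to be checked by hand.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming pattern: <rootname>_itNNN_optimiser.star, e.g. run_it012_optimiser.star
OPTIMISER_RE = re.compile(r"_it(\d+)_optimiser\.star$")


def latest_valid_optimiser(job_dir: str) -> Optional[Path]:
    """Return the highest-iteration optimiser.star with non-zero size, or None (hypothetical helper)."""
    candidates = []
    for path in Path(job_dir).glob("*_it*_optimiser.star"):
        match = OPTIMISER_RE.search(path.name)
        # Skip names that do not fit the pattern and files left empty (0 bytes) by a crash
        if match and path.stat().st_size > 0:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # max() on (iteration, path) tuples selects the highest iteration number
    return max(candidates)[1]
```

The returned path could then be given to `relion_refine_mpi --continue` together with the other options of the original job.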

An example of the command I use: `which relion_refine_mpi` --o Class3D/job150/run --i Subtract/job120/particles_subtracted.star --ref MultiBody/job118/run_body002_mem_MB_J118.mrc --ini_high 40 --dont_combine_weights_via_disc --pool 30 --pad 2 --skip_gridding --ctf --ctf_corrected_ref --iter 25 --tau2_fudge 6 --particle_diameter 240 --fast_subsets --K 12 --flatten_solvent --zero_mask --strict_highres_exp 5 --solvent_mask MaskCreate/job067/mask_MB_J070.mrc --oversampling 1 --healpix_order 4 --sigma_ang 3.33333 --offset_range 5 --offset_step 2 --sym C1 --norm --scale --j 3 --gpu "0:1:2:3" --pipeline_control Class3D/job150/

I look forward to receiving any advice from you. Kind regards, Piotr

biochem-fan commented 4 years ago

the GPU workstation reboots without any error

This sounds like a hardware problem. For example, the GPUs may be getting too hot or drawing too much power.
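One way to check this while a job runs is to poll `nvidia-smi` for temperature and power draw. The Python sketch below uses the standard `--query-gpu` fields `index`, `temperature.gpu`, `power.draw` and `power.limit`; the thresholds (85 °C, 95% of the power limit) are illustrative assumptions, not vendor specifications, and `flag_suspicious` is a hypothetical helper. Fields reported as `[N/A]` are read as NaN so they are never flagged.

```python
import math
import subprocess
from typing import List, NamedTuple

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


class GpuReading(NamedTuple):
    index: int
    temp_c: float
    power_w: float
    power_limit_w: float


def _to_float(field: str) -> float:
    # Some GPUs report unsupported fields as "[N/A]"; treat them as unknown (NaN)
    return math.nan if "N/A" in field else float(field)


def parse_readings(csv_text: str) -> List[GpuReading]:
    """Parse nvidia-smi CSV output (no header, no units) into readings."""
    readings = []
    for line in csv_text.strip().splitlines():
        idx, temp, power, limit = (f.strip() for f in line.split(","))
        readings.append(GpuReading(int(idx), _to_float(temp), _to_float(power), _to_float(limit)))
    return readings


def flag_suspicious(readings: List[GpuReading], max_temp_c: float = 85.0,
                    power_fraction: float = 0.95) -> List[int]:
    """Indices of GPUs that are hot or close to their power limit (illustrative thresholds)."""
    # Comparisons with NaN are False, so unknown values never trigger a flag
    return [r.index for r in readings
            if r.temp_c >= max_temp_c or r.power_w >= power_fraction * r.power_limit_w]


def poll_once() -> List[GpuReading]:
    """Run nvidia-smi once and parse its output (requires the NVIDIA driver tools)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_readings(out)
```

Calling `flag_suspicious(poll_once())` every few seconds during a run would show whether a crash coincides with high temperatures or power draw near the limit. Summing `power_w` across all four cards also gives a rough figure to compare with the power supply's rating.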

Piotrzrek commented 4 years ago

Thank you for such a prompt reply. We will look into it.

donghuachensu commented 2 years ago

Hi, I think I am having a similar problem with RELION 3.1.2 (commit 44d576). I am running Class3D on the same dataset, changing only the Angular sampling interval: 7.5, 3.7 or 1.8 degrees. The run with 7.5 degrees had no problems at all, but the runs with 3.7 or 1.8 degrees failed right at the start with the error "Allocated GPU memory not enough". Any suggestions? Thanks.

biochem-fan commented 2 years ago

If your particles are not good, the posterior probability distribution over orientations is wide. With a finer sampling rate, the program has to consider more candidate orientations, and therefore needs more memory. You might want to use --maxsig 3000 to limit the number of possibilities to consider.
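The link between a wide posterior and memory use can be illustrated with a toy model: keep the highest-weight poses until they hold a given fraction of the total probability, then optionally cap the count. This is a simplified Python sketch, not RELION's implementation; we believe it loosely mirrors the roles of `--adaptive_fraction` and `--maxsig`, but the function `significant_poses` and its details are hypothetical.

```python
from typing import List, Sequence


def significant_poses(weights: Sequence[float], fraction: float = 0.999,
                      maxsig: int = -1) -> List[int]:
    """Toy model: indices of the highest-weight poses holding `fraction` of the
    total probability mass, optionally capped at `maxsig` (<= 0 means no cap)."""
    total = sum(weights)
    # Visit poses from most to least probable
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for i in order:
        kept.append(i)
        mass += weights[i]
        if mass >= fraction * total:
            break
    if maxsig > 0:
        kept = kept[:maxsig]  # the cap bounds the work per particle
    return kept
```

For a sharply peaked distribution, such as one dominant pose among a thousand tiny ones, only a single pose is kept. For a flat distribution over 1000 poses almost all of them are kept unless `maxsig` caps the list. Each kept pose must then be evaluated at the finer sampling, which is why poorly aligning particles need more GPU memory.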

donghuachensu commented 2 years ago

Thanks for your suggestion. With --maxsig 3000 the job now runs fine on my GPU. Another question: will a Class3D job with a finer angular sampling interval (e.g. 1.8 degrees) be faster than one with a coarser interval (e.g. 7.5 degrees)?

biochem-fan commented 2 years ago

No, slower.
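A rough count of orientations shows why finer sampling is slower. Assuming HEALPix sampling with Nside = 2^order (12·Nside² directions on the sphere), an angular step of about 60°/Nside (7.5° at order 3, 3.75° at order 4, 1.875° at order 5), and an in-plane step equal to the out-of-plane step (6·Nside in-plane angles), each increase in order multiplies the number of C1 orientations by 8. This Python sketch ignores translations, oversampling and local-search restrictions (e.g. `--sigma_ang`), all of which change the absolute numbers.

```python
def approx_step_deg(healpix_order: int) -> float:
    """Approximate angular step for a HEALPix order (assumed 60 / Nside degrees)."""
    return 60.0 / 2 ** healpix_order


def orientations_c1(healpix_order: int) -> int:
    """Rough number of orientations for C1 symmetry at a given HEALPix order."""
    nside = 2 ** healpix_order
    directions = 12 * nside ** 2  # HEALPix pixels covering the sphere
    psi_steps = 6 * nside         # assumption: in-plane step equals out-of-plane step
    return directions * psi_steps


for order in (3, 4, 5):
    print(f"order {order}: ~{approx_step_deg(order):.2f} deg, "
          f"{orientations_c1(order)} orientations")
```

Going from 7.5 to 1.8 degrees therefore means roughly 64 times as many orientations to evaluate in a global search, which matches the answer above.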