chancie closed this issue 5 years ago
Using 5 MPI processes might cause high RAM usage; combined with the syncing between the MPI ranks, that might cause serious delays at some point. Try using the 4 GPUs without MPI. You can just run relion_refine with --j 8: all 4 GPUs will still be used, in your case with 2 threads on each one.
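As a rough sketch of that suggestion, the snippet below derives the thread count (4 GPUs x 2 threads each) and prints a non-MPI version of the command reported in this issue. The `--gpu ""` flag is borrowed from the workstation command quoted in the issue and is an assumption here, since the MPI command does not pass it; all other flags are copied from the report with only `--j` changed. The command is only printed, not executed.

```shell
# Values taken from the report: 4 GPUs on the node, 2 threads per GPU as suggested.
n_gpus=4
threads_per_gpu=2
n_threads=$((n_gpus * threads_per_gpu))

# Dry run: print the non-MPI command for review instead of executing it.
# --gpu "" is an assumption copied from the workstation command in this issue.
echo relion_refine --o Class2D/job051/run --i ./Extract/job037/particles.star \
  --dont_combine_weights_via_disc --no_parallel_disc_io --pool 50 \
  --ctf --ctf_intact_first_peak --iter 25 \
  --write_subsets 1 --subset_size 25000 --max_subsets 5 \
  --tau2_fudge 2 --particle_diameter 150 --K 100 --flatten_solvent --zero_mask \
  --strict_highres_exp 12 --oversampling 1 --psi_step 12 \
  --offset_range 5 --offset_step 2 --norm --scale \
  --j "$n_threads" --gpu '""' --dont_check_norm --maxsig 50
```

Once the printed command looks right, it can be pasted into the shell or submitted through the usual job script.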
No response for a long time. Closing.
I am aware that similar issues have been reported in the past, but without any resolution. Hence, I am submitting this as a new issue. I am currently struggling to get my 2D classification jobs running. The input has ~2 million particles.
I am running my jobs with Relion-2.1-beta-1. The node has 32 CPUs and four 1080 GPUs. The command that I submitted is as follows:
`which relion_refine_mpi` --o Class2D/job051/run --i ./Extract/job037/particles.star --dont_combine_weights_via_disc --no_parallel_disc_io --pool 50 --ctf --ctf_intact_first_peak --iter 25 --write_subsets 1 --subset_size 25000 --max_subsets 5 --tau2_fudge 2 --particle_diameter 150 --K 100 --flatten_solvent --zero_mask --strict_highres_exp 12 --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 1 --dont_check_norm --maxsig 50
It has now been stuck for 5 hours at the "Estimating initial noise spectra" step (see below):
=== RELION MPI setup ===
Slave 4 runs on host = pascal01
Running CPU instructions in double precision.
Estimating initial noise spectra
4.10/12.93 min .................~~(,_,"> [oo]
7.10/12.90 min ...............................~~(,_,">
9.47/12.90 min ...........................................~~(,_,">
11.62/12.90 min ....................................................~~(,_,">
12.90/12.90 min ............................................................~~(,_,">
I have submitted a similar job to a workstation with 16 CPUs and a single 1080, and that job starts immediately with the following command:
`which relion_refine` --o Class2D/job052/run --i ./Extract/job037/particles.star --dont_combine_weights_via_disc --no_parallel_disc_io --pool 50 --ctf --ctf_intact_first_peak --iter 25 --tau2_fudge 2 --particle_diameter 150 --K 100 --flatten_solvent --zero_mask --strict_highres_exp 12 --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 2 --gpu "" --dont_check_norm --maxsig 20
The gpu-ids not specified, threads will automatically be mapped to devices (incrementally).
Thread 0 mapped to device 0
Thread 1 mapped to device 0
Running CPU instructions in double precision.
WARNING: Changing psi sampling rate (before oversampling) to 11.25 degrees, for more efficient GPU calculations
Estimating initial noise spectra
8.53/26.93 min .................~~(,_,"> [oo]
27.20/27.20 min ............................................................~~(,_,">
Estimating accuracies in the orientational assignment ...
0/ 0 sec .~~(,_,"> [oo]
5/ 5 sec ............................................................~~(,_,">
Auto-refine: Estimated accuracy angles= 30.1 degrees; offsets= 10.1 pixels
CurrentResolution= 45.4286 Angstroms, which requires orientationSampling of at least 32.7273 degrees for a particle of diameter 150 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 67200
OrientationalSampling= 11.25 NrOrientations= 32
TranslationalSampling= 2 NrTranslations= 21
Oversampling= 1 NrHiddenVariableSamplingPoints= 2150400
OrientationalSampling= 5.625 NrOrientations= 256
TranslationalSampling= 1 NrTranslations= 84
Expectation iteration 1 of 25
0.97/3.05 hrs ..................~~(,_,"> [oo]
1.43/3.05 hrs ............................~~(,_,">
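For readers trying to interpret the sampling figures in the log above: the reported NrHiddenVariableSamplingPoints values appear to equal orientations x translations x classes (with --K 100). That reading is inferred from the numbers in this log, not taken from the RELION documentation, so treat it as an assumption. A quick arithmetic check:

```shell
# All inputs are copied from the workstation log and command in this issue.
n_classes=100        # --K 100

coarse_orient=32     # Oversampling= 0: NrOrientations
coarse_trans=21      # Oversampling= 0: NrTranslations
fine_orient=256      # Oversampling= 1: NrOrientations
fine_trans=84        # Oversampling= 1: NrTranslations

# Assumed relation: sampling points = orientations * translations * classes
coarse_points=$((coarse_orient * coarse_trans * n_classes))
fine_points=$((fine_orient * fine_trans * n_classes))

echo "Oversampling 0: $coarse_points"   # log reports 67200
echo "Oversampling 1: $fine_points"     # log reports 2150400

# 360 degrees / 11.25 degrees per psi step (scaled by 100 to stay in integers)
psi_samples=$((36000 / 1125))
echo "psi samples at 11.25 degrees: $psi_samples"   # log reports NrOrientations= 32
```

The fine pass evaluates 32 times as many points per particle as the coarse pass, which gives a sense of how the per-iteration cost grows with oversampling.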
Any idea what the issue could be? Thank you in advance for helping out.