Closed carl9384 closed 3 years ago
The latest relion_refine_mpi runs 10x-20x slower when given a stack as input rather than a star file. These tests use CUDA 8.0 and a Tesla K80. With a stack:
[ec2-user@ip-172-31-8-78 ~]$ mpirun -np 2 /home/ec2-user/relion/build/bin/relion_refine_mpi --i preprocess_test.mrcs --o testdir --angpix 0.85 --K 20 --gpu
=== RELION MPI setup ===
+ Number of MPI processes = 2
+ Master (0) runs on host = ip-172-31-8-78
+ Slave 1 runs on host = ip-172-31-8-78
=================
uniqueHost ip-172-31-8-78 has 1 ranks.
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 1 mapped to device 0
Running CPU instructions in double precision.
+ WARNING: Changing psi sampling rate (before oversampling) to 5.625 degrees, for more efficient GPU calculations
Estimating initial noise spectra
21/ 21 sec ............................................................~~(,_,">
Estimating accuracies in the orientational assignment ...
0/ 0 sec ............................................................~~(,_,">
Auto-refine: Estimated accuracy angles= 30.1 degrees; offsets= 10.1 pixels
CurrentResolution= 11.6571 Angstroms, which requires orientationSampling of at least 17.1429 degrees for a particle of diameter 77.35 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 37120
OrientationalSampling= 5.625 NrOrientations= 64
TranslationalSampling= 2 NrTranslations= 29
=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 1187840
OrientationalSampling= 2.8125 NrOrientations= 512
TranslationalSampling= 1 NrTranslations= 116
=============================
Expectation iteration 1 of 50
0.02/1.02 hrs .~~(,_,">
With a star file:
[ec2-user@ip-172-31-8-78 ~]$ mpirun -np 2 /home/ec2-user/relion/build/bin/relion_refine_mpi --i qstack10-complete_relion_stack.star --o testdir --angpix 0.85 --dont_check_norm --K 20 --gpu
=== RELION MPI setup ===
+ Number of MPI processes = 2
+ Master (0) runs on host = ip-172-31-8-78
+ Slave 1 runs on host = ip-172-31-8-78
=================
uniqueHost ip-172-31-8-78 has 1 ranks.
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 1 mapped to device 0
Running CPU instructions in double precision.
+ WARNING: Changing psi sampling rate (before oversampling) to 5.625 degrees, for more efficient GPU calculations
Estimating initial noise spectra
21/ 21 sec ............................................................~~(,_,">
Estimating accuracies in the orientational assignment ...
0/ 0 sec ............................................................~~(,_,">
Auto-refine: Estimated accuracy angles= 0.4 degrees; offsets= 0.3 pixels
CurrentResolution= 11.6571 Angstroms, which requires orientationSampling of at least 17.1429 degrees for a particle of diameter 77.35 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 37120
OrientationalSampling= 5.625 NrOrientations= 64
TranslationalSampling= 2 NrTranslations= 29
=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 1187840
OrientationalSampling= 2.8125 NrOrientations= 512
TranslationalSampling= 1 NrTranslations= 116
=============================
Expectation iteration 1 of 50
0.10/5.98 min .~~(,_,">^C[ec2-user@ip-172-31-8-78 ~]$ [oo]
And with RELION v2.0 and the same stack:
[ec2-user@ip-172-31-8-78 ~]$ mpirun -np 2 /home/ec2-user/relion-2-cuda8/build/bin/relion_refine_mpi --i preprocess_test.mrcs --o testdir --angpix 0.85 --K 20 --gpu
=== RELION MPI setup ===
+ Number of MPI processes = 2
+ Master (0) runs on host = ip-172-31-8-78
+ Slave 1 runs on host = ip-172-31-8-78
=================
Running CPU instructions in double precision.
+ WARNING: Changing psi sampling rate (before oversampling) to 5.625 degrees, for more efficient GPU calculations
Estimating initial noise spectra
13/ 13 sec ............................................................~~(,_,">
uniqueHost ip-172-31-8-78 has 1 ranks.
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 1 mapped to device 0
Estimating accuracies in the orientational assignment ...
1/ 1 sec ............................................................~~(,_,">
Auto-refine: Estimated accuracy angles= 30.1 degrees; offsets= 10.1 pixels
CurrentResolution= 11.6571 Angstroms, which requires orientationSampling of at least 17.1429 degrees for a particle of diameter 77.35 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 37120
OrientationalSampling= 5.625 NrOrientations= 64
TranslationalSampling= 2 NrTranslations= 29
=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 1187840
OrientationalSampling= 2.8125 NrOrientations= 512
TranslationalSampling= 1 NrTranslations= 116
=============================
Expectation iteration 1 of 50
0.50/6.65 min ....~~(,_,">^C[ec2-user@ip-172-31-8-78 ~]$ [oo]
With RELION v2.0 and the star file:
[ec2-user@ip-172-31-8-78 ~]$ mpirun -np 2 /home/ec2-user/relion-2-cuda8/build/bin/relion_refine_mpi --i qstack10-complete_relion_stack.star --o testdir --angpix 0.85 --K 20 --gpu --dont_check_norm
=== RELION MPI setup ===
+ Number of MPI processes = 2
+ Master (0) runs on host = ip-172-31-8-78
+ Slave 1 runs on host = ip-172-31-8-78
=================
Running CPU instructions in double precision.
+ WARNING: Changing psi sampling rate (before oversampling) to 5.625 degrees, for more efficient GPU calculations
Estimating initial noise spectra
11/ 11 sec ............................................................~~(,_,">
uniqueHost ip-172-31-8-78 has 1 ranks.
GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
Thread 0 on slave 1 mapped to device 0
Estimating accuracies in the orientational assignment ...
1/ 1 sec ............................................................~~(,_,">
Auto-refine: Estimated accuracy angles= 0.3 degrees; offsets= 0.15 pixels
CurrentResolution= 11.6571 Angstroms, which requires orientationSampling of at least 17.1429 degrees for a particle of diameter 77.35 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 37120
OrientationalSampling= 5.625 NrOrientations= 64
TranslationalSampling= 2 NrTranslations= 29
=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 1187840
OrientationalSampling= 2.8125 NrOrientations= 512
TranslationalSampling= 1 NrTranslations= 116
=============================
Expectation iteration 1 of 50
0.52/4.12 min .......~~(,_,">[ec2-user@ip-172-31-8-78 ~]$ [oo]
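For reference, the slowdown can be worked out from the estimated totals that RELION printed for expectation iteration 1 in the four runs above. This is only a rough sketch: the numbers are early progress-bar estimates, not measured wall-clock times, and the labels `latest`, `v2.0`, `stack`, and `star` are shorthand introduced here for the four runs.

```python
# Estimated total time for expectation iteration 1, copied from the
# progress lines of the four runs (value, unit). These are RELION's own
# early estimates, not measured wall-clock times.
UNIT_TO_MIN = {"sec": 1 / 60, "min": 1.0, "hrs": 60.0}

estimates = {
    ("latest", "stack"): (1.02, "hrs"),
    ("latest", "star"): (5.98, "min"),
    ("v2.0", "stack"): (6.65, "min"),
    ("v2.0", "star"): (4.12, "min"),
}


def to_minutes(value, unit):
    """Convert a progress-bar estimate to minutes."""
    return value * UNIT_TO_MIN[unit]


minutes = {run: to_minutes(*est) for run, est in estimates.items()}

# Compare the slow case (latest build, stack input) against the other runs.
slow = minutes[("latest", "stack")]
ratios = {run: slow / m for run, m in minutes.items() if run != ("latest", "stack")}

for (build, inp), ratio in ratios.items():
    print(f"latest/stack is about {ratio:.1f}x slower than {build}/{inp}")
```

By these estimates, the latest build with a stack is roughly 9x to 15x slower than any of the other three runs, while v2.0 shows only about a 1.6x gap between stack and star file input.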
Closing, as this issue is too old.