3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

MlOptimiserMpi::initialiseWorkLoad: at least 3 MPI processes are required when splitting data into random halves #777

Closed yongshuo-Z closed 3 years ago

yongshuo-Z commented 3 years ago

Describe your problem

When running 3D auto-refine, the error in the title occurs. The 3D initial model and 3D classification jobs finished successfully.

I set the number of MPI processes to 3 in the GUI, but as the error log shows, the job still runs with 1 MPI process. How can I set it to 3 so that this step works?

I've seen #470, but the solution there doesn't seem to work for me. For more information, I ran the following commands:

which mpicc --> /usr/bin/mpicc

which mpirun --> /usr/bin/mpirun

mpicc --version --> gcc (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0

mpirun --version --> mpirun (Open MPI) 2.1.1

Error message:

 === RELION MPI setup ===
 + Number of MPI processes             = 1
 + Number of threads per MPI process  = 3
 + Total number of threads therefore  = 3
 + Master  (0) runs on host            = xxx
 =================
The following warnings were encountered upon command-line parsing: 
WARNING: Option --allow-run-as-root is not a valid RELION argument
 Running CPU instructions in double precision. 
ERROR: 
MlOptimiserMpi::initialiseWorkLoad: at least 3 MPI processes are required when splitting data into random halves
File: /root/cryo/code/relion/relion-2.1.b1/src/ml_optimiser_mpi.cpp line: 561
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
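
The banner in the log above reports the number of MPI processes the job actually started with, so it can be checked before digging further. A minimal sketch, assuming the banner line format shown in this log; the helper names mpi_procs_from_log and enough_procs_for_halves are hypothetical and not part of RELION, and the log file name in the usage comment is an assumption:

```shell
#!/bin/sh
# Extract the MPI process count from a RELION log read on stdin.
# Assumes the banner line format shown in the log above:
#   " + Number of MPI processes             = 1"
mpi_procs_from_log() {
    sed -n 's/^.*Number of MPI processes[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Succeed only if the count is at least 3, the minimum the error message
# gives for splitting data into random halves.
enough_procs_for_halves() {
    [ "$1" -ge 3 ]
}

# Hypothetical usage (run.out is an assumed log file name):
#   n=$(mpi_procs_from_log < Refine3D/job019/run.out)
#   enough_procs_for_halves "$n" || echo "only $n MPI process(es), need at least 3" >&2
```

For the log in this report the first helper yields 1, which matches the error: the job was started without mpirun, or with only one rank.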
WeiZhang-12 commented 3 years ago

Try this: mpirun -n x `which relion_refine_mpi` --o Refine3D/job019/run --auto_refine **

KrisJanssen commented 1 year ago

@WeiZhang-12: can you clarify your suggestion? Users in our organization are hitting the same error as the original author. They use the GUI to create jobs and run them locally. However, it seems RELION does not explicitly prepend mpirun -n to the generated command.

biochem-fan commented 1 year ago

Please show us your job submission template.

WeiZhang-12 commented 1 year ago

"
Re3Working=/home///CryoEM_data/*
cd ${Re3Working}
mkdir ${Re3Working}/Class2D/job006
mpirun -n 5 relion_refine_mpi --o Class2D/job006/run --i Extract/job005/particles.star --dont_combine_weights_via_disc --pool 3 --pad 2 --ctf --iter 25 --tau2_fudge 2 --particle_diameter 180 --K 15 --flatten_solvent --zero_mask --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 5 --gpu "" --pipeline_control Class2D/job006/
"

This is one template for Class2D using a PBS script. The following is the run log.

"
process will start at : Wed May 18 15:36:43 CST 2022
++++++++++++++++++++++++++++++++++++++++
RELION version: 3.1.3-commit-fa923d
Precision: BASE=double

=== RELION MPI setup ===

KrisJanssen commented 1 year ago

Thanks @WeiZhang-12 and @biochem-fan, but I would like to know why the RELION GUI seems to generate a command line like:

`which relion_refine_mpi` --o Refine3D/job019/run blablabla, whereas the suggestion seems to be to run something like **mpirun -n 5** relion_refine_mpi --o Class2D/job006/run

I.e. why is it necessary to explicitly prepend mpirun -n xx?

biochem-fan commented 1 year ago

@KrisJanssen

First of all, are you using "Submit to queue?: Yes"? In that case, as I wrote before,

Please show us your job submission template.

Otherwise we cannot investigate. Study our documentation carefully.

If you are using "Submit to queue?: No", do you run the job with the Run! button? In that case, mpirun is prepended automatically. The "note.txt" file in the job folder and the Check command line button do not show mpirun, but it is actually used.

If you are not using a queue system and are not using the Run! button, but are copying and pasting the line from the Check command line button, that is not the recommended way to run a job. In that case you have to prepend mpirun manually.
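
For the copy-and-paste case, prepending mpirun by hand could look like the sketch below. It is only an illustration: with_mpirun is a hypothetical helper, not part of RELION, and the example command carries only the options quoted earlier in this thread, so a real command line would need the remaining job options as well. The helper prints the command rather than running it, so the result can be inspected first.

```shell
#!/bin/sh
# Prepend "mpirun -n N" to a command line copied from the GUI.
# with_mpirun is a hypothetical helper, not part of RELION.
with_mpirun() {
    nprocs=$1
    shift
    # Reject anything that is not a plain non-negative integer.
    case $nprocs in
        ''|*[!0-9]*)
            echo "with_mpirun: process count must be an integer" >&2
            return 1
            ;;
    esac
    # Print the command instead of executing it.
    printf '%s\n' "mpirun -n $nprocs $*"
}

# 3D auto-refine splits the data into random halves, so use at least 3 processes.
with_mpirun 3 relion_refine_mpi --o Refine3D/job019/run --auto_refine
# prints: mpirun -n 3 relion_refine_mpi --o Refine3D/job019/run --auto_refine
```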