3dem / relion

Image-processing software for cryo-electron microscopy
https://relion.readthedocs.io/en/latest/
GNU General Public License v2.0

template for autopick job submission to cluster #157

Closed bforsbe closed 3 years ago

bforsbe commented 7 years ago

Originally reported by: Andreas Schenk (Bitbucket: andschenk, GitHub: andschenk)


Hi,

we ran into an issue with submitting autopick jobs to a cluster environment. The problem appears because the autopick job consists of two separate commands. When submitting a job to a cluster environment using a submission script template, Relion2 currently copies the line containing XXXcommandXXX and uses it for all the commands. E.g. using the template:

#!csh

mpirun  -n XXXmpinodesXXX XXXcommandXXX

leads to the submission script:

#!csh

mpirun  -n 256 `which relion_autopick_mpi` --i CtfFind/job005/micrographs_ctf.star --ref Select/template4autopick/class_averages.star --odir AutoPick/job047/ --pickname autopick --invert  --ctf  --ang 5 --shrink 0 --lowpass 20 --particle_diameter 200 --threshold 0.3 --min_distance 200 --max_stddev_noise 1.1 

mpirun -n 256  echo CtfFind/job005/micrographs_ctf.star > AutoPick/job047/coords_suffix_autopick.star

This is problematic because the echo command is now launched 256 times, so multiple processes try to write to the same file at the same time, which might or might not work, depending on the file system involved.
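The expansion described above can be mimicked with a few lines of shell. This is only a toy re-creation of the observed behaviour, not Relion's actual code; `substitute` and `expand_template` are hypothetical helper names, and the commands and file names in the demo call are shortened placeholders.

```shell
#!/bin/sh
# Toy re-creation of the template expansion described above; NOT Relion's code.
# substitute and expand_template are hypothetical helper names.

# substitute STRING PLACEHOLDER VALUE: replace the first PLACEHOLDER in STRING.
substitute() {
    case "$1" in
        *"$2"*)
            before=${1%%"$2"*}   # text in front of the first placeholder
            after=${1#*"$2"}     # text behind the first placeholder
            printf '%s%s%s\n' "$before" "$3" "$after" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}

# expand_template NPROCS COMMAND...: read the template on stdin and write the
# script to stdout, repeating the XXXcommandXXX line once per command.
expand_template() {
    nprocs=$1
    shift
    while IFS= read -r line; do
        line=$(substitute "$line" XXXmpinodesXXX "$nprocs")
        case "$line" in
            *XXXcommandXXX*)
                for cmd in "$@"; do
                    substitute "$line" XXXcommandXXX "$cmd"
                done ;;
            *)
                printf '%s\n' "$line" ;;
        esac
    done
}

# Demo: two commands, one template line. Nothing is executed, only printed.
printf '%s\n' 'mpirun -n XXXmpinodesXXX XXXcommandXXX' |
    expand_template 256 'relion_autopick_mpi --i in.star' 'echo in.star > out.star'
```

Both printed lines start with `mpirun -n 256`, including the one that only wraps `echo`, which is exactly the situation that causes the concurrent writes.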

A second problem arises on our cluster when using the GPU version of autopick. The cluster scheduler doesn't necessarily assign all GPUs on a node to Relion. Therefore the GPU resources have to be explicitly set within the submission script. Currently I use the template:

#!csh

RELION_COMMAND="XXXcommandXXX" 
#  set explicit resources for --gpu parameter
RELION_COMMAND_GPU=`echo $RELION_COMMAND|sed "s/--gpu[^-]*/--gpu  $GPU_RESOURCES /"`
mpirun -n XXXmpinodesXXX  $RELION_COMMAND_GPU

which leads to the submission script:

#!csh

RELION_COMMAND="`which relion_autopick_mpi` --i CtfFind/job005/micrographs_ctf.star --ref Select/template4autopick/class_averages.star --odir AutoPick/job049/ --pickname autopick --invert  --ctf  --ang 5 --shrink 0 --lowpass 20 --particle_diameter 200 --threshold 0.3 --min_distance 200 --max_stddev_noise 1.1 --gpu  " 
RELION_COMMAND_GPU=`echo $RELION_COMMAND|sed "s/--gpu[^-]*/--gpu  $GPU_RESOURCES /"`
mpirun -n 256  $RELION_COMMAND_GPU
RELION_COMMAND="echo CtfFind/job005/micrographs_ctf.star > AutoPick/job049/coords_suffix_autopick.star" 

This obviously doesn't work for autopick: the second command is only assigned to RELION_COMMAND and is never executed, because the sed and mpirun lines are not repeated for it.

Is there a way to distinguish the first autopick command (which should be run with mpirun and have GPU resources) from the second command (which only needs to be run on the master and doesn't need GPU resources) within the submission template? This would make it much easier to write a generic and portable template script.
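Until the template language offers such a distinction, one possible workaround is to let the template call a small shell function that makes the decision itself. The following is a minimal sketch under several assumptions: `run_relion`, `MPI_LAUNCHER` and `NPROCS` are illustrative names that do not exist in Relion, `GPU_RESOURCES` is the site-specific variable from the template above, and the script is written for a POSIX `sh` rather than csh. It treats any program whose name ends in `_mpi` as an MPI job and runs everything else once, directly.

```shell
#!/bin/sh
# Sketch of a wrapper for a submission template (hypothetical, not part of Relion).
# MPI_LAUNCHER, NPROCS and run_relion are illustrative names.

MPI_LAUNCHER=${MPI_LAUNCHER:-mpirun}
NPROCS=${NPROCS:-1}    # in a real template this would be set from XXXmpinodesXXX

run_relion() {
    case "$1" in
        *_mpi) ;;               # MPI program: continue to the launcher below
        *) "$@"; return ;;      # anything else (e.g. echo): run once, directly
    esac

    # Rebuild the argument list, replacing whatever value followed --gpu
    # with the resources assigned by the scheduler.
    count=$#
    after_gpu=no
    for arg in "$@"; do
        if [ "$after_gpu" = yes ]; then
            after_gpu=no
            case "$arg" in
                -*) ;;          # the next option: keep it
                *) continue ;;  # the old --gpu value: drop it
            esac
        fi
        set -- "$@" "$arg"
        if [ "$arg" = --gpu ]; then
            # Append the GPU list only if the variable is non-empty.
            set -- "$@" ${GPU_RESOURCES:+"$GPU_RESOURCES"}
            after_gpu=yes
        fi
    done
    shift "$count"              # discard the original arguments

    "$MPI_LAUNCHER" -n "$NPROCS" "$@"
}

# In the template, the command line would then read:
#   run_relion XXXcommandXXX
```

Assuming Relion repeats the `run_relion XXXcommandXXX` line once per command as described above, the autopick call goes through mpirun with the explicit GPU list, while the `echo ... > coords_suffix_autopick.star` line runs a single time, with the shell applying the redirection to the wrapper call.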


bforsbe commented 7 years ago

Original comment by Sjors Scheres (Bitbucket: scheres, GitHub: scheres):


Thanks for that! I've just committed a change that will not put the mpirun before any command that is not a relion _mpi call. This will soon be incorporated into the beta.