Originally reported by: Andreas Schenk (Bitbucket: andschenk, GitHub: andschenk)
Hi,
we ran into an issue with submitting autopick jobs to a cluster environment. The problem appears because the autopick job consists of two separate commands. When submitting a job to a cluster environment using a submission script template, Relion2 currently copies the line containing XXXcommandXXX and uses it for all the commands. E.g. using the template:
This is problematic because multiple processes now try to write to the same file at the same time, which may or may not work, depending on the file system involved.
A second problem arises on our cluster when using the GPU version of autopick. The cluster scheduler doesn't necessarily assign all GPUs on a node to Relion. Therefore the GPU resources have to be explicitly set within the submission script. Currently I use the template:
#!csh
RELION_COMMAND="XXXcommandXXX"
# set explicit resources for --gpu parameter
RELION_COMMAND_GPU=`echo $RELION_COMMAND|sed "s/--gpu[^-]*/--gpu $GPU_RESOURCES /"`
mpirun -n XXXmpinodesXXX $RELION_COMMAND_GPU
This obviously doesn't work for autopick as the second command is never executed.
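To make the `--gpu` rewrite in the template above easier to follow, here is a minimal sketch that wraps the same `sed` substitution in a function and applies it to a sample command. The function name `set_gpu_resources`, the sample command string, its `--placeholder_option` flag, and the resource value `2:3` are illustrative assumptions, not taken from our actual setup.

```shell
#!/bin/sh
# Hypothetical helper: replace the argument of --gpu in a command string.
# $1 = command string, $2 = GPU resources assigned by the scheduler.
set_gpu_resources() {
    # Same substitution as in the template: "--gpu" plus everything up to
    # the next "-" is replaced by "--gpu <resources> ".
    echo "$1" | sed "s/--gpu[^-]*/--gpu $2 /"
}

# Illustrative command string; --placeholder_option is not a real flag.
set_gpu_resources "relion_autopick_mpi --gpu 0:1 --placeholder_option value" "2:3"
```

Because the pattern stops at the next `-`, this only works as long as the GPU argument itself contains no `-`.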
Is there a way to distinguish the first autopick command (which should be run with mpirun and have GPU resources) from the second command (which only needs to be run on the master and doesn't need GPU resources) within the submission template? This would make it much easier to write a generic and portable template script.
Original comment by Sjors Scheres (Bitbucket: scheres, GitHub: scheres):
Thanks for that! I've just committed a change that no longer puts mpirun in front of any command that is not a relion _mpi call. This will soon be incorporated into the beta.
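The rule described here can be sketched in shell as follows. This is not the actual change in Relion, only an illustration of the idea; the function name `wrap_command` and the sample commands are hypothetical, and the sketch assumes the executable is the first whitespace-delimited word of the command.

```shell
#!/bin/sh
# Hypothetical helper: print the command line to run, prefixed with mpirun
# only when the executable name (first word) ends in "_mpi".
# $1 = number of MPI processes, $2 = command string.
wrap_command() {
    nodes=$1
    cmd=$2
    exe=${cmd%% *}    # first whitespace-delimited word of the command
    case "$exe" in
        *_mpi) echo "mpirun -n $nodes $cmd" ;;
        *)     echo "$cmd" ;;
    esac
}

# Illustrative commands; --placeholder_option is not a real flag.
wrap_command 4 "relion_autopick_mpi --placeholder_option value"
wrap_command 4 "echo done"
```

With this rule, only the first autopick command is launched under mpirun, and the second one runs once on the master.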