PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

SLURM parameters to submit falcon assembly #691

Closed: scauet closed this issue 4 years ago

scauet commented 4 years ago

Hello, I'm trying to submit a FALCON assembly on a SLURM cluster, and I'm having trouble changing the default memory and CPU values.

I use these parameters in my config file:

[job.defaults]
job_type=slurm
pwatcher_type=blocking
MB=1000
NPROC=1
njobs=16
submit = srun --wait=0 -p workq -J ${JOB_NAME} -o ${JOB_STDOUT} -e ${JOB_STDERR}  --mem-per-cpu=${MB}M --cpus-per-task=${NPROC} ${JOB_SCRIPT}

That seems OK in the log file:

  "job.defaults": {
    "MB": "**_1000_**",
    "NPROC": "1",
    "job_type": "slurm",
    "njobs": "16",
    "pwatcher_type": "blocking",
    "submit": "srun --wait=0 -p workq  \\\n-J ${JOB_NAME}             \\\n-o ${JOB_STDOUT}        \\\n-e ${JOB_STDERR}        \\\n--mem-per-cpu=${MB}M     \\\n--cpus-per-task=${NPROC}     \\\n${JOB_SCRIPT}",
    "use_tmpdir": false
  },

But when the first job is submitted by falcon, it doesn't use these values, as you can see in the log file:

[INFO]Popen: 'srun --wait=0 -p workq  \
-J P3b594088109b69             \
-o /work/project/gaia/Falcon/Mpu/0-rawreads/build/run-P3b594088109b69.bash.stdout        \
-e /work/project/gaia/Falcon/Mpu/0-rawreads/build/run-P3b594088109b69.bash.stderr        \
--mem-per-cpu=4000M     \
--cpus-per-task=1     \
/usr/local/bioinfo/src/Miniconda/Miniconda3-4.4.10/envs/pbbioconda_env/lib/python2.7/site-packages/pwatcher/mains/job_start.sh'
srun: error: Unable to create step for job 7314066: Memory required by task is not available
[ERROR]Task Node(0-rawreads/build) failed with exit-code=1
[ERROR]Some tasks are recently_done but not satisfied: set([Node(0-rawreads/build)])
...

My MB and NPROC aren't used. It seems to always fall back to the default values (4000 MB and 1 NPROC).

Do you have any idea where the problem may come from? How can I force a specific amount of memory and number of CPUs?

Thanks for your help, Stephane

pb-cdunn commented 4 years ago

NPROC looks right to me. You're sure that's ignored?

We might be lacking in our support for MEM since we never use it, but NPROC should work. Usually you can fake MEM by choosing NPROC such that the machine has enough memory per processor.
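For example, if your nodes provide roughly 4 GB of RAM per core (an assumption for illustration, not something FALCON checks), a stage that needs about 8 GB can get it implicitly by asking for two cores and dropping the explicit memory request:

# assumes ~4 GB of RAM per core on the compute nodes
NPROC=2
# and omit --mem-per-cpu from the submit line:
submit = srun --wait=0 -p workq -J ${JOB_NAME} -o ${JOB_STDOUT} -e ${JOB_STDERR} --cpus-per-task=${NPROC} ${JOB_SCRIPT}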

scauet commented 4 years ago

Yes, NPROC really does seem to be ignored. I tried setting 8000 MB and 2 CPUs:

[job.defaults]
job_type=slurm
pwatcher_type=blocking
MB=8000
NPROC=2
njobs=16
submit = srun --wait=0 -p workq -J ${JOB_NAME} -o ${JOB_STDOUT} -e ${JOB_STDERR}  --mem-per-cpu=${MB}M --cpus-per-task=${NPROC} ${JOB_SCRIPT}

but pwatcher continues to submit the jobs with 4000 MB and 1 NPROC:

2019-08-22 15:04:07,582 - pwatcher.blocking:227 - INFO - Popen: 'srun --wait=0 -p workq  \
-J P3b594088109b69             \
-o /work/project/gaia/Falcon/Mpu/0-rawreads/build/run-P3b594088109b69.bash.stdout        \
-e /work/project/gaia/Falcon/Mpu/0-rawreads/build/run-P3b594088109b69.bash.stderr        \
--mem-per-cpu=4000M     \
--cpus-per-task=1     \
/usr/local/bioinfo/src/Miniconda/Miniconda3-4.4.10/envs/pbbioconda_env/lib/python2.7/site-packages/pwatcher/mains/job_start.sh'

If I hardcode the values in the submit parameter:

[job.defaults]
job_type=slurm
pwatcher_type=blocking
MB=1000
NPROC=1
njobs=16
submit = srun --wait=0 -p workq -J ${JOB_NAME} -o ${JOB_STDOUT} -e ${JOB_STDERR}  --mem-per-cpu=8000M --cpus-per-task=2 ${JOB_SCRIPT}

then the job is submitted with 8000 MB and 2 CPUs:

2019-08-22 15:12:40,322 - pwatcher.blocking:227 - INFO - Popen: 'srun --wait=0 -p workq  \
-J P3b594088109b69             \
-o /work/project/gaia/Falcon/Mpu/0-rawreads/build/run-P3b594088109b69.bash.stdout        \
-e /work/project/gaia/Falcon/Mpu/0-rawreads/build/run-P3b594088109b69.bash.stderr        \
--mem-per-cpu=8000M     \
--cpus-per-task=2     \

But if I hardcode the values in submit, I can't use different values in the per-step sections (see the sketch after these sections), and it's not very elegant.

[job.step.da]
[job.step.la]
[job.step.cns]
[job.step.pda]
[job.step.pla]
[job.step.asm]
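Ideally each stage would carry its own requirements, something like this (the values below are purely illustrative, not settings I have tested):

[job.step.da]
NPROC=4
MB=16000
[job.step.cns]
NPROC=8
MB=32000
[job.step.asm]
NPROC=8
MB=64000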

Do you have another suggestion?

pb-cdunn commented 4 years ago

Some tasks override NPROC to 1 because they know that's all they need.

We have a hole in our support for ad hoc MEM. Unfortunately, all you can do is hard-code it on your submit line. Alternatively, you could simply not provide MEM and instead increase the number of CPUs requested, if your hardware has a consistent memory-to-CPU ratio.
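A rough sketch of that workaround, assuming a fairly uniform memory-to-core ratio across your nodes (the per-stage NPROC values below are illustrative only): leave --mem-per-cpu off the submit line and scale the CPU request per stage instead.

[job.defaults]
job_type=slurm
pwatcher_type=blocking
NPROC=1
njobs=16
# memory falls back to SLURM's per-CPU default; only CPUs are requested explicitly
submit = srun --wait=0 -p workq -J ${JOB_NAME} -o ${JOB_STDOUT} -e ${JOB_STDERR} --cpus-per-task=${NPROC} ${JOB_SCRIPT}

[job.step.da]
NPROC=4
[job.step.cns]
NPROC=8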