evotools / nf-LO

A Nextflow workflow to generate lift over files for any pair of genomes
https://nf-lo.readthedocs.io/
MIT License

Process requirement exceed available CPUs -- req: 2; avail: 1 #1

Closed smk5g5 closed 2 years ago

smk5g5 commented 3 years ago

Hi, I am trying to use the nf-LO pipeline and I am running into some errors. On my server I specified that the pipeline should use 8 CPUs and 32 GB of RAM, but it fails with the error "Process requirement exceed available CPUs -- req: 2; avail: 1". I was wondering why that happens when I am giving it 8 CPUs in both my LSF command and my Nextflow command.

nextflow run evotools/nf-LO \
    --source /scratch1/fs1/allegra.petti/khan.saad/Glass_synapse/reference/human_g1k_v37.fasta \
    --target /scratch1/fs1/allegra.petti/khan.saad/liftover_chain/hg19_igenomes/genome.fa \
    --aligner lastz \
    --tgtSize 10000000 \
    --tgtOvlp 100000 \
    --srcSize 20000000 \
    --liftover_algorithm crossmap \
    --outdir ./my_liftover \
    --publish_dir_mode copy \
    --max_cpus 8 \
    --max_memory 120.GB

JAVA_HOME='/venv'
export PATH=/opt/conda/bin:$PATH
PATH=/venv/bin:$PATH
LSF_DOCKER_VOLUMES='/home/khan.saad/:/home/khan.saad/ /storage1/fs1/allegra.petti/Active/:/storage1/fs1/allegra.petti/Active/ /scratch1/fs1/allegra.petti/:/scratch1/fs1/allegra.petti/' \
bsub -oo logs/nexflow_nflo.%J -G compute-allegra.petti -g /khan.saad/R_seurat \
    -q general -M 128000000 -n 8 -R 'rusage[mem=128000]' \
    -a 'docker(smk5g5/nf-lo:1.0.0)' /bin/bash ./nflo.sh

Below is the error I am getting.

Command output: (empty)

Work dir: /scratch1/fs1/allegra.petti/khan.saad/liftover_chain/work/ae/5dbc7b0dff31f4dbfda486663a4019

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

WARN: Killing pending tasks (1)

executor > local (7044)
[99/b8dfba] process > PREPROC:src2bit (src2Bit)        [100%] 1 of 1 ✔
[77/b6fe1e] process > PREPROC:tgt2bit (tgt2Bit)        [100%] 1 of 1 ✔
[87/500b9a] process > PREPROC:splitsrc (splitsrc)      [100%] 1 of 1 ✔
[5d/0611b0] process > PREPROC:groupsrc (groupsrc)      [100%] 1 of 1 ✔
[ca/a47e09] process > PREPROC:splittgt (splittgt)      [100%] 1 of 1 ✔
[c4/622932] process > PREPROC:grouptgt (grouptgt)      [100%] 1 of 1 ✔
[03/40d546] process > PREPROC:pairs (mkpairs)          [100%] 1 of 1 ✔
[e1/c85215] process > ALIGNER:lastz (lastz_med.sr...   [ 15%] 7009 of 46610, fa...
[8f/e0eb1a] process > ALIGNER:axtChain (axtchain_m)    [  0%] 28 of 7007
[-        ] process > ALIGNER:chainMerge               -
[-        ] process > ALIGNER:chainNet                 -
[-        ] process > ALIGNER:netSynt                  -
[-        ] process > ALIGNER:chainsubset              -
[-        ] process > ALIGNER:chain2maf                -
[-        ] process > ALIGNER:name_maf_seq             -
[-        ] process > ALIGNER:mafstats                 -
[d1/3b95a6] NOTE: Process ALIGNER:lastz (lastz_med.src93.tgt68) terminated with an error exit status (1) -- Execution is retried (1)
Error executing process > 'ALIGNER:lastz (lastz_med.src93.tgt68)'

Caused by: Process requirement exceed available CPUs -- req: 2; avail: 1

Command executed:

echo B=0 C=0 E=30 H=0 K=3000 L=3000 M=50 O=400 T=1 Y=9400
lastz /scratch1/fs1/allegra.petti/khan.saad/liftover_chain/work/5d/0611b088a66d4b63a1adb6961c96d7/CLUST_src/src93.fa /scratch1/fs1/allegra.petti/khan.saad/liftover_chain/work/c4/622932c827e301415a2268ee960622/CLUST_tgt/tgt68.fa B=0 C=0 E=30 H=0 K=3000 L=3000 M=50 O=400 T=1 Y=9400 --ambiguous=iupac --format=lav \
    | lavToPsl stdin stdout \
    | liftUp -type=.psl stdout source.lift warn stdin \
    | liftUp -type=.psl -pslQ src93.tgt68.psl target.lift warn stdin

Command exit status: 1

Command output: (empty)

Work dir: /scratch1/fs1/allegra.petti/khan.saad/liftover_chain/work/ae/5dbc7b0dff31f4dbfda486663a4019

RenzoTale88 commented 3 years ago

@smk5g5 thank you for using nf-LO! I'm currently trying to reproduce the error you're encountering on our system. I'll let you know as soon as I have a solution. Have you tried resuming the run with -resume?
Best,
Andrea

smk5g5 commented 3 years ago

@RenzoTale88 I have tried -resume, but it did not solve my problem. I am still getting the error at this step:

Caused by: Process requirement exceed available CPUs -- req: 2; avail: 1

Command executed:

mkdir SPLIT_tgt && chmod a+rw SPLIT_tgt
faSplit size -lift=target.lift -extra=100000 genome.fa 10000000 SPLIT_tgt/

RenzoTale88 commented 3 years ago

@smk5g5 thank you for confirming this. I'll try to run some tests on our systems. In the meantime, I can only suggest reducing the number of cores to N-1, where N is the maximum available on your system. I'm not convinced this will sort out the issue, but I suppose it is worth a shot. Also, keep in mind that --max_cpus refers to the maximum number of CPUs that a single process will use at a time (analogous to the nf-core workflows); see the sketch below. I know these are not fixes, and I apologise for the inconvenience. I'll keep you updated and will let you know when I upload a patch.

Andrea
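For illustration, a minimal sketch of the nf-core-style capping that --max_cpus implies (assumed logic and values, not nf-LO's actual configuration):

params.max_cpus = 8

process {
    // hypothetical base request that grows with retries,
    // always capped at params.max_cpus
    cpus = { Math.min(2 * task.attempt, params.max_cpus as int) }
}

With a cap like this, no single task can request more CPUs than --max_cpus, but the total across concurrently running tasks can still exceed it.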

smk5g5 commented 3 years ago

I had already tried changing the --max_cpus argument to 1, but it did not work. Is there a version of this pipeline that submits each step as a bsub job using the LSF executor, since Nextflow is compatible with LSF?

RenzoTale88 commented 3 years ago

Nextflow does support LSF as a scheduler, so in principle it is doable.
You can check whether a custom configuration for your cluster already exists here; if you find one, you can select it with -profile CONFIGURATION_NAME. Alternatively, we can define a configuration specific to your cluster, which will allow the workflow to submit the jobs and manage them on the cluster automatically.
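For example, a per-cluster profile could be sketched like this (hypothetical profile name and assumed queue, not an official nf-LO configuration):

profiles {
    my_lsf {                              // hypothetical profile name
        process.executor = 'lsf'          // submit each process as an LSF job
        process.queue    = 'general'      // queue used in the bsub commands above
        executor.perJobMemLimit = true    // interpret memory limits per job (LSF-specific)
    }
}

It would then be selected with -profile my_lsf when launching the workflow.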

RenzoTale88 commented 3 years ago

@smk5g5 thank you for your patience. I've tried to reproduce the issue without success.

Nevertheless, I've just uploaded a new set of configurations that I hope will help prevent the problem. In particular, the workflow will now always use one core for jobs that cannot take advantage of multiple cores (e.g. lastz and some post-processing jobs). In addition, I've modified the configuration for local runs by 1) throttling the submission of new jobs to one every 0.25 seconds and 2) reducing the queueSize to the number of cores minus one when more than one core is available; a sketch of these settings follows below.
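Something along these lines (assumed values illustrating the change described above, not the actual commit):

// nextflow.config sketch for local runs (assumed, for illustration only)
executor {
    name      = 'local'
    // leave one core free when more than one is available
    queueSize = Math.max(1, Runtime.runtime.availableProcessors() - 1)
}

process {
    // single-core tools should never request more than one CPU
    withName: 'lastz' {
        cpus = 1
    }
}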

Regarding your request from last week: you can specify a custom configuration through the -c MYPROFILE flag, or alternatively with the --my_profile MYPROFILE -profile local,custom_profile set of options, where MYPROFILE is your configuration file; see the example below.
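For instance, a minimal configuration file (here given the hypothetical name mycluster.config) could contain just the executor settings:

// mycluster.config — hypothetical file, loaded with either
//   nextflow run evotools/nf-LO -c mycluster.config ...
// or
//   nextflow run evotools/nf-LO --my_profile mycluster.config -profile local,custom_profile ...
process {
    executor = 'lsf'
    queue    = 'general'   // assumed queue name
}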

Hope these changes help sort out the issue!

smk5g5 commented 3 years ago

Now I am getting a new error.

N E X T F L O W  ~  version 21.04.0
Pulling evotools/nf-LO ...
 downloaded from https://github.com/evotools/nf-LO.git
Launching `evotools/nf-LO` [focused_church] - revision: 0d144f8285 [main]
WARN: It appears you have never run this project before -- Option `-resume` is ignored

=====================================
         __           _      ____  
        / _|         | |    / __ \ 
  _ __ | |_   ______ | |   | |  | |
 | '_ \|  _| |_____| | |   | |  | |
 | | | | |           | |___| |__| |
 |_| |_|_|           |______\____/ 
=====================================
Nextflow LiftOver v 1.6.0
=====================================
source          : /storage1/fs1/tahan/Active/projects/saad/hs37d5.fa
target          : /storage1/fs1/tahan/Active/projects/saad/genome.fa
aligner         : lastz
distance        : medium
custom align    : false
custom chain    : false
source chunk    : 20000000
source overlap  : 0
target chunk    : 10000000
target overlap  : 100000
output folder   : /scratch1/fs1/tahan/my_liftover
liftover name   : liftover
annot           : false
annot type      : false
liftover meth.  : crossmap
igenomes_base   : s3://ngi-igenomes/igenomes/
igenomes_ignore : false
no_maf          : false
skip netsynt    : false
max cpu         : 20
max mem         : 200.GB
max rt          : 240.h
Using CrossMap
Invalid submit-rate-limit value: 0.25 sec -- It must be provide using the following format `num request / duration` eg. 10/1s

Below is the command used.

JAVA_HOME='/venv' PATH=/opt/conda/bin:/venv/bin:$PATH \
LSF_DOCKER_VOLUMES="/storage1/fs1/tahan/Active/:/storage1/fs1/tahan/Active/ /scratch1/fs1/tahan:/scratch1/fs1/tahan $HOME:$HOME" \
bsub -G compute-ris -o $HOME/nextflow.log -q general -M 216GB -n 20 \
    -R "rusage[mem=216GB] span[hosts=1]" -a "docker(smk5g5/nf-lo:1.0.0)" \
    nextflow run evotools/nf-LO -w /scratch1/fs1/tahan/nflow -resume \
    --source /storage1/fs1/tahan/Active/projects/saad/hs37d5.fa \
    --target /storage1/fs1/tahan/Active/projects/saad/genome.fa \
    --aligner lastz --tgtSize 10000000 --tgtOvlp 100000 --srcSize 20000000 \
    --liftover_algorithm crossmap --outdir /scratch1/fs1/tahan/my_liftover \
    --publish_dir_mode copy --max_cpus 20 --max_memory 200.GB

RenzoTale88 commented 3 years ago

@smk5g5 apologies for this, I've corrected the submission rate to use the proper syntax (see below). Could you please try the code now? Thank you
Andrea
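For reference, in the `num request / duration` format that the error message asks for, a rate equivalent to one submission every 0.25 seconds would look like this (an assumed value, following the message's `10/1s` example):

executor {
    // four submissions per second ≈ one every 0.25 s
    submitRateLimit = '4/1s'
}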

RenzoTale88 commented 2 years ago

Closing due to inactivity.