PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License
247 stars · 43 forks

FALCON resource allocation #33

Closed gbdias closed 5 years ago

gbdias commented 5 years ago

Hello,

I have read several threads on this but I am still struggling with resource allocation on the FALCON config file for PB-assembly.

  1. First, how do the numbers of jobs and processors in the [job.defaults] section correlate with the --n_core parameter in falcon_sense_option and overlap_filtering_setting?

  2. Second, what happens if I do not set a memory limit using the MB = parameter?

[General]
# file-of-filenames listing the initial bas.h5 input files
input_fofn = input.fofn

input_type = raw

# The length cutoff used for seed reads used for initial mapping
length_cutoff = -1
genome_size = 14200000
seed_coverage = 30

# The length cutoff used for seed reads used for pre-assembly
length_cutoff_pr = 1000

pa_daligner_option   = -e.70 -l2000 -k18 -h240 -w8 -s100
ovlp_daligner_option = -e.96 -l1000 -k24 -h240 -w6 -s100
pa_HPCdaligner_option   = -v -B128 -M48
ovlp_HPCdaligner_option = -v -B128 -M48

pa_DBsplit_option = -x500 -s400
ovlp_DBsplit_option = -s400

falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 2 --max_n_read 200 --n_core 12
falcon_sense_skip_contained = True

overlap_filtering_setting = --max-diff 100 --max-cov 100 --min-cov 2 --n-core 24

[job.defaults]
job_type = local

use_tmpdir = /lscratch
pwatcher_type = blocking
job_type = string
submit = bash -C ${CMD} >| ${STDOUT_FILE} 2>| ${STDERR_FILE}

NPROC = 48
njobs = 1
MB = 50000
[job.step.da]
[job.step.pda]
[job.step.la]
[job.step.pla]
[job.step.cns]
[job.step.asm]
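As a sanity check on a [job.defaults] section like the one above, note that the worst-case concurrent demand is njobs × NPROC cores and njobs × MB megabytes. A minimal Python sketch, using the values from this config (the "peak demand" interpretation is my reading of how pypeFLOW schedules jobs, not official documentation):

```python
# Worst-case concurrent resource demand implied by a FALCON [job.defaults]
# section: up to `njobs` jobs run at once, each allowed NPROC cores and
# MB megabytes of memory.
njobs = 1      # concurrent jobs
nproc = 48     # cores per job
mb = 50000     # memory per job, in MB

total_cores = njobs * nproc
total_mem_gb = njobs * mb / 1024

print(f"peak demand: {total_cores} cores, {total_mem_gb:.1f} GB RAM")
```

With njobs = 1 this config runs one 48-core job at a time, so the --n_core values inside falcon_sense_option and overlap_filtering_setting should not exceed that per-job NPROC budget.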
xuzhougeng commented 5 years ago

If you can read Chinese: I wrote a tutorial on running Falcon locally on an E. coli dataset at https://www.jianshu.com/p/2872cc26c49a.

I have 96 CPUs and 512 GB of RAM, and my cfg is:

...
falcon_sense_option=--output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200
...
overlap_filtering_setting=--max-diff 100 --max-cov 150 --min-cov 2
...
[job.defaults]
job_type=local
pwatcher_type=blocking
JOB_QUEUE=default
MB=32768
NPROC=6
njobs=40
submit = /bin/bash -c "${JOB_SCRIPT}" > "${JOB_STDOUT}" 2> "${JOB_STDERR}"

[job.step.da]
NPROC=4
MB=32768
njobs=20
[job.step.la]
NPROC=4
MB=16384
njobs=30
[job.step.cns]
NPROC=4
MB=65536
njobs=25
## 40 * 4 would need more than 512 GB of memory
[job.step.pda]
NPROC=4
MB=32768
njobs=15
[job.step.pla]
NPROC=4
MB=16384
njobs=30
[job.step.asm]
NPROC=50
MB=196608
njobs=1

The number of threads used in "Pre-assembly" is 25 (njobs) x 4 (NPROC) = 100.
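The same arithmetic applies to every stage: peak demand is njobs × NPROC cores and njobs × MB megabytes per [job.step.*] section. A small sketch using Python's configparser on an abbreviated version of the cfg above (falling back to [job.defaults] for missing keys is my assumption about pypeFLOW's behaviour, not something the docs guarantee):

```python
import configparser

# Peak cores / memory per pipeline stage, derived from the [job.step.*]
# sections of a FALCON cfg (abbreviated from the config in this comment).
cfg_text = """
[job.defaults]
NPROC=6
MB=32768
njobs=40

[job.step.da]
NPROC=4
njobs=20

[job.step.cns]
NPROC=4
MB=65536
njobs=25
"""

cfg = configparser.ConfigParser()
cfg.read_string(cfg_text)
defaults = cfg["job.defaults"]

peaks = {}
for section in cfg.sections():
    if not section.startswith("job.step."):
        continue
    # Assumed fallback: a step inherits any value it does not set itself.
    nproc = int(cfg[section].get("NPROC", defaults["NPROC"]))
    njobs = int(cfg[section].get("njobs", defaults["njobs"]))
    mb = int(cfg[section].get("MB", defaults["MB"]))
    peaks[section] = (njobs * nproc, njobs * mb // 1024)  # (cores, GB)
    print(f"{section}: {peaks[section][0]} cores, {peaks[section][1]} GB peak")
```

Running this shows why the MB values matter: the cns stage here declares 25 × 64 GB, far more than the 512 GB actually available, which only works because MB is a declared budget rather than a hard per-job reservation on a local machine.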

gbdias commented 5 years ago

Hi @xuzhougeng, thanks for your post. So you do not provide the --n_core parameter in falcon_sense_option and overlap_filtering_setting?

xuzhougeng commented 5 years ago

@gbdias Yes. I found that this value is passed to falcon_sense_option automatically from the NPROC setting in [job.step.cns], so you don't need to provide --n_core in falcon_sense_option yourself.

pb-cdunn commented 5 years ago

Most of this is fixed in the current release. Please re-open if still a problem.