PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

Running Falcon locally, with large genomes #663

Closed: bernardo1963 closed this issue 6 years ago

bernardo1963 commented 6 years ago

I have a local server with 208 CPUs and 2 TB of RAM, and I am using it to assemble several PacBio datasets. It assembled a Drosophila genome in 2 hours, but I noticed that it launches 120 daligner processes at the same time, even though I included the following parameters in the cfg to limit the number of concurrent processes to 48:

[job.step.da]
njobs = 48
NPROC=4

How can I limit the number of simultaneous daligner processes? The same problem happens in the consensus step (I limited it to 32, but 175 are running).
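For reference, `njobs` is meant to act as a cap on how many jobs of a step run at once. The sketch below illustrates that intended behavior with a bounded worker pool from Python's standard library. It is not FALCON's scheduler: the `run_capped` helper is hypothetical, and the sleeping dummy tasks stand in for daligner invocations.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_capped(tasks, njobs):
    """Run callables with at most `njobs` in flight.

    Hypothetical helper illustrating what a per-step njobs cap should do.
    Returns (results in task order, peak number of tasks running at once).
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def wrapped(task):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return task()
        finally:
            with lock:
                running -= 1

    # max_workers bounds the number of worker threads, hence concurrent tasks.
    with ThreadPoolExecutor(max_workers=njobs) as pool:
        results = list(pool.map(wrapped, tasks))
    return results, peak


if __name__ == "__main__":
    # 120 dummy "daligner" jobs; only 48 may run at the same time.
    jobs = [lambda: time.sleep(0.01) for _ in range(120)]
    _, peak = run_capped(jobs, njobs=48)
    print(f"peak concurrent jobs: {peak}")  # never more than 48
```

A cap on jobs is not, by itself, a cap on cores: as the `NPROC` settings in the config suggest, each job may also use several cores.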
The fly genome finished normally (although it used almost 90% of the 2 TB of memory), but when I tried a human dataset (the CHM1 mole), the server crashed.

Also, more generally, could you suggest a cfg file suitable for assembling a human genome? Thanks in advance, Bernardo

The cfg file is copied below:

job_type = local
input_fofn = input.fofn
input_type = raw
length_cutoff = 18000
length_cutoff_pr = 18000

#New-style config
[job.defaults]
pwatcher_type = blocking
submit = /bin/bash -c "${JOB_SCRIPT}" > "${JOB_STDOUT}" 2> "${JOB_STDERR}"

[General]
pwatcher_type = blocking
# Because of a bug, this is needed in the "General" section, but soon
# it will work from the "job.defaults" too, which is preferred.

pa_HPCdaligner_option =  -v  -h35 -k16 -e.70 -M24 -l1000 -s1000 
ovlp_HPCdaligner_option = -v  -h110 -k22 -e.96 -M24 -l500 -s1000

pa_DBsplit_option = -x500 -s400
ovlp_DBsplit_option = -x500 -s400
falcon_sense_option = --output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200 --n-core 6
overlap_filtering_setting = --max-diff 30 --max-cov 60 --min-cov 5 --n-core 24
LA4Falcon_preload = false

[job.step.da]
njobs = 48
NPROC=4
[job.step.pda]
njobs = 48
NPROC=4
[job.step.la]
njobs = 48
NPROC=4
[job.step.pla]
njobs = 48
NPROC=4
[job.step.cns]
njobs = 32
NPROC=6
[job.step.asm]
njobs = 8
NPROC=24
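A back-of-the-envelope check of what this config asks of a 208-CPU, 2 TB machine can be done from the `[job.step.*]` sections alone. The sketch below assumes that each step runs up to `njobs` jobs at once with `NPROC` cores each, and that the `-M24` in the HPCdaligner options limits each daligner process to roughly 24 GB. The helper names and the choice of 2048 GB for "2 TB" are illustrative assumptions.

```python
import configparser

# The [job.step.*] sections copied from the cfg above (other settings omitted).
CFG = """
[job.step.da]
njobs = 48
NPROC=4
[job.step.pda]
njobs = 48
NPROC=4
[job.step.la]
njobs = 48
NPROC=4
[job.step.pla]
njobs = 48
NPROC=4
[job.step.cns]
njobs = 32
NPROC=6
[job.step.asm]
njobs = 8
NPROC=24
"""

DALIGNER_GB_PER_JOB = 24  # from -M24; assumed to be a per-process limit in GB
SERVER_CPUS = 208
SERVER_GB = 2048  # "2 TB of RAM", taken here as 2048 GB


def cores_per_step(cfg_text):
    """Return {step: njobs * NPROC} for every [job.step.*] section."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    cores = {}
    for name in parser.sections():
        if name.startswith("job.step."):
            section = parser[name]
            step = name[len("job.step."):]
            cores[step] = section.getint("njobs") * section.getint("NPROC")
    return cores


def daligner_gb(concurrent_jobs, gb_per_job=DALIGNER_GB_PER_JOB):
    """Rough memory estimate if every concurrent daligner uses its -M budget."""
    return concurrent_jobs * gb_per_job


if __name__ == "__main__":
    for step, cores in cores_per_step(CFG).items():
        print(f"{step}: {cores} cores (server has {SERVER_CPUS})")
    for n in (48, 120):
        print(f"{n} daligner jobs -> up to {daligner_gb(n)} GB (server has {SERVER_GB})")
```

Every step works out to 192 cores, under the 208 available, so the settings themselves fit the machine. The observed behavior does not: 120 concurrent daligner jobs would put the `-M` budget alone at 2880 GB, above 2 TB, and 175 consensus jobs at 6 cores each would request 1050 cores. That is consistent with the memory pressure reported above.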

pb-cdunn commented 6 years ago

There is a bug in how njobs is applied, so you may need more recent code. Do you know which SHA1 you are running? Or at least which date/version of Falcon?

Watch for lines like "Setting max_jobs to {}; was {}." Those show which njobs value is used for each set of qsub calls. If the value is wrong, we may have a bug, or you may have a typo. If you don't see those informational lines at all, your code is too old.
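To check a run for those informational lines without reading the whole log, a short filter like the sketch below can help. It only assumes the message text quoted above. The log file name is not given in this thread, so the function takes any iterable of lines (for example, an open file). The sample line's values are hypothetical.

```python
import re

# Matches the informational line quoted above: "Setting max_jobs to {}; was {}."
MAX_JOBS_RE = re.compile(r"Setting max_jobs to (\S+); was (\S+)\.")


def max_jobs_changes(lines):
    """Yield (new, old) value pairs, as strings, for every max_jobs line found."""
    for line in lines:
        match = MAX_JOBS_RE.search(line)
        if match:
            yield match.group(1), match.group(2)


if __name__ == "__main__":
    sample = [
        "Setting max_jobs to 48; was 120.",  # hypothetical values
        "some unrelated log message",
    ]
    print(list(max_jobs_changes(sample)))  # [('48', '120')]
```

An empty result for a whole run matches the "code is too old" case described in the reply.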

JEzpeleta commented 6 years ago

Chris, thank you for your answer. I am following up on this on behalf of @bernardo1963. We installed Falcon in early July, and the master branch hasn't been modified since April, so I initially assumed we had the latest code. However, we weren't seeing the informational lines you mentioned, so I guess that code is too old. The develop branch is slightly more recent (last updated in May) and, as it turns out, it does fix the njobs issue and correctly displays the informational lines, but we would really like to work with a stable version if at all possible. Relatedly, I noticed that you now provide binaries, and these appear to be more up-to-date than the GitHub repository, but unfortunately we need access to the source for some of our work. Is there any chance you could provide the source code for the latest versions (via GitHub or otherwise)? That would be immensely helpful.

JEzpeleta commented 6 years ago

Just to clarify and expand on the above: we can obviously modify the Python scripts in the packages directly, but we are also trying to modify daligner. So what we would ideally need is the source code for the specific version of daligner used in the latest stable Falcon package, so that we can make changes, recompile daligner, and then replace the package binaries with our updated version. If you are just using one of the daligner versions available on GitHub, then a commit hash should be enough for our needs.

pb-cdunn commented 6 years ago

Please try the latest tarball from https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

JEzpeleta commented 6 years ago

Thank you, we will give that a try. Is the version of daligner available on GitHub compatible with that tarball? That is, if I replace the daligner binary in that tarball with one built from scratch (say, commit fd21879), would you expect it to work?

pb-cdunn commented 6 years ago

Partly. We added some executables to our fork of DALIGNER, so you need to use our version. It's in the binary release, and it's probably up-to-date at github.com/PacificBiosciences/DALIGNER.
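Because the fork adds executables, one way to check whether a from-scratch build would leave something out is to compare executable names between a build of the PacBio fork and your own build. The sketch below does that with the standard library. The two directory arguments are wherever you placed the builds (not specified in this thread). It compares file names only, not versions or behavior, and the tool names in the demo are hypothetical placeholders.

```python
import os
import tempfile
from pathlib import Path


def executables(directory):
    """Names of regular files in `directory` that the current user can execute."""
    return {
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and os.access(p, os.X_OK)
    }


def missing_executables(reference_dir, build_dir):
    """Executables found in reference_dir but absent from build_dir, sorted."""
    return sorted(executables(reference_dir) - executables(build_dir))


if __name__ == "__main__":
    # Demo with throwaway directories; "fork_only_tool" is a hypothetical name.
    with tempfile.TemporaryDirectory() as fork, tempfile.TemporaryDirectory() as custom:
        for directory, names in ((fork, ["daligner", "fork_only_tool"]), (custom, ["daligner"])):
            for name in names:
                path = Path(directory) / name
                path.write_text("#!/bin/sh\n")
                path.chmod(0o755)
        print(missing_executables(fork, custom))  # ['fork_only_tool']
```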

JEzpeleta commented 6 years ago

Thank you once again for the help. I will give this a try.