PacificBiosciences / FALCON-integrate

Mostly deprecated. See https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries
https://github.com/PacificBiosciences/FALCON/wiki/Manual

Falcon assembly is incomplete #192

Open madhubioinfo opened 5 years ago

madhubioinfo commented 5 years ago

Hi, I am using the FALCON assembler to assemble a fungal genome of about 120 Mb, but it produced only 184K of p-contigs. Below is my config file:

[General]
use_tmpdir = /home/madhu/FALCON-examples
pwatcher_type = blocking
job_type = string
job_queue = bash -C ${CMD}
job_queue = bash -C ${CMD} > ${STDOUT_FILE} 2> ${STDERR_FILE}

# list of files of the initial bas.h5 files
input_fofn = input.fofn
#input_fofn = preads.fofn

input_type = raw
#input_type = preads

# The length cutoff used for seed reads used for initial mapping
length_cutoff = 2000

# The length cutoff used for seed reads used for pre-assembly
length_cutoff_pr = 8000

#jobqueue = production
sge_option_da =
sge_option_la =
sge_option_pda =
sge_option_pla =
sge_option_fc =
sge_option_cns =

pa_concurrent_jobs = 32
ovlp_concurrent_jobs = 32
pa_concurrent_jobs = 6
ovlp_concurrent_jobs = 6

pa_HPCdaligner_option =  -v -B4 -t16 -e.70 -l1000 -s1000
ovlp_HPCdaligner_option = -v -B4 -t32 -h60 -e.96 -l500 -s1000

pa_DBsplit_option = -x500 -s50
ovlp_DBsplit_option = -x500 -s50

falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 4 --max_n_read 200 --n_core 6

overlap_filtering_setting = --max_diff 100 --max_cov 100 --min_cov 20 --bestn 10 --n_core 24

I checked the 2-asm-falcon directory. The following is from fc_ovlp_to_graph.log:

no out edge 000118482:E

This is the tail of the .err file:

+ fc_ovlp_to_graph --min_len 12000 preads.ovl

real    0m0.612s
user    0m0.230s
sys     0m0.063s

# Given sg_edges_list, utg_data, ctg_paths, preads4falcon.fasta,
# write p_ctg.fa and a_ctg_all.fa,
# plus a_ctg_base.fa, p_ctg_tiling_path, a_ctg_tiling_path, a_ctg_base_tiling_path:
time fc_graph_to_contig
+ fc_graph_to_contig

real    0m5.134s
user    0m4.820s
sys     0m0.297s

rm -f ./preads4falcon.fasta
+ rm -f ./preads4falcon.fasta

# Given a_ctg_all.fa, write a_ctg.fa:
time fc_dedup_a_tigs
+ fc_dedup_a_tigs

real    0m0.135s
user    0m0.083s
sys     0m0.036s

touch falcon_asm_done
+ touch falcon_asm_done
2019-02-27 21:13:14,102 - root - DEBUG - CD: '/home/madhu/FALCON-examples/madhu/pypetmp//home/madhu/FALCON-examples/2-asm-falcon' -> '/home/madhu/FALCON-examples/2-asm-falcon'
2019-02-27 21:13:14,103 - root - INFO - rsync -av /home/madhu/FALCON-examples/madhu/pypetmp//home/madhu/FALCON-examples/2-asm-falcon/ /home/madhu/FALCON-examples/2-asm-falcon; rm -rf /home/madhu/FALCON-examples/madhu/pypetmp//home/madhu/FALCON-examples/2-asm-falcon
2019-02-27 21:13:14,122 - root - DEBUG - Checking existence of u'falcon_asm_done' with timeout=60
2019-02-27 21:13:14,123 - root - DEBUG - CD: '/home/madhu/FALCON-examples/2-asm-falcon' -> '/home/madhu/FALCON-examples/2-asm-falcon'

real    0m9.077s
user    0m35.143s
sys     0m3.640s
touch /home/madhu/FALCON-examples/2-asm-falcon/run.sh.done
+ touch /home/madhu/FALCON-examples/2-asm-falcon/run.sh.done
+ finish
+ echo 'finish code: 0'

Please help me solve this issue.

pb-cdunn commented 5 years ago

Look in the 0-rawreads/report directory (FALCON numbers its stage directories 0-rawreads, 1-preads_ovl, 2-asm-falcon). The JSON file there can help you understand what happened.
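As a convenience, here is a minimal Python sketch for dumping whatever JSON reports a directory contains. The function name `load_reports` is made up for illustration, and the sketch assumes nothing about the keys inside the report; it only pretty-prints what it finds. Pass the report directory as the first command-line argument.

```python
import json
import sys
from pathlib import Path


def load_reports(report_dir):
    """Return {filename: parsed JSON} for every *.json file in report_dir."""
    reports = {}
    for path in sorted(Path(report_dir).glob("*.json")):
        with path.open() as handle:
            reports[path.name] = json.load(handle)
    return reports


if __name__ == "__main__" and len(sys.argv) > 1:
    # The directory is supplied by the user; no path is assumed here.
    for name, stats in load_reports(sys.argv[1]).items():
        print(name)
        print(json.dumps(stats, indent=2, sort_keys=True))
```

Comparing the raw-read and pread totals in the output against the expected genome size (about 120 Mb here) should show whether coverage is lost before or during pre-assembly.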

You might not have enough input. Typically we set the desired seed_coverage (e.g. 30) and set length_cutoff = -1, so the cutoff will be auto-calculated.
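To illustrate the idea behind an auto-calculated cutoff, here is a rough Python sketch. It is not FALCON's actual implementation; the function name and the exact rule (take the longest reads until their total reaches genome size times seed coverage) are assumptions made for illustration.

```python
def seed_length_cutoff(read_lengths, genome_size, seed_coverage):
    """Pick the length of the last read needed, longest first, for the
    selected reads to total at least genome_size * seed_coverage bases.

    Returns None when all reads together fall short of that target,
    which would mean there is not enough input for the requested coverage.
    """
    target = genome_size * seed_coverage
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return None


if __name__ == "__main__":
    # Toy numbers, not real data: a 1 kb "genome" at 20x seed coverage.
    lengths = [10000, 8000, 6000, 4000, 2000]
    print(seed_length_cutoff(lengths, genome_size=1000, seed_coverage=20))
```

A `None` result in this sketch corresponds to the "not enough input" case: even using every read as a seed would not reach the requested coverage.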

length_cutoff_pr is never auto-calculated, but 8000 seems kind of high. Not sure. You can examine the preads DB size.

Beyond that, you might need to tweak your daligner settings, though they look fine to me.

You'll need to look at files at various stages to figure out exactly where the size is too small. Knowing that, we might be able to offer more advice.
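One simple way to compare sizes across stages is to count records and bases in each FASTA output, for example p_ctg.fa in 2-asm-falcon. The sketch below is a generic FASTA counter, not part of FALCON, and the function name `fasta_stats` is made up for illustration.

```python
import sys


def fasta_stats(lines):
    """Return (number of records, total bases) for FASTA text given as lines."""
    n_records = 0
    total_bases = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_records += 1  # header line starts a new record
        else:
            total_bases += len(line)  # sequence line
    return n_records, total_bases


if __name__ == "__main__" and len(sys.argv) > 1:
    # FASTA paths are supplied on the command line.
    for path in sys.argv[1:]:
        with open(path) as handle:
            n, bases = fasta_stats(handle)
        print(f"{path}\t{n} records\t{bases} bases")
```

Dividing the total bases at each stage by the genome size gives a rough coverage figure, which helps locate the step where most of the data disappears.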