PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

How to get more preads.fasta #362

Open tangerzhang opened 8 years ago

tangerzhang commented 8 years ago

Hello, I am using FALCON to assemble a tetraploid plant whose estimated genome size is 3.2 Gb. The raw data total around 240 Gb, which is 75x the estimated genome size. After the correction step, I only got 36 Gb of preads (11.25x). I then collected the preads longer than 1 kb for assembly. The final assembly is only 1.3 Gb, with an N50 of just 78 kb. My configuration file is below. Do you have any suggestions for improving it?

[General]
input_fofn = input.fofn
input_type = raw
length_cutoff = 1000
length_cutoff_pr = 1000 

sge_option_da = -pe orte 8 -q all.q -l mem_free=100g
sge_option_la = -pe orte 8 -q all.q -l mem_free=100g
sge_option_pda = -pe orte 8 -q all.q -l mem_free=100g
sge_option_pla = -pe orte 8 -q all.q -l mem_free=100g
sge_option_fc = -pe orte 8 -q all.q -l mem_free=100g
sge_option_cns = -pe orte 8 -q all.q -l mem_free=100g

pa_concurrent_jobs = 60
cns_concurrent_jobs = 60
ovlp_concurrent_jobs = 60

pa_HPCdaligner_option = -M20 -dal128 -t18 -e0.75 -l100 -s500 -h1250
ovlp_HPCdaligner_option = -M20 -dal128 -t24 -e0.96 -l100 -s500 -h1250

pa_DBsplit_option = -x2500 -s200
ovlp_DBsplit_option = -x1500 -s200
falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 5 --max_n_read 100 --n_core 6 
overlap_filtering_setting = --max_diff 100 --max_cov 80 --min_cov 5 --bestn 30 --n_core 24
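The coverage figures in the post come from dividing total bases by the estimated genome size. The sketch below reproduces that arithmetic and adds a small FASTA length reader, so the p-read coverage that survives a length cutoff can be checked directly. It assumes "G" means gigabases and uses the 3.2 Gb estimate. The ">= cutoff" rule is also an assumption and may not match FALCON's exact length filtering. The function names are hypothetical helpers, not part of FALCON.

```python
# Back-of-the-envelope coverage check for the numbers in the post.
# Assumptions: "G" means gigabases (1e9 bp); genome size is the 3.2 Gb estimate;
# the ">= cutoff" rule for length filtering is an assumption, not FALCON's exact rule.
from typing import Iterable, Iterator


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence record in FASTA-formatted text lines."""
    length = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif length is not None:
            length += len(line)  # sequences may be wrapped over several lines
    if length is not None:
        yield length


def coverage(total_bases: float, genome_size: float) -> float:
    """Fold coverage = total bases / genome size."""
    return total_bases / genome_size


def coverage_above_cutoff(lengths: Iterable[int], genome_size: float, cutoff: int) -> float:
    """Coverage contributed only by reads at least `cutoff` bp long."""
    return sum(n for n in lengths if n >= cutoff) / genome_size


GENOME_SIZE = 3.2e9
print(coverage(240e9, GENOME_SIZE))  # raw reads: 75.0
print(coverage(36e9, GENOME_SIZE))   # p-reads: 11.25
print(36e9 / 240e9)                  # fraction of raw bases kept as p-reads: 0.15
```

On a real run, something like `coverage_above_cutoff(fasta_lengths(open("preads.fasta")), GENOME_SIZE, 1000)` gives the p-read coverage that actually reaches the assembly step. The location of preads.fasta depends on the run directory layout.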

pb-jchin commented 8 years ago

While it is possible, I am guessing that 11x of p-reads is still lower than ideal for a 3 Gb genome. In the meantime, you should try:

 --max_diff 100 --max_cov 80 --min_cov 1 --bestn 10 --n_core 24

for overlap_filtering_setting.

With 11x p-reads, very few p-reads will have at least 5 overlaps on both their 5' and 3' ends. Change that to 1.
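To see why lowering `--min_cov` keeps more reads, here is a simplified, hypothetical model of the per-read check described in this thread. A read is kept only if it has at least `min_cov` overlaps on both ends and no more than `max_cov` on either end, and if the two end counts differ by at most `max_diff`. This is not FALCON's source code. The `EndCounts` type and `keep_read` function are illustrative names, and `--bestn` and `--n_core` are not modeled.

```python
# Hypothetical, simplified model of the overlap filter discussed above.
# NOT FALCON's implementation; it only mirrors the thread's description.
from dataclasses import dataclass


@dataclass
class EndCounts:
    read_id: str
    five_prime: int   # number of overlaps on the read's 5' end
    three_prime: int  # number of overlaps on the read's 3' end


def keep_read(c: EndCounts, min_cov: int, max_cov: int, max_diff: int) -> bool:
    """Return True if the read passes all three per-end checks."""
    if c.five_prime < min_cov or c.three_prime < min_cov:
        return False  # too few overlaps on at least one end
    if c.five_prime > max_cov or c.three_prime > max_cov:
        return False  # unusually many overlaps on an end (e.g. repeats; assumption)
    return abs(c.five_prime - c.three_prime) <= max_diff


reads = [EndCounts("r1", 7, 6), EndCounts("r2", 3, 8), EndCounts("r3", 1, 2)]
for min_cov in (5, 1):
    kept = [c.read_id for c in reads if keep_read(c, min_cov, max_cov=80, max_diff=100)]
    print(min_cov, kept)  # 5 ['r1'], then 1 ['r1', 'r2', 'r3']
```

With low p-read coverage, many reads resemble `r2` or `r3` in this toy data. One end has only a few overlaps, so a threshold of 5 discards them, while a threshold of 1 keeps them available for the string graph.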

pb-jchin commented 8 years ago

We should rename "--min_cov" to "--min_olvp" to avoid future confusion. Assigning this to myself.
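With an argparse-based command line, one possible way to rename an option without breaking existing configurations is to register both spellings for the same destination. The parser below is illustrative only and is not FALCON's actual option parser. `--min_olvp` is simply the name proposed above, and the help text is an assumption.

```python
import argparse

# Illustrative parser only; not FALCON's real command-line definition.
parser = argparse.ArgumentParser(description="Illustrative overlap-filter options")
# New name listed first; the old "--min_cov" spelling is still accepted as an alias.
parser.add_argument("--min_olvp", "--min_cov", dest="min_olvp", type=int,
                    help="minimum number of overlaps required on each end of a read")
parser.add_argument("--max_cov", type=int)
parser.add_argument("--max_diff", type=int)

old_style = parser.parse_args(["--min_cov", "5"])
new_style = parser.parse_args(["--min_olvp", "5"])
print(old_style.min_olvp, new_style.min_olvp)  # 5 5
```

Both spellings set the same attribute. Existing configs keep working, and the help output shows the clearer name first.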

tangerzhang commented 8 years ago

Thanks for your suggestion; I will try it. One thing I would like to confirm: your suggestion (--max_diff 100 --max_cov 80 --min_cov 1 --bestn 10 --n_core 24) is for the assembly step rather than the correction step, right? So I just need to re-run the assembly step with the 11x p-reads. Which parameters should I modify if I want to get more p-reads? Thanks!