PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

Is preads4falcon.fasta the error corrected reads? #705

Closed aboyher closed 4 years ago

aboyher commented 4 years ago

Does preads4falcon.fasta contain the error corrected reads? I used 7000 as my length cutoff and the min length of preads4falcon.fasta is 500. I want to realign the preads to my final assembly after scaffolding.

zeeev commented 4 years ago

Hi @aboyher,

Can you post your config file?

aboyher commented 4 years ago

Hi Zeeev! Here's the config file. My colleague ran this some 3-4 years ago.

[General]

job_type = local

# list of files of the initial bas.h5 files
input_fofn = input.fofn
#input_fofn = preads.fofn

input_type = raw
#input_type = preads

# The length cutoff used for seed reads used for initial mapping
length_cutoff = 5000

# The length cutoff used for seed reads use for pre-assembly
length_cutoff_pr = 7000

sge_option_da = -pe smp 2 -q all.q
sge_option_la = -pe smp 2 -q all.q
sge_option_pda = -pe smp 2 -q all.q
sge_option_pla = -pe smp 2 -q all.q
sge_option_fc = -pe smp 24 -q all.q
sge_option_cns = -pe smp 8 -q all.q

pa_concurrent_jobs = 20
ovlp_concurrent_jobs = 20
cns_concurrent_jobs = 20

# min read length can be set with the -l option
pa_HPCdaligner_option =  -v -dal32 -t16 -h240 -H5000 -e.70 -l2500 -s100 -M15
ovlp_HPCdaligner_option = -v -dal32 -t32 -h240 -H6000 -e.96 -l2500 -s100 -M15

# set -sX to define block sizes of X Mb
pa_DBsplit_option = -x500 -s350
ovlp_DBsplit_option = -x500 -s350

falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 4 --local_match_count_threshold 2 --max_n_read 200 --n_core 6

overlap_filtering_setting = --max_diff 70 --max_cov 90 --min_cov 4 --bestn 10

aboyher commented 4 years ago

Am I right that the preads4falcon.fasta file is supposed to contain the error corrected reads?

zeeev commented 4 years ago

Hi @aboyher,

If the input type is raw, then preads4falcon.fasta contains the error corrected reads.

isovic commented 4 years ago

Hi @aboyher,

The length_cutoff_pr setting does not remove the preads themselves; they are still there. Instead, this threshold is applied only during the overlap filtering phase, so that overlaps with short reads do not enter the layout stage.

To force Falcon to filter out short preads, you can try raising the DBsplit threshold (the -x parameter) from 500 to your desired value, e.g.:

ovlp_DBsplit_option = -x7000 -s350

Best regards, Ivan.

aboyher commented 4 years ago

But these are error corrected, so if I just want to use the preads for something else, I can just remove the shorter reads, correct?

zeeev commented 4 years ago

Hi @aboyher,

You're correct: you can filter the error corrected reads for any other application. However, it wouldn't be advisable to resume a partial falcon run with the filtered reads.

Best wishes,

Zev
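
For readers who want to do this length filtering outside of FALCON, here is a minimal Python sketch. It is not part of FALCON; the function names (`read_fasta`, `filter_by_length`, `filter_fasta_file`), the output file name, and the 7000 bp cutoff in the usage note are illustrative assumptions. It assumes a plain, uncompressed FASTA file and keeps every record whose sequence is at least `min_len` bases long.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len):
    """Keep only records whose sequence has at least min_len bases."""
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


def filter_fasta_file(in_path, out_path, min_len):
    """Write the records of in_path that pass the length filter to out_path.

    Returns the number of records written. Each sequence is written on a
    single line.
    """
    kept = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        for header, seq in filter_by_length(read_fasta(fin), min_len):
            fout.write(f">{header}\n{seq}\n")
            kept += 1
    return kept
```

A call such as `filter_fasta_file("preads4falcon.fasta", "preads.min7000.fasta", 7000)` would write only the preads of 7000 bp or longer to a new file and leave the original untouched, so the FALCON run directory is not modified.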

aboyher commented 4 years ago

Awesome. Yeah, I'm not planning on a partial falcon run. Thanks guys!

zeeev commented 4 years ago

@aboyher, great, thanks for using falcon. If you need to re-open the issue, please do so.