PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

Hybrid Assembly using falcon #282

Open prince26121991 opened 8 years ago

prince26121991 commented 8 years ago

I have 20X Illumina-corrected PacBio reads. Should I use FALCON for overlap graph construction? I ask because https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Large-Genome-Assembly-with-PacBio-Long-Reads says that it is for PacBio-only reads. Also, can you point me to a resource or publication on FALCON that explains what it does in the background? The manual at https://github.com/PacificBiosciences/FALCON/wiki/Manual focuses on technical details rather than theory.

pb-jchin commented 8 years ago

Hi @prince26121991, the general design of FALCON follows this publication: http://www.nature.com/nmeth/journal/v10/n6/full/nmeth.2474.html You can also check https://speakerdeck.com/jchin/string-graph-assembly-for-diploid-genomes-with-long-reads and https://speakerdeck.com/jchin/de-novo-diploid-genome-assembly-and-haplotype-sequence-reconstruction (FALCON itself does not separate the haplotypes, though).

prince26121991 commented 8 years ago

I had already seen these publications, sir. Earlier, when I worked with HGAP, I had 20X coverage of raw reads, and only 8X remained after correction at the Celera step, which was not sufficient for Celera. Later I added Illumina reads and also increased the overall coverage of PacBio raw reads, so now I have 70X short reads to correct 42X PacBio reads, which became 20X after hybrid error correction. Can I use FALCON on those 20X corrected PacBio reads?

mseetin commented 8 years ago

42x PacBio coverage is good enough for at least a decent FALCON assembly on its own. I'd set your length_cutoff equal to the read length above which you have 30x coverage and just do a PacBio-only run.
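One way to find "the length above which you have 30x coverage" is sketched below. This is only an illustration, not part of FALCON: the function name `length_cutoff_for_coverage` and its parameters are hypothetical, and it assumes you already have the raw read lengths and an estimate of the genome size in bases.

```python
def length_cutoff_for_coverage(read_lengths, genome_size, target_coverage=30):
    """Return the largest cutoff L such that reads of length >= L together
    contain at least target_coverage * genome_size bases.

    Returns None when even the full read set falls short of the target.
    """
    target_bases = target_coverage * genome_size
    total = 0
    # Accumulate bases from the longest read downwards; the first read
    # that pushes the running total past the target defines the cutoff.
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target_bases:
            return length
    return None


if __name__ == "__main__":
    # Toy numbers only: a 1-base "genome" and a 20x target.
    print(length_cutoff_for_coverage([10, 8, 6, 4, 2], genome_size=1,
                                     target_coverage=20))  # prints 6
```

With real data, `read_lengths` would come from the subread FASTA files, and the resulting value would be used as `length_cutoff` in the FALCON configuration.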


pb-jchin commented 8 years ago

@prince26121991 FALCON is not designed to handle hybrid-corrected reads. While it works for some error-corrected reads if you set the input type to "preads", you need to be careful. The later stages of FALCON have no explicit chimera or artifact removal mechanism. (In the FALCON design, we push all of that machinery into the earlier error correction stage.) It works if the error-corrected reads are "correct". We cannot be 100% sure what kinds of artifacts are in the hybrid-corrected reads. So you can try, but it is hard to say what might happen. To some degree, Celera Assembler (or Canu) will handle the hybrid reads better, as it still has an artifact removal stage.

prince26121991 commented 8 years ago

@pb-jchin Do you agree with @mseetin? Is 42X enough for a decent FALCON assembly? It's a diploid genome and highly repetitive...

pb-cdunn commented 8 years ago

Whether 42x is enough will depend on the raw accuracy and the genome size. It's worth trying.

With the latest FALCON, you can set:

The length_cutoff will then be calculated for you at runtime.

This feature is not yet documented, but we use it within PacBio regularly.
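The settings themselves are missing from this copy of the comment. Judging from the follow-up question about `length_cutoff_pr = -1`, they were probably along these lines. Only `length_cutoff = -1` is implied by the thread; `genome_size`, `seed_coverage`, the `[General]` section name, and the values shown are assumptions about what FALCON needs in order to compute the cutoff, so check them against your FALCON version.

```ini
[General]
# Implied by the thread: -1 asks FALCON to compute the cutoff at runtime.
length_cutoff = -1
# Assumed companion options with placeholder values: the estimated genome
# size in bases and the seed-read coverage to aim for.
genome_size = 4600000
seed_coverage = 30
```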

ademcan commented 7 years ago

Thank you for the additional information about the settings. In this case, should one also use length_cutoff_pr = -1?

pb-cdunn commented 7 years ago

No. We don't auto-calculate length_cutoff_pr yet, but you can choose something conservative, at the short end.