PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

FALCON vs HGAP > Celera > Quiver #20

Closed: lennythomas closed this issue 9 years ago

lennythomas commented 9 years ago

I am just testing out FALCON and was wondering what its primary benefits are compared to the standard HGAP > Celera > Quiver workflow. Is there an application where one would be better than the other, and if I run FALCON, do I still need to follow it up with Celera and Quiver? Thanks for any insights you can provide.

pb-jchin commented 9 years ago

Hi, lennythomas:

FALCON is meant to be an open-source, experimental assembler for PacBio long reads. Many components are "experimental" because we are exploring the best way to do large-genome assembly. The overall method used in FALCON follows the one described in http://www.ncbi.nlm.nih.gov/pubmed/23644548, but all of the components are different now. Because the code base is small, experiments can be done quickly. For example, I have changed the overlapper several times during development; for a while, it will settle on Gene Myers' Daligner as the main overlapping engine. It is much harder to swap in different components within the Celera Assembler infrastructure. (Yes, I tried, but it was not easy.)

Given how fast the technology is developing, I hope FALCON can provide a simple framework for exploring new and more efficient algorithms for long-read genome assembly as the technology evolves.

No. FALCON currently does not include anything for the Quiver step. That step can be done as a resequencing consensus run, using the generated contigs as the reference.
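Resequencing consensus means aligning the raw reads back to the draft contigs and re-calling each base from the reads stacked on it. As a toy illustration only, the Python sketch below does a plain majority vote over gap-free read placements. The function name `pileup_consensus` and the `(start, bases)` input format are hypothetical; real polishers such as Quiver also model insertions and deletions (which dominate raw PacBio errors) and use per-base quality values.

```python
from collections import Counter
from typing import List, Tuple


def pileup_consensus(reference: str, alignments: List[Tuple[int, str]]) -> str:
    """Toy resequencing consensus: majority vote per reference position.

    Each alignment is a hypothetical (start, aligned_bases) pair describing a
    gap-free placement of one read on the draft contig. Positions with no read
    coverage keep the draft (reference) base.
    """
    # One counter of observed bases per reference position (the "pileup").
    columns = [Counter() for _ in reference]
    for start, bases in alignments:
        for offset, base in enumerate(bases):
            pos = start + offset
            if 0 <= pos < len(reference):  # ignore overhang past contig ends
                columns[pos][base] += 1

    consensus = []
    for ref_base, column in zip(reference, columns):
        # Most frequent base wins; uncovered positions fall back to the draft.
        consensus.append(column.most_common(1)[0][0] if column else ref_base)
    return "".join(consensus)
```

For example, `pileup_consensus("ACGTTA", [(0, "ACGTC"), (1, "CGTCA"), (2, "GTCA")])` replaces the draft's `T` at position 4 with the `C` that all three overlapping reads agree on, giving `"ACGTCA"`.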

JC

lennythomas commented 9 years ago

Thanks pb-jchin! I was wondering if there was any value to taking the p_ctg.fa file and pushing that through Celera as a follow up step, but it sounds like the answer is probably no. This may be a dumb question, but can you point me to an example of how I would call Quiver to do that last step?

Is there any intent to release the source for HGAP that is included in SMRT Analysis, for those who want to use a more production-level product in their workflow?

pb-jchin commented 9 years ago

The p_ctg.fa file already contains assembled contigs, so it may not be ideal to run them through Celera Assembler to assemble them again. However, you could consider running scaffolding software, e.g. SSpace or FinishSC, with the existing PacBio reads. I have tried this on a test using only 30x Dmel data: first I assembled the reads with FALCON, then used SSpace to build scaffolds. This works well for low-coverage cases (but be aware there could still be some mis-assemblies). For high-coverage cases, the improvement might not be as big.

The source code for SMRT Analysis is in principle open, although I would need to poke around to find where it lives now.

lennythomas commented 9 years ago

Thanks. I will give SSpace / FinishSC a shot. I can download SMRT Analysis, but it only includes the executable for HGAP; I can't find the source itself. If you can find a link, I'd appreciate it.

Also, if you can provide an example of how you would run Quiver as a follow-up step, that would be extremely helpful. I thought Quiver required base qualities, which you don't have in a FASTA file.

pb-jchin commented 9 years ago

Currently, we suggest running Quiver through SMRT Analysis v2.3.0. There is a bam_resequencing (or similarly named) protocol that propagates the QVs from the bax.h5 files to the BAM files after the initial alignment, and then runs Quiver consensus on top of them. Please follow the guidelines there.
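The FASTA contigs only supply the reference coordinates; the quality values come from the reads, which is why the protocol carries QVs from the bax.h5 files into the alignments. Quiver combines several QV tracks in a full likelihood model; the hypothetical Python helper below is not that model. It is a crude sketch, assuming one pileup column of `(base, Phred QV)` pairs, that shows why per-read QVs can change a base call.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def qv_weighted_call(column: List[Tuple[str, int]]) -> str:
    """Call one consensus base from a pileup column of (base, Phred QV) pairs.

    A Phred score is Q = -10 * log10(p_error), so summing QVs per base adds up
    log-scaled confidence from every read supporting that base. This is a
    crude heuristic for illustration, not Quiver's likelihood model.
    """
    scores: Dict[str, int] = defaultdict(int)
    for base, qv in column:
        scores[base] += qv
    # Highest total wins; on a tie, the base seen first in the column wins.
    return max(scores, key=lambda b: scores[b])


def phred_to_error_prob(qv: int) -> float:
    """Convert a Phred QV to the probability that the base call is wrong."""
    return 10 ** (-qv / 10)
```

With the column `[("T", 5), ("T", 5), ("C", 20)]`, a plain majority vote would pick `T`, but `qv_weighted_call` returns `C`: the two QV-5 calls (each roughly a 32% error probability) carry less total weight than the single QV-20 call (a 1% error probability).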