PacificBiosciences / FALCON_unzip

Making diploid assembly becomes common practice for genomic study
BSD 3-Clause Clear License

Are all the haplotigs in one contig correctly phased? #76

Open bostanict opened 7 years ago

bostanict commented 7 years ago

Hi all,

Reading the Unzip paper, it seems that Unzip has a strategy for phasing the alternative haplotigs into the correct haplotype relative to each other within a primary contig, right?

In other words, within a primary contig, are all the associated haplotigs phased correctly relative to each other? Assuming the primary contig represents haplotype A, are all the haplotigs labeled as associated contigs really from haplotype B? Or can they be mixed up in regions where the overlapping sequence is not divergent enough to create a fork, so that the contigs following such a region are not 100% correctly phased?

Thanks,

gconcepcion commented 7 years ago

The haplotigs associated with each primary contig represent a mixture of the two haplotypes and are not phased relative to each other. The haplotigs are basically structural variation that was found in the assembly graph and subsequently "snipped out". In a de novo assembly context, there is not enough information present to properly phase unlinked contigs. Moreover, each primary contig is only "partially phased", meaning there are "contiguous phased blocks", but one contig may be composed of multiple "phased blocks".
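To make the point about independent phased blocks concrete, here is a minimal toy sketch in Python. It is not FALCON-Unzip's algorithm: the function names, the `A`/`B` labels, and the assumption that each block picks its primary haplotype at random are all hypothetical and exist only for illustration. In a real assembly the true haplotype labels are unknown, which is exactly why the haplotigs cannot be assumed to share one haplotype.

```python
import random


def simulate_phase_blocks(n_blocks, seed=0):
    """Toy model (hypothetical): each phased block independently ends up
    with haplotype A or B in the primary contig; the haplotig for that
    block carries the other haplotype."""
    rng = random.Random(seed)
    blocks = []
    for i in range(n_blocks):
        primary = rng.choice("AB")
        haplotig = "B" if primary == "A" else "A"
        blocks.append({"block": i, "primary": primary, "haplotig": haplotig})
    return blocks


def is_consistently_phased(blocks):
    """True only if every haplotig comes from the same haplotype."""
    return len({b["haplotig"] for b in blocks}) <= 1


if __name__ == "__main__":
    blocks = simulate_phase_blocks(8, seed=1)
    for b in blocks:
        print(b["block"], "primary:", b["primary"], "haplotig:", b["haplotig"])
    print("haplotigs all from one haplotype:", is_consistently_phased(blocks))
```

Within a block, the primary sequence and its haplotig are always from opposite haplotypes, but nothing ties the choice in one block to the choice in the next, so the set of haplotigs for a primary contig can be a mixture of both.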

I suggest you read the supplementary methods of the falcon_unzip paper a little bit closer.

bostanict commented 7 years ago

Dear Greg,

Thanks for the comment.

Best,