PacificBiosciences / FALCON_unzip

Making diploid assembly becomes common practice for genomic study
BSD 3-Clause Clear License

A question on FALCON_unzip primary contigs and haplotigs explanation #96

wsuplantpathology closed 6 years ago

wsuplantpathology commented 7 years ago

Hi @pb-jchin :+1:

Thanks for this nice assembler. I have a question about the output primary contigs and haplotigs. My fungus is highly heterozygous, ~10 SNPs/kb. Here is a general summary of my assembly: 156 primary contigs (83 Mb) and 475 haplotigs (73 Mb).

1) In my results, each primary contig has several haplotigs, while some primary contig regions have no associated haplotigs. From my reading, my understanding is that the associated haplotigs are the phased regions of the primary contig, right?

2) The primary contig regions without any associated haplotigs are highly homozygous regions, and only heterozygous regions are present in the haplotigs, right?

3) If I map Illumina reads from the same sample to the primary contigs, could I tell whether a region of a primary contig is phased or unphased from the mapping coverage? My guess is no, because the mapping coverage for phased and unphased regions should be the same, right?

Thanks so much in advance.

Best, Chongjing

cschin commented 7 years ago

@wsuplantpathology the degree of heterozygosity can be quite different from region to region in a genome.

"The primary contig regions without any associated haplotigs": such primary contig regions can come from (1) highly homozygous regions, where the assembler was not able to find two distinct haplotypes, so both haplotypes are represented by one primary contig; or (2) highly heterozygous regions, where the two haplotypes are so distinct that the assembler processes them separately.

If you have the right mapping process (from PBI reads to the contigs or Illumina reads to the contigs), you might be able to conclude which scenario applies. There are a couple of other simple techniques that use whole-genome alignments. Please ping PacBio's experts for more on this.
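As a rough illustration of the coverage idea, here is a minimal Python sketch. It is not part of FALCON_unzip. It assumes the reads were mapped to the primary contigs and haplotigs together, that per-base depth for one primary contig is already available as a list (for example, exported from a depth tool), and that you have an estimate of the expected single-haplotype coverage. Under those assumptions, windows at roughly 1x the haploid coverage look like separately assembled haplotypes, and windows at roughly 2x look like two haplotypes collapsed into one sequence. The function names, window size, and tolerance are hypothetical choices for illustration only.

```python
from statistics import mean


def window_means(depths, window):
    """Mean depth over consecutive, non-overlapping windows."""
    return [mean(depths[i:i + window]) for i in range(0, len(depths), window)]


def classify_windows(depths, haploid_cov, window=1000, tol=0.25):
    """Label each window by its depth relative to the haploid coverage.

    Assumption (not from FALCON_unzip): reads were mapped to primary
    contigs plus haplotigs, so a separately assembled haplotype receives
    about 1x haploid coverage and a collapsed region about 2x.
    """
    labels = []
    for m in window_means(depths, window):
        ratio = m / haploid_cov
        if abs(ratio - 1.0) <= tol:
            labels.append("haploid-like")   # haplotypes likely assembled separately
        elif abs(ratio - 2.0) <= 2 * tol:
            labels.append("diploid-like")   # haplotypes likely collapsed together
        else:
            labels.append("ambiguous")      # repeats, low coverage, or mapping issues
    return labels


# Toy per-base depths for a 5 kb contig; the values are made up.
toy_depths = [30] * 2000 + [60] * 2000 + [5] * 1000
toy_labels = classify_windows(toy_depths, haploid_cov=30)
print(toy_labels)
```

The thresholds would need tuning for real data, and windows flagged as ambiguous are worth checking against whole-genome alignments rather than trusting coverage alone.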

BenjaminSchwessinger commented 7 years ago

Apologies for chipping in and for the slight cross-posting. @wsuplantpathology we looked at this point in a bit more detail in our rust fungus and have some pretty simple approaches using mapping coverage, described in our GitHub repo and our preprint. Get in touch if you want some more background on it.

Overall, we follow @cschin's advice of looking at coverage and whole-genome alignments to figure out the unphased regions of the genome.

cschin commented 7 years ago

@BenjaminSchwessinger 👍

wsuplantpathology commented 7 years ago

Thanks so much for your comments, @cschin, and @BenjaminSchwessinger for your excellent work on stripe rust fungus. I'll get in touch with you, since we are also working on Puccinia striiformis genomes and have some interesting findings.

Best, Chongjing Xia