Closed wsuplantpathology closed 6 years ago
@wsuplantpathology the degree of heterozygosity can be quite different from region to region in a genome.
"The primary contig regions without any associated haplotigs": such a primary contig can come from (1) a highly homozygous region, where the assembler was not able to find two distinct haplotypes, so both haplotypes are represented by one primary contig; or (2) a highly heterozygous region, where the two haplotypes are very distinct and the assembler simply processes them separately.
If you have the right mapping process (PacBio reads to the contigs, or Illumina reads to the contigs), you might be able to conclude which scenario applies. There are also a couple of other simple techniques based on whole genome alignments. Please ping PacBio's experts for more on this.
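To make the coverage idea concrete, here is a minimal sketch, not anything FALCON itself provides. It assumes reads were mapped to the primary contigs only, that you already have a mean depth per window (for example computed from a depth table), and that you know the expected depth for a region where both haplotypes map to the same place. The function name `classify_windows`, the `tolerance` value, and the labels are all hypothetical choices for illustration. A window near the full expected depth suggests scenario (1), a collapsed homozygous region; a window near half of it suggests scenario (2), where the other haplotype was assembled somewhere else.

```python
def classify_windows(window_depths, diploid_depth, tolerance=0.2):
    """Label windows by mean depth relative to the expected two-haplotype depth.

    window_depths: mean read depth per window on a primary contig.
    diploid_depth: expected depth where both haplotypes map to one sequence
                   (an input you must estimate yourself; assumption).
    tolerance:     allowed deviation of the depth ratio (hypothetical default).
    """
    if diploid_depth <= 0:
        raise ValueError("diploid_depth must be positive")

    labels = []
    for depth in window_depths:
        ratio = depth / diploid_depth
        if abs(ratio - 1.0) <= tolerance:
            # Both haplotypes map here: likely collapsed (scenario 1).
            labels.append("collapsed")
        elif abs(ratio - 0.5) <= tolerance:
            # Only one haplotype maps here: likely assembled separately (scenario 2).
            labels.append("separated")
        else:
            # Repeats, low-complexity sequence, or mapping artifacts.
            labels.append("ambiguous")
    return labels


if __name__ == "__main__":
    # Made-up depths purely for illustration.
    print(classify_windows([60, 58, 31, 29, 95], diploid_depth=60))
```

Windows labeled "ambiguous" are worth checking with a whole genome alignment before drawing any conclusion.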
Apologies for chipping in and the slight cross-posting. @wsuplantpathology we looked at this point in a bit more detail in our rust fungus and have some pretty simple approaches using mapping coverage, described in our GitHub repo and our preprint. Get in touch if you want some more background on it.
Overall, we follow @cschin's advice of looking at coverage and whole genome alignments to identify unphased regions of the genome.
@BenjaminSchwessinger 👍
Thanks so much for your comments, @cschin, and for your excellent work on the stripe rust fungus, @BenjaminSchwessinger. I'll get in touch with you since we are also working on Puccinia striiformis genomes and have some interesting findings.
Best, Chongjing Xia
Hi @pb-jchin :+1:
Thanks for this nice assembler. I have a question about the primary contig and haplotig output. My fungus is highly heterozygous, ~10 SNPs/kb. Here is a general summary of my assembly: 156 primary contigs (83 Mb) and 475 haplotigs (73 Mb).
From my results I see that each primary contig has several haplotigs, while some primary contig regions do not have any associated haplotigs.
1) From my reading, my understanding is that the associated haplotigs are phased regions of the primary contig, right?
2) The primary contig regions without any associated haplotigs are highly homozygous regions, and only heterozygous regions are present in the haplotigs, right?
3) If I map Illumina reads from the same sample to the primary contigs, could I tell whether a region of a primary contig is phased or unphased from the mapping coverage? My thinking is no, because the mapping coverage for phased and unphased regions should be the same, right?
Thanks so much in advance.
Best, Chongjing