Open Lillian-21 opened 2 years ago
Does your sample have just one haplotype with 8x coverage?
Hi chhylp123,
How important is phasing, and what is the difference between phasing a homozygous genome and a heterozygous genome?
I guess there is no need to phase homozygous genomes, as they only have one haplotype?
Should homozygous genomes be purged? Without purging, the assembly would contain lots of duplicates.
Heterozygous: Genome size 600Mb, phasing gives hap1 ~ 300Mb hap2 ~ 300Mb, both are used for downstream analysis
Homozygous: Genome size 600Mb, (if phase gives hap1 ~ 300Mb hap2 ~ 300Mb) Use 600Mb for downstream analysis??
Thanks
Just to make sure: what are the differences between the homozygous genome and the heterozygous genome? If a genome is homozygous, hifiasm with -l0 should not produce an assembly with lots of duplicates.
Property | min | max |
---|---|---|
Homozygous (aa) | 99.09% | 99.11% |
Heterozygous (ab) | 0.88% | 0.90% |
hifiasm -o sample.asm -l0 sample.fastq
.p_ctg.fasta + .a_ctg.fasta = .pa_ctg.fasta (600Mb) Genome size 600Mb,
if phase: hap1 ~ 300Mb hap2 ~ 300Mb.
What to use for downstream analysis?? 600Mb or ~300Mb?
Thanks
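For readers following along: hifiasm writes its contigs as GFA, so the .fasta files mentioned above have to be extracted first. A minimal sketch of that step, assuming GFA files whose S lines carry the contig name in column 2 and the sequence in column 3 (the file names in the comments are hypothetical and depend on your -o prefix and hifiasm version):

```shell
# Convert the S (segment) lines of a GFA file to FASTA.
# Assumes column 2 is the contig name and column 3 the sequence,
# so it needs the full .gfa, not a .noseq.gfa (which carries no sequence).
gfa_to_fasta() {
    awk '/^S/ { print ">" $2; print $3 }' "$1"
}

# Hypothetical usage; adjust the names to your own output prefix:
# gfa_to_fasta sample.asm.p_ctg.gfa > sample.asm.p_ctg.fasta
# gfa_to_fasta sample.asm.a_ctg.gfa > sample.asm.a_ctg.fasta
```

Lines other than S lines (headers, links) are skipped, so the output contains one FASTA record per contig.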
Could you please check the assembly graph of the homozygous genome? If the graph has a lot of small bubbles, it is more likely to be a heterozygous genome with low heterozygosity. In this case, I would recommend using phased assemblies.
This is what the assembly graph looks like.
I mean the p_utg.noseq.gfa, which could be visualized by Bandage. But at least from your k-mer plot, it is a heterozygous genome.
Well, what's the estimated genome size by k-mers, and the BUSCO scores? I guess it should be a heterozygous genome with high heterozygosity. But there is a slight possibility that this genome is homozygous.
This is p_ctg.noseq.gfa
The estimated genome size is 301Mb.
Estimated from k-mers with GenomeScope.
Then I guess it is a heterozygous genome.
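The reasoning here compares the total assembly size with the k-mer estimate of the haploid genome size. A rough sketch of that check, using the numbers quoted in this thread (the helper names fasta_total_bp and size_ratio are made up for illustration, and the interpretation is only a heuristic, not a hifiasm feature):

```shell
# Sum the lengths of all sequence lines in a FASTA file (in bases).
fasta_total_bp() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Ratio of total assembly size to the k-mer estimated haploid genome size.
size_ratio() {
    awk -v a="$1" -v g="$2" 'BEGIN { printf "%.2f\n", a / g }'
}

# Numbers from this thread: ~600Mb assembled, 301Mb estimated by GenomeScope.
size_ratio 600000000 301000000   # prints 1.99
```

A ratio close to 2 is consistent with both haplotypes being present in the assembly, while a ratio close to 1 would suggest a collapsed or truly homozygous assembly. BUSCO duplication scores, as suggested in the thread, are a useful cross-check.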
So I should not use -l0, but rather -l3?
Yes, I guess phased assemblies should work. You could also compare the BUSCO scores for double checking.
Some confusion:
Heterozygous species:
GenomeScope:
Property | min |
---|---|
Homozygous (aa) | 96.44% |
Heterozygous (ab) | 3.51% |
Assembly graph
Homozygous species:
The GenomeScope graph and table show it is homozygous:
Property | min |
---|---|
Homozygous (aa) | 99.09% |
Heterozygous (ab) | 0.88% |
Assembly graph
They look totally different. How can I reliably confirm heterozygosity or homozygosity?
Thanks
The reason is that GenomeScope and hifiasm use k-mers of different lengths. I guess most genomes are heterozygous, apart from some very special ones. For those special genomes, you should already know in advance that they are homozygous.
I have a homozygous plant species (estimated size 1.1G). First, I ran the HiFi reads (33G) with hifiasm-0.16.1/hifiasm -o out.fasta -t 16 -l0 input.fasta. I got a 1.8G p_ctg (N50 2.3M, not very good) and a 158M a_ctg. Then I used the p_ctg to run Hi-C phasing: hifiasm-0.16.1/hifiasm -l0 -o p_ctg.fa -t 24 --h read1.fq.gz -h2 read2.fq.gz. I got 1.3G hic.p_ctg.fa, 1.3G hap1.p_ctg.fa, and 1.2G hap2.p_ctg.fa. Now I do not know which one I should use for the next analysis. Which one is the whole genome? How should I understand hap1 and hap2? Besides, I get an abnormal k-mer profile.