Open BioOmics opened 5 months ago
I also hope someone can answer this question and explain the expected hap size and the mixed (.hic.p_ctg.fa) size.
If you have read the hifiasm-UL paper, the imbalanced haplotype sizes from Hi-C phasing are a known issue. Currently, hifiasm still cannot phase autopolyploid genomes with Hi-C. But if you have additional information (a genetic map) or phasing from HapHiC or ALLHiC, you could use -5 to reassign the haplotypes with a more contiguous assembly. Even with HiFi and UL data, Hi-C phasing of diplotigs and triplotigs is still difficult for the tetraploid potato.
For the polyploid genome assembly, the main limitation of our current algorithm is that it requires genetic map information from progeny. To address this issue, we implemented an experimental single-sample approach using Hi-C phasing, and applied it to the autotetraploid potato dataset. This resulted in four haplotype assemblies, which have slightly worse phasing accuracy and contiguity in comparison to the genetic map-based assemblies. However, the four Hi-C phased haplotype assemblies are imbalanced, with one assembly being 20% larger than the others.
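To see how imbalanced a set of phased haplotype assemblies is, one option is to total the sequence length of each haplotype FASTA and compare it with the median. The following is a minimal sketch, not part of hifiasm: the function names are hypothetical, and the sizes in `example` are made-up numbers chosen only to mimic the "one assembly 20% larger" situation described in the paper.

```python
from statistics import median


def fasta_total_length(lines):
    """Sum sequence lengths from an iterable of FASTA lines, skipping headers."""
    total = 0
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def size_imbalance(sizes):
    """Return each haplotype's size relative to the median haplotype size."""
    mid = median(sizes.values())
    return {name: size / mid for name, size in sizes.items()}


# Made-up sizes (in Mb), for illustration only.
example = {"hap1": 600, "hap2": 500, "hap3": 500, "hap4": 500}
ratios = size_imbalance(example)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x median")
```

With real data, `fasta_total_length` can be given an open file handle for each haplotype FASTA, and the resulting totals passed to `size_imbalance`.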
Hi, @baozg
Can I ask a question regarding this issue about tetraploids?
For a tetraploid, should --n-hap be set to 4 or 2? Or does it depend on whether it is an auto- or allopolyploid?
Best regards, Song
--n-hap should be set to the correct ploidy level; the default assumes a diploid. For a tetraploid, you should set --n-hap 4. That is how I ran it for potato.
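As an illustration of the advice above, here is a small sketch that assembles a hifiasm command line with --n-hap matched to the ploidy. It assumes a Hi-C run using hifiasm's -o, -t, --h1 and --h2 options; the helper name, the output prefix, the thread count, and all file names are hypothetical placeholders, and the command is only printed, not executed.

```python
import shlex


def build_hifiasm_cmd(ploidy, hifi_reads, hic_r1, hic_r2, prefix="asm", threads=32):
    """Build a hifiasm argument list with --n-hap set to the given ploidy."""
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return [
        "hifiasm",
        "-o", prefix,             # output prefix (placeholder)
        "-t", str(threads),       # number of threads (placeholder)
        "--n-hap", str(ploidy),   # 4 for a tetraploid; hifiasm defaults to diploid
        "--h1", hic_r1,           # Hi-C read 1 (placeholder file name)
        "--h2", hic_r2,           # Hi-C read 2 (placeholder file name)
        hifi_reads,               # HiFi reads (placeholder file name)
    ]


cmd = build_hifiasm_cmd(4, "hifi.fq.gz", "hic_R1.fq.gz", "hic_R2.fq.gz")
print(shlex.join(cmd))
```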
Hi, @chhylp123
We are currently attempting to assemble a hexaploid using hifiasm with the following command:
The results are as follows:
Our questions
Why do we have 4 haplotypes (hapN.p_ctg) with a size of ~670MB (boxed in red) and 2 haplotypes with a size of ~520MB (boxed in green), while the p_ctg is ~320MB (boxed in blue)? Does this indicate that our hexaploid species has a ploidy composition of AAAABB? Furthermore, if we consider A to be roughly 670MB and B to be roughly 520MB, how can we account for the p_ctg size of approximately 320MB?
By the way, here is an assessment of the species' genome size (~221MB) and heterozygosity (5.71%), which might be helpful for you to understand our query:
By the way, here is another result by using l0:
If possible, how can we use the hifiasm command to achieve the best genome assembly, or basic primary genome assembly?
We are eagerly looking forward to your reply.
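As a quick sanity check on the sizes quoted in the question above, each reported assembly size can be divided by the ~221MB genome size estimate. This is only arithmetic on the numbers in the post, not an interpretation of the ploidy composition, and it assumes the ~221MB figure is a per-haplotype (monoploid) estimate, which the post does not state.

```python
ESTIMATED_GENOME_MB = 221  # genome size estimate quoted in the post

# Approximate assembly sizes (MB) as reported in the post.
reported_mb = {
    "hapN.p_ctg (4 files)": 670,
    "hapN.p_ctg (2 files)": 520,
    "p_ctg": 320,
}

# Ratio of each assembly size to the estimated genome size.
ratios = {name: size / ESTIMATED_GENOME_MB for name, size in reported_mb.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the estimated genome size")
```

Under that assumption, none of the assemblies is close to 1x the estimate: the ratios come out at roughly 3.03, 2.35, and 1.45.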