Open xuxingyubio opened 11 months ago
Hifiasm in general doesn’t have coverage-based assumptions. So as long as the reads coming from the regions with uneven coverage are long enough, hifiasm should be fine to assemble through them.
Does the dependency of purge_dups on read depth affect the assembly results? How should I go about deduplicating the assembly?
When I assemble HiFi reads with uneven coverage, the resulting assembly contains many duplicates. Could this be because purge_dups is not suitable for removing them?
It is hard to say. But in most cases, uneven coverage means there might be some issues within the datasets.
When I assembled a relatively evenly covered diploid sequence, I found that the assembled HP1 and HP2 were longer than expected and relatively fragmented. The duplication ratio in the QUAST results was also high. However, when I set --n-hap 3, the length of the resulting prefix.p_ctg.gfa and the duplication ratio in the QUAST results were as expected. What could be the possible cause of this?
Sorry for the late reply. How much larger than expected is each haplotype? If the assembly is fragmented, it is common for it to be a little larger, since the boundary regions of different contigs might be represented multiple times. And why would you like to apply --n-hap 3 to a diploid genome?
The actual size should be around 4.4 Mb, and I set --hg-size 4.4m, but the assembled hap1 and hap2 each reached around 6.8 Mb, with 40-50 contigs. I set --n-hap 3 because I am considering whether there might be fragment duplication or chromosomal structural variation in this segment.
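For reference, here is a minimal sketch of how the contig count and total length could be checked directly from a GFA file and compared against the expected size. It assumes the usual GFA layout, where each segment is an `S` line with the sequence in the third column, or `*` plus an `LN:i:` tag. The function names and the file path passed on the command line are hypothetical and not part of hifiasm.

```python
import io
import sys


def gfa_stats(handle):
    """Return (number_of_segments, total_length) from GFA 'S' lines."""
    count = 0
    total = 0
    for line in handle:
        if not line.startswith("S\t"):
            continue  # skip links (L), read-alignment (A) and other records
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        length = len(seq)
        if seq == "*":
            # Sequence omitted: fall back to the optional LN:i:<length> tag.
            length = 0
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    length = int(tag[len("LN:i:"):])
        count += 1
        total += length
    return count, total


def size_ratio(assembled_bp, expected_bp):
    """Ratio of assembled length to expected length (1.0 means as expected)."""
    return assembled_bp / expected_bp


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Hypothetical usage: python gfa_stats.py some_assembly.gfa
        with open(sys.argv[1]) as fh:
            n, bp = gfa_stats(fh)
    else:
        # Tiny in-memory example so the script runs without any input file.
        demo = "S\tc1\tACGT\nS\tc2\t*\tLN:i:10\nL\tc1\t+\tc2\t+\t0M\n"
        n, bp = gfa_stats(io.StringIO(demo))
    print(f"{n} contigs, {bp} bp")
```

With the numbers reported above, `size_ratio(6_800_000, 4_400_000)` is about 1.55, meaning each haplotype is roughly 55% larger than expected.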
When I tried to assemble a normal sample using --n-hap 3, the resulting .bp.p_ctg.gfa was identical to the result obtained with --n-hap 2. So, what impact does the --n-hap setting have on the resulting .bp.p_ctg.gfa?
How effective is hifiasm in assembling continuous regions with uneven coverage?
Moreover, how effective is hifiasm when inputting regions that are discontinuous and have uneven coverage?