The influence of depth to the assembly - Githubissues

chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads

MIT License


The influence of depth to the assembly #578

Open xuxingyubio opened 11 months ago

xuxingyubio commented 11 months ago

How effective is hifiasm in assembling continuous regions with uneven coverage?

Moreover, how effective is hifiasm when the input consists of regions that are discontinuous and have uneven coverage?

chhylp123 commented 11 months ago

Hifiasm in general doesn’t have coverage-based assumptions. So as long as the reads coming from the regions with uneven coverage are long enough, hifiasm should be able to assemble through them.

xuxingyubio commented 10 months ago

Does the dependency of purge_dups on read depth affect the assembly results? How should I go about deduplicating the assembly?

xuxingyubio commented 10 months ago

When assembling HiFi reads with uneven coverage, the resulting assembly contains many duplicated sequences. Could this be because purge_dups is not suitable for removing duplicates?

chhylp123 commented 10 months ago

It is hard to say. But in most cases, uneven coverage means there might be some issues within the datasets.

xuxingyubio commented 10 months ago

When I assembled a diploid sequence with relatively even coverage, I found that the assembled HP1 and HP2 were longer than expected and relatively fragmented. The Duplication ratio in the QUAST results was also high. However, when I set --n-hap 3, the length of the resulting prefix.p_ctg.gfa and the Duplication ratio in the QUAST results were as expected. What could be the possible cause of this?

chhylp123 commented 10 months ago

Sorry for the late reply. How much larger is each haplotype? If the assembly is fragmented, it is common for it to be a little larger, since the boundary regions of different contigs might be represented multiple times. And why would you like to apply --n-hap 3 for a diploid genome?

xuxingyubio commented 10 months ago

The actual size should be around 4.4 Mb, and I set --hg-size 4.4m, but the assembled hap1 and hap2 each reached around 6.8 Mb, with 40-50 contigs. The reason I set --n-hap 3 is that I am considering whether there might be fragment duplication or chromosomal structural variation in this segment.
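For reference, 6.8 Mb against an expected 4.4 Mb is roughly 1.5 times the expected size. A minimal Python sketch for checking this kind of number directly from a GFA file is shown here. It assumes a GFA 1 file in which each contig is an S-line (tab-separated: `S`, name, sequence) and, when the sequence is given as `*`, an optional `LN:i:` tag holds the length. The function names and the file name in the usage comment are illustrative only and are not part of hifiasm.

```python
def gfa_segment_lengths(lines):
    """Return {segment_name: length} for the S-lines of a GFA 1 file."""
    lengths = {}
    for line in lines:
        if not line.startswith("S\t"):
            continue  # skip header, link, and other record types
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq != "*":
            lengths[name] = len(seq)
        else:
            # Sequence omitted: fall back to the optional LN:i:<length> tag.
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    lengths[name] = int(tag[5:])
                    break
    return lengths


def size_ratio(lengths, expected_bp):
    """Total assembled length divided by the expected length in bp."""
    return sum(lengths.values()) / expected_bp


# Usage (hypothetical file name; 4_400_000 is the expected size from the thread):
# with open("asm.bp.hap1.p_ctg.gfa") as fh:
#     lengths = gfa_segment_lengths(fh)
# print(len(lengths), "contigs, ratio", size_ratio(lengths, 4_400_000))
```

Sorting the per-contig lengths would also show whether the excess comes from many short contigs, which would fit the boundary-region explanation given earlier in the thread.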

xuxingyubio commented 10 months ago

When I tried to assemble a normal sample using --n-hap 3, the resulting .bp.p_ctg.gfa was identical to the result obtained with --n-hap 2. So, what impact does the --n-hap setting have on the resulting .bp.p_ctg.gfa?
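One way to confirm that two primary-contig graphs contain the same contigs is to compare their S-line sequences. The sketch below is only an illustration under stated assumptions: both files are GFA 1 with the sequences stored on the S-lines, the function names and file names are hypothetical, and contigs that differ only by orientation (reverse complement) or by rotation of a circular contig would be reported as different.

```python
from collections import Counter


def contig_sequences(lines):
    """Collect the sequences on the S-lines of a GFA 1 file as a multiset."""
    seqs = Counter()
    for line in lines:
        if line.startswith("S\t"):
            fields = line.rstrip("\n").split("\t")
            seqs[fields[2]] += 1  # fields[2] is the segment sequence
    return seqs


def same_contigs(lines_a, lines_b):
    """True if both GFA inputs hold exactly the same set of sequences."""
    return contig_sequences(lines_a) == contig_sequences(lines_b)


# Usage (hypothetical file names for the two runs):
# with open("nhap2.bp.p_ctg.gfa") as a, open("nhap3.bp.p_ctg.gfa") as b:
#     print(same_contigs(a, b))
```

Contig names are deliberately ignored here, so a renamed but otherwise identical contig still counts as a match.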