chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Accuracy in regions with only ONT support? #502

Open kevfengler227 opened 1 year ago

kevfengler227 commented 1 year ago

One of the key benefits of adding ONT to a PacBio HiFi assembly is to fill in all the gaps arising from GA(n) coverage dropouts. In these regions there will therefore be only ONT coverage. How does hifiasm generate the consensus sequence in such a region, given that it does not have an explicit consensus step? Is a single ONT read being used to span the region? I just want to confirm.

Below are the 25x HiFi reads (top) and the 45x ONT reads (bottom) aligned to the assembly, with a GA(n) repeat region shown.

image

chhylp123 commented 1 year ago

Right now hifiasm selects one ONT read to fill each gap. Do you think it is important to polish gaps with ONT reads?

baozg commented 1 year ago

Would it be possible to output these regions in BED format (like lowQ.bed) for additional polishing? General variant callers may have more sophisticated steps for this kind of polishing.

kevfengler227 commented 1 year ago

OK, thanks. I just wanted to confirm the behavior. Yes, I currently include an additional polishing step with the HiFi reads to correct SNP/INDEL errors, which typically yields ~100-500 changed bases, or more if a bad read was included in the tiling path. A polishing step with ONT could be done too, both to help with these regions lacking HiFi and to phase variants within tandem repeats. An ONT.lowQ.bed file would be helpful.
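
Until such a file exists, the ONT-only regions could be approximated outside hifiasm from the HiFi alignments themselves. The following is a minimal sketch, not hifiasm's method: it assumes the HiFi read alignments have already been reduced to 0-based, half-open (contig, start, end) intervals lying within the contig bounds (for example from a BAM or PAF file), and the function names `low_depth_regions` and `to_bed` are hypothetical. It sweeps over alignment start and end points and reports every stretch where HiFi depth falls below a threshold.

```python
from collections import defaultdict


def low_depth_regions(contig_lengths, alignments, min_depth=1):
    """Return (contig, start, end) intervals where depth < min_depth.

    contig_lengths: dict mapping contig name -> length in bases.
    alignments: iterable of (contig, start, end), 0-based half-open,
                assumed to lie within [0, contig length].
    """
    # Net change in depth at each coordinate, per contig.
    deltas = defaultdict(lambda: defaultdict(int))
    for contig, start, end in alignments:
        deltas[contig][start] += 1
        deltas[contig][end] -= 1

    regions = []
    for contig, length in contig_lengths.items():
        depth, prev = 0, 0
        segments = []  # (start, end, depth) with constant depth inside
        for pos in sorted(deltas[contig]):
            if pos > prev:
                segments.append((prev, pos, depth))
                prev = pos
            depth += deltas[contig][pos]
        if prev < length:
            # Tail of the contig after the last alignment end.
            segments.append((prev, length, depth))

        for start, end, seg_depth in segments:
            if seg_depth >= min_depth:
                continue
            last = regions[-1] if regions else None
            if last and last[0] == contig and last[2] == start:
                # Merge with the adjacent low-depth segment.
                regions[-1] = (contig, last[1], end)
            else:
                regions.append((contig, start, end))
    return regions


def to_bed(regions):
    """Format intervals as tab-separated BED lines."""
    return "\n".join(f"{c}\t{s}\t{e}" for c, s, e in regions)


if __name__ == "__main__":
    # Toy example: HiFi coverage drops out between 500 and 650.
    lengths = {"ctg1": 1000}
    hifi = [("ctg1", 0, 400), ("ctg1", 300, 500), ("ctg1", 650, 1000)]
    print(to_bed(low_depth_regions(lengths, hifi)))
```

The resulting BED intervals could then be used to restrict an ONT-based polishing step to the regions that lack HiFi support, while leaving the HiFi-supported sequence untouched.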