chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

strange assembled contig using hifi+UL+hic #452

Open HippoYI opened 1 year ago

HippoYI commented 1 year ago

Hi there! I am using the latest version (0.19.5) to assemble my genome from ~100X HiFi reads, ~30X UL reads, and ~100X Hi-C reads. Compared with the assembly from 0.16.1, contig N50 improved a lot and there are fewer misjoined regions. However, I got one huge contig (this also happened with 0.16.1) with very little Hi-C contact (the top-left contig marked with a blue '***'). After running TRF, I found some regions with centromere monomer sequences (32 bp to 96 bp), but I think the assembled putative centromere-related sequence is far too long (~53 Mb). I don't know how to solve this issue. Can you give me a hint? Thanks so much!

[Attached images: hic_contact, strange_ctg]

chhylp123 commented 1 year ago

If possible, would you mind showing the assembly graph? Looking at the graph should make it easier to figure out whether this is a misassembly.

HippoYI commented 1 year ago

Thanks for the quick reply. I have uploaded three graphs, one for each GFA result (r_utg, p_utg, and p_ctg). Is that enough for you to investigate? If you need more data or information, please let me know. Thanks!

[Attachments: p_ctg, p_utg, r_utg.pdf]

chhylp123 commented 1 year ago

The high-level graph looks OK. You could look at the nodes that correspond to this problematic region within the graphs. The graph will give you a sense of whether it is a misjoin.

HippoYI commented 1 year ago

Actually, the longest one is the problematic contig. As you can see, it has a great many nodes. How do I look at the corresponding nodes within the graph? Should I extract the sequences of the contigs in each node and check whether they overlap well?

chhylp123 commented 1 year ago

Sorry for the late reply. In general, you could pick up the read IDs from the A-lines of that contig, and grep for them in the assembly graph.
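For anyone following along, here is a minimal Python sketch of that lookup. It is not part of hifiasm. It assumes the tab-separated A-line layout written in hifiasm GFA files (`A`, segment name, start on segment, strand, read name, ...), and the function names `reads_in_contig`, `segments_with_reads`, and `report` are made up for illustration. The first function collects the read names on a contig's A-lines; the second finds which segments of another graph (for example the r_utg or p_utg GFA) contain those reads.

```python
from collections import defaultdict


def reads_in_contig(gfa_lines, contig_name):
    """Return the read names on the A-lines of contig_name, in file order."""
    reads = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed A-line layout: A, segment name, start, strand, read name, ...
        if len(fields) >= 5 and fields[0] == "A" and fields[1] == contig_name:
            reads.append(fields[4])
    return reads


def segments_with_reads(gfa_lines, read_names):
    """Map each segment name to the wanted reads found on its A-lines."""
    wanted = set(read_names)
    hits = defaultdict(list)
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and fields[0] == "A" and fields[4] in wanted:
            hits[fields[1]].append(fields[4])
    return dict(hits)


def report(ctg_gfa_path, utg_gfa_path, contig_name):
    """Print each unitig that shares reads with the contig, with the read count."""
    with open(ctg_gfa_path) as fh:
        reads = reads_in_contig(fh, contig_name)
    with open(utg_gfa_path) as fh:
        hits = segments_with_reads(fh, reads)
    for segment, segment_reads in sorted(hits.items()):
        print(segment, len(segment_reads), sep="\t")
```

Calling `report` with the paths to the p_ctg GFA and the r_utg (or p_utg) GFA plus the name of the suspicious contig lists the unitigs to inspect in a graph viewer such as Bandage. The paths and the contig name depend on your own run.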

HippoYI commented 1 year ago

got it, thanks.