chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Using Dovetail Chicago reads with Hi-C integration #364

Open jacquefm opened 1 year ago

jacquefm commented 1 year ago

Would it be possible to use Dovetail Chicago reads with the Hi-C integration option? Reference here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4772016/

The Chicago reads are produced in a similar manner to Hi-C reads (proximity ligation of chromatin followed by library construction), except that one would expect much longer insert sizes for the Chicago libraries (<500 kbp). Are there any parameters that would require tuning?

chhylp123 commented 1 year ago

It should work, but we haven't tested it yet. Could you please give it a try?

jacquefm commented 1 year ago

I did give it a try. I had already run the Hi-C pipeline with Hi-C reads only, and I have just completed a brand-new run using both the Hi-C and Chicago reads. The hic.lk.bin file is approximately 2x larger for the run with Hi-C plus Chicago reads, but the .hic.tlb.bin files are the same size for both runs. Is this expected? The assembly stats of the resulting haplotypes differ between runs, but the stats for the primary assembly are exactly the same.

stats.xlsx
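
For readers who want to reproduce a combined run like the one described above, here is a minimal sketch of how the command line might be assembled. It assumes that hifiasm's --h1 and --h2 options accept comma-separated lists of files, as the help text suggests; please verify this against your installed version. The file names, output prefix, thread count, and the helper build_hifiasm_cmd are all hypothetical.

```python
import shlex


def build_hifiasm_cmd(hifi_reads, r1_files, r2_files, prefix="asm", threads=32):
    """Build a hifiasm argument list that pools several paired-end libraries.

    Assumption: --h1/--h2 take comma-separated file lists, with the i-th R1
    file pairing with the i-th R2 file. Check `hifiasm -h` before relying on it.
    """
    if len(r1_files) != len(r2_files):
        raise ValueError("R1 and R2 lists must pair up one-to-one")
    return [
        "hifiasm",
        "-o", prefix,
        "-t", str(threads),
        "--h1", ",".join(r1_files),
        "--h2", ",".join(r2_files),
        hifi_reads,
    ]


# Hypothetical file names: one Hi-C library plus one Chicago library.
cmd = build_hifiasm_cmd(
    "hifi.fq.gz",
    ["hic_R1.fq.gz", "chicago_R1.fq.gz"],
    ["hic_R2.fq.gz", "chicago_R2.fq.gz"],
)
print(shlex.join(cmd))  # copy-pasteable shell command
```

The sketch only prints the command. To launch it, pass cmd to subprocess.run(cmd, check=True).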

chhylp123 commented 1 year ago

I think the results look fine. *hic.lk.bin keeps the Hi-C alignments, so it will be larger when you have both Hi-C and Chicago reads during the second round. The differences in the final contigs are acceptable to me, since the Hi-C method is not as accurate as the trio method. For better benchmarking, it would be best to find some trio data to use as the ground truth.
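
Without trio data, a contiguity comparison between runs is still a quick sanity check, although it says nothing about phasing accuracy. Below is a minimal sketch of an N50 calculation, one common assembly statistic. The contig lengths are made-up illustration values, not results from this thread, and n50 is a hypothetical helper rather than part of hifiasm.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L of the contig at which the running total, taken over
    contigs sorted from longest to shortest, first reaches half of the total
    assembly size.
    """
    if not lengths:
        return 0
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths (in bp) for two hypothetical runs.
hic_only = [80, 70, 50, 40, 30, 20]
hic_plus_chicago = [100, 60, 50, 40, 20, 20]

print("Hi-C only N50:", n50(hic_only))
print("Hi-C + Chicago N50:", n50(hic_plus_chicago))
```

In practice, the lengths would come from the contig sequences of each haplotype assembly produced by the two runs.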

jacquefm commented 1 year ago

Thank you. I appreciate you taking time to help with this.