chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

A third of the HiFi reads not present in GFA #706

Status: Open. zhenzhenyang-psu opened this issue 1 month ago.

zhenzhenyang-psu commented 1 month ago

Dear Haoyu,

I would like to extract the HiFi reads that contribute to the contigs in asm.dip.hap1.p_ctg.noseq.gfa and asm.dip.hap2.p_ctg.noseq.gfa. However, to my surprise, only 5,874,368 reads, fewer than two thirds of all my HiFi reads, are present in the asm.dip.r_utg.gfa file, and each haplotype's p_ctg.noseq.gfa file contains only about a third of the reads. Is this expected?
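For the goal described above (collecting the reads behind each haplotype's contigs), a minimal shell sketch follows. It assumes, as an unverified reading of hifiasm's GFA output, that column 5 of each A-line holds the read name; `gfa_read_ids` is a hypothetical helper name. Unlike piping raw A-lines into `wc -l`, it deduplicates names with `sort -u`, so its line count is a count of distinct reads.

```shell
#!/bin/sh
# Minimal sketch: print the unique read names found on the A-lines of a
# hifiasm GFA file.
# ASSUMPTION: column 5 of each A-line holds the read name.
gfa_read_ids() {
    # $1: path to a GFA file (e.g. a *.p_ctg.noseq.gfa or *.r_utg.gfa file)
    awk '$1 == "A" { print $5 }' "$1" | LC_ALL=C sort -u
}

# Example usage with the file names from this issue:
# gfa_read_ids hifiasm.0.19.9.asm.dip.hap1.p_ctg.noseq.gfa > hap1.read.ids
# gfa_read_ids hifiasm.0.19.9.asm.dip.hap2.p_ctg.noseq.gfa > hap2.read.ids
# wc -l < hap1.read.ids
```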

See below:

    awk '$1=="A"' hifiasm.0.19.9.asm.dip.hap1.p_ctg.noseq.gfa | wc -l
    3404927

    awk '$1=="A"' hifiasm.0.19.9.asm.dip.hap2.p_ctg.noseq.gfa | wc -l
    3310774

    awk '$1=="A"' hifiasm.0.19.9.asm.dip.r_utg.gfa | wc -l
    5874368

    9082038 all.hifi.read.ids
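To put these counts on a common scale, the small sketch below divides each A-line count by the 9,082,038 IDs in all.hifi.read.ids; `percent_of_total` is a hypothetical helper name. Note that these are line counts, so treating them as read counts assumes each read appears on at most one A-line per file.

```shell
#!/bin/sh
# Express each A-line count reported above as a percentage of all read IDs.
percent_of_total() {
    # $1: count, $2: total; prints the percentage with one decimal place.
    # LC_ALL=C keeps "." as the decimal separator regardless of locale.
    LC_ALL=C awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f%%\n", 100 * n / t }'
}

total=9082038
for entry in r_utg:5874368 hap1:3404927 hap2:3310774; do
    name=${entry%%:*}   # text before the colon
    count=${entry#*:}   # text after the colon
    echo "$name $(percent_of_total "$count" "$total")"
done
# r_utg 64.7%
# hap1 37.5%
# hap2 36.5%
```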

Does this mean that 9,082,038 - 5,874,368 = 3,207,670 reads (around a third of all reads) are not used by hifiasm to construct the assembly?
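The difference in the question works out to 3,207,670 reads, about 35.3% of the 9,082,038 total. To see which reads those are, rather than only how many, one option is a set difference with `sort` and `comm`. `reads_missing_from_gfa` is a hypothetical helper name, and the sketch assumes that all.hifi.read.ids holds one name per line in exactly the form hifiasm writes in A-line column 5 (for example, with no leading `@`). Any naming mismatch between the two files would inflate the result.

```shell
#!/bin/sh
# Minimal sketch: list read IDs that never appear on an A-line of a GFA file.
# ASSUMPTIONS: the ID file has one read name per line, and column 5 of each
# A-line holds the read name in the same format.
reads_missing_from_gfa() {
    # $1: file of all read IDs, $2: GFA file
    all_sorted=$(mktemp)
    gfa_sorted=$(mktemp)
    # comm requires both inputs sorted in the same collation order.
    LC_ALL=C sort -u "$1" > "$all_sorted"
    awk '$1 == "A" { print $5 }' "$2" | LC_ALL=C sort -u > "$gfa_sorted"
    # -23 suppresses lines unique to the GFA and lines common to both,
    # leaving only IDs absent from the GFA.
    LC_ALL=C comm -23 "$all_sorted" "$gfa_sorted"
    rm -f "$all_sorted" "$gfa_sorted"
}

# Example usage with the file names from this issue:
# reads_missing_from_gfa all.hifi.read.ids hifiasm.0.19.9.asm.dip.r_utg.gfa > missing.ids
# wc -l < missing.ids
```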

Thanks very much.

Best,
zhenzhen