Dear Haoyu,
I would like to extract the HiFi reads that contribute to the contigs in asm.dip.hap1.p_ctg.noseq.gfa and asm.dip.hap2.p_ctg.noseq.gfa.
However, to my surprise, only 5,874,368 of the 9,082,038 HiFi reads (less than two thirds) are present in the asm.dip.r_utg.gfa file, and only about one third of the reads appear in each haplotype's p_ctg.noseq.gfa file.
Is this expected?
See below:
awk '$1=="A"' hifiasm.0.19.9.asm.dip.hap1.p_ctg.noseq.gfa | wc -l
3404927
awk '$1=="A"' hifiasm.0.19.9.asm.dip.hap2.p_ctg.noseq.gfa | wc -l
3310774
awk '$1=="A"' hifiasm.0.19.9.asm.dip.r_utg.gfa | wc -l
5874368
9082038 all.hifi.read.ids
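The counts above are numbers of A lines, which may differ from the number of distinct reads if a read is listed more than once. A minimal sketch of how the read IDs themselves could be pulled out and compared is shown here. It assumes the A lines are tab-separated with the read name in column 5, and that all.hifi.read.ids holds one read ID per line; the function names gfa_read_ids and reads_not_in_gfa are hypothetical helpers, not part of hifiasm.

```shell
# Print the unique read names found on "A" lines of one or more GFA files.
# Assumption: A lines are tab-separated and column 5 is the read name.
gfa_read_ids() {
    awk -F'\t' '$1 == "A" { print $5 }' "$@" | sort -u
}

# Print the IDs from an ID list (first argument, one ID per line)
# that do not appear on any A line of the given GFA files.
reads_not_in_gfa() {
    all_ids=$1
    shift
    tmp=$(mktemp)
    gfa_read_ids "$@" > "$tmp"
    # comm -23 keeps lines that occur only in the first (sorted) input.
    sort -u "$all_ids" | comm -23 - "$tmp"
    rm -f "$tmp"
}
```

For example, `gfa_read_ids hifiasm.0.19.9.asm.dip.hap1.p_ctg.noseq.gfa hifiasm.0.19.9.asm.dip.hap2.p_ctg.noseq.gfa | wc -l` would give the number of distinct reads across both haplotypes, and `reads_not_in_gfa all.hifi.read.ids hifiasm.0.19.9.asm.dip.r_utg.gfa` would list the reads that are absent from the unitig graph.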
Does this mean that the remaining 9,082,038 - 5,874,368 = 3,207,670 reads (around one third of the total) were not used by hifiasm to construct the assembly?
Thanks very much,
Best,
Zhenzhen