chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Assembled genome is larger than the survey estimate; error log #671

[Open] feiyu007 opened this issue 1 week ago

feiyu007 commented 1 week ago

[M::adjust_utg_by_primary] primary contig coverage range: [59, infinity]
Writing sd.bp.p_ctg.gfa to disk...
[M::reduce_hamming_error_adv::2.146] # inserted edges: 3262, # fixed bubbles: 110
[M::adjust_utg_by_trio] primary contig coverage range: [59, infinity]
[M::recall_arcs] # transitive arcs::950
[M::recall_arcs] # new arcs::359154, # old arcs::272606
ERROR
ERROR
ERROR
[M::clean_trio_untig_graph] # adjusted arcs::10
[M::adjust_utg_by_trio] primary contig coverage range: [59, infinity]
[M::recall_arcs] # transitive arcs::2054
[M::recall_arcs] # new arcs::371220, # old arcs::282894
ERROR-set_utg_offset
[M::clean_trio_untig_graph] # adjusted arcs::18
[M::output_trio_graph_joint] dedup_base::54109272, miss_base::0
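To put a number on "bigger than survey", one option is to sum the contig lengths in the primary-contig GFA named in the log (sd.bp.p_ctg.gfa) and compare that total with the survey estimate. The sketch below is a minimal, hypothetical helper (gfa_total_length is not part of hifiasm); it assumes standard GFA1 segment (S) lines with the sequence in the third column, and falls back to an optional LN:i: length tag when the sequence is written as "*".

```python
import io
from typing import TextIO


def gfa_total_length(handle: TextIO) -> int:
    """Return the summed length of all segment (S) records in a GFA1 stream."""
    total = 0
    for line in handle:
        if not line.startswith("S\t"):
            continue  # skip header (H), link (L), alignment (A) and other records
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # malformed S line without a sequence column
        seq = fields[2]
        if seq != "*":
            total += len(seq)
            continue
        # Sequence omitted ("*"): use the optional LN:i:<length> tag instead.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                total += int(tag[len("LN:i:"):])
                break
    return total


if __name__ == "__main__":
    # Toy GFA: one segment with an explicit sequence, one with only a length tag.
    toy = (
        "H\tVN:Z:1.0\n"
        "S\tctg1\tACGTACGT\tLN:i:8\n"
        "S\tctg2\t*\tLN:i:12\n"
        "L\tctg1\t+\tctg2\t+\t0M\n"
    )
    print(gfa_total_length(io.StringIO(toy)))  # 20
```

On a real run the same function would be called on an opened file, e.g. `with open("sd.bp.p_ctg.gfa") as fh: gfa_total_length(fh)`; dividing the result by the survey estimate gives the size ratio in question.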

chhylp123 commented 6 days ago

It looks fine, since the ERROR messages are only warnings in most cases. The final assembly size is always larger than the estimated size, because k-mer-based size estimation may underestimate repeats. But if the final assembly is much too large, please see: https://hifiasm.readthedocs.io/en/latest/faq.html#why-the-size-of-primary-assembly-or-partially-phased-assembly-is-much-larger-than-the-estimated-genome-size.
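A toy illustration of how a k-mer-based size estimate can miss repeat content. It assumes an idealised k-mer histogram (no sequencing errors, no heterozygosity, no Poisson spread) and one plausible mechanism: the estimator only sums k-mers up to a hypothetical count cutoff `max_count`, so high-copy repeat k-mers drop out. The function names and numbers are illustrative assumptions, not how hifiasm or any particular survey tool computes its estimate.

```python
from collections import Counter
from typing import Optional


def toy_histogram(unique_kmers: int, repeat_kmers: int,
                  repeat_copies: int, depth: int) -> Counter:
    """Idealised k-mer histogram mapping count -> number of distinct k-mers.

    Each unique k-mer is seen `depth` times; each distinct repeat k-mer occurs
    `repeat_copies` times in the genome, so it is seen repeat_copies * depth times.
    """
    hist: Counter = Counter()
    hist[depth] += unique_kmers
    hist[repeat_copies * depth] += repeat_kmers
    return hist


def estimate_genome_size(hist: Counter, peak_depth: int,
                         max_count: Optional[int] = None) -> float:
    """Genome size ~ total k-mer occurrences / depth of the single-copy peak.

    If max_count is given (hypothetical cutoff), k-mers seen more often than
    that are ignored, mimicking a truncated histogram.
    """
    total = sum(count * n for count, n in hist.items()
                if max_count is None or count <= max_count)
    return total / peak_depth


if __name__ == "__main__":
    # Roughly 90 Mb of unique sequence plus 100,000 distinct repeat k-mers at
    # 1,000 copies each (about 100 Mb of repeats), sequenced at 30x depth.
    hist = toy_histogram(90_000_000, 100_000, 1_000, 30)
    print(estimate_genome_size(hist, 30))                    # 190000000.0
    print(estimate_genome_size(hist, 30, max_count=10_000))  # 90000000.0
```

With the full histogram the estimate recovers the roughly 190 Mb toy genome; once the repeat k-mers above the cutoff are discarded it reports about 90 Mb, while an assembler that resolves those repeats could still produce close to the full 190 Mb.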