Open wangyzh opened 2 years ago
Could you please check the homozygous (hom) coverage threshold identified by hifiasm? An incorrect threshold can lead to this type of issue (see: https://hifiasm.readthedocs.io/en/latest/faq.html#for-hi-c-integrated-assembly-why-the-assembly-size-of-both-haplotypes-are-much-larger-than-the-estimated-genome-size). In addition, the k-mer plot looks abnormal, so it would be worth checking the input HiFi reads as well.
Thanks for your reply. As you suspected, the problem was caused by faulty input HiFi data. In the previous step, converting the BAM file to FASTQ, I used bamToFastq (bedtools), which generated twice as many reads as expected. That is why I got a strange k-mer plot. With samtools bam2fq I got the expected reads. After switching the input to that FASTQ file, I got the correct result.
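A quick way to catch this kind of problem before assembly is to count how often each read name occurs in the FASTQ file. The following is only a minimal sketch: the function names are hypothetical, it assumes plain 4-line FASTQ records, and it assumes duplicated records keep identical read names, which may not hold for every conversion tool.

```python
import io
from collections import Counter


def count_read_names(handle):
    """Count occurrences of each read name in a 4-line-per-record FASTQ stream."""
    names = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 0:  # first line of each record is the header
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1} is not a FASTQ header: {line!r}")
            # Read name = header text up to the first whitespace, without '@'
            names[line[1:].split()[0]] += 1
    return names


def duplicated_names(names):
    """Return only the read names that occur more than once."""
    return {name: n for name, n in names.items() if n > 1}


if __name__ == "__main__":
    # Tiny in-memory example; for a real file use open(...) or gzip.open(..., "rt").
    example = "@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n@r1\nACGT\n+\nIIII\n"
    names = count_read_names(io.StringIO(example))
    print("records:", sum(names.values()))
    print("duplicated:", duplicated_names(names))
```

Holding every read name in memory can be expensive for a large HiFi dataset, so simply comparing the total record count against the number of reads in the source BAM file may be enough in practice.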
Hi,
Using hifiasm we got hap1.p_ctg of 1001M and hap2.p_ctg of 989M, both larger than the genome size of our species (approximately 600M). The contig counts are 640 and 620, and the contig N50 is ~8.8M. Besides, the k-mer histogram looks strange. What might be the reason? Could you give us some advice? Thanks.
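As a rough sanity check on the hom coverage threshold, the read depth implied by the log can be compared with the reported peak. This sketch takes the `# bases` value from the round 1 line of the log, the `peak_hom` value hifiasm printed, and the poster's approximate genome size; the helper name is hypothetical, and the interpretation is an assumption rather than anything hifiasm reports itself.

```python
def estimated_depth(total_bases, genome_size):
    """Average read depth = total sequenced bases / haploid genome size."""
    return total_bases / genome_size


total_bases = 59_472_389_822  # "# bases" from the round 1 log line
genome_size = 600_000_000     # poster's approximate genome size estimate
peak_hom = 48                 # "peak_hom" reported by hifiasm

depth = estimated_depth(total_bases, genome_size)
ratio = depth / peak_hom

print(f"estimated depth: {depth:.1f}x")
print(f"depth / peak_hom: {ratio:.2f}")
```

If the homozygous peak were picked correctly, one would expect it to sit reasonably close to the average read depth. A ratio near 2 suggests that either the threshold was set too low or the input contains more data than intended, and both are worth ruling out.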
[M::ha_analyze_count] lowest: count[5] = 123 [M::ha_analyze_count] highest: count[46] = 23685663 [M::ha_hist_line] 2: ****> 324240347 [M::ha_hist_line] 3: 6978 [M::ha_hist_line] 4: ***** 6400383 [M::ha_hist_line] 5: 123 [M::ha_hist_line] 6: ** 1403529 [M::ha_hist_line] 7: 22 [M::ha_hist_line] 8: * 616810 [M::ha_hist_line] 9: 10 [M::ha_hist_line] 10: 353469 [M::ha_hist_line] 11: 5 [M::ha_hist_line] 12: 277977 [M::ha_hist_line] 13: 5 [M::ha_hist_line] 14: 314808 [M::ha_hist_line] 15: 3 [M::ha_hist_line] 16: 440673 [M::ha_hist_line] 17: 8 [M::ha_hist_line] 18: 658102 [M::ha_hist_line] 19: 7 [M::ha_hist_line] 20: 970564 [M::ha_hist_line] 21: 11 [M::ha_hist_line] 22: * 1546523 [M::ha_hist_line] 23: 18 [M::ha_hist_line] 24: ** 2327776 [M::ha_hist_line] 25: 33 [M::ha_hist_line] 26: ** 3424127 [M::ha_hist_line] 27: 47 [M::ha_hist_line] 28: *** 4894080 [M::ha_hist_line] 29: 51 [M::ha_hist_line] 30: * 6751758 [M::ha_hist_line] 31: 79 [M::ha_hist_line] 32: ** 9017937 [M::ha_hist_line] 33: 91 [M::ha_hist_line] 34: ***** 11681240 [M::ha_hist_line] 35: 121 [M::ha_hist_line] 36: * 14537934 [M::ha_hist_line] 37: 161 [M::ha_hist_line] 38: ***** 17328129 [M::ha_hist_line] 39: 203 [M::ha_hist_line] 40: **** 20001003 [M::ha_hist_line] 41: 185 [M::ha_hist_line] 42: * 22066901 [M::ha_hist_line] 43: 201 [M::ha_hist_line] 44: ** 23282235 [M::ha_hist_line] 45: 249 [M::ha_hist_line] 46: **** 23685663 [M::ha_hist_line] rest: ****> 370183741 [M::ha_analyze_count] left: count[44] = 23282235 [M::ha_analyze_count] right: count[48] = 23141891 [M::ha_ft_gen] peak_hom: 48; peak_het: 46 [M::ha_ct_shrink::4543.5826.08] ==> counted 3479831 distinct minimizer k-mers [M::ha_ft_gen::4551.5556.07@32.885GB] ==> filtered out 3479831 k-mers occurring 240 or more times [M::ha_opt_update_cov] updated max_n_chain to 240 [M::yak_count] collected 1548983080 minimizers [M::ha_pt_gen::6018.976*6.62] ==> counted 39678480 distinct minimizer k-mers [M::ha_pt_gen] count[4095] = 0 (for sanity check) 
[M::ha_analyze_count] lowest: count[5] = 0 [M::ha_analyze_count] highest: count[46] = 1063398 [M::ha_hist_line] 2: ****> 15934769 [M::ha_hist_line] 3: 0 [M::ha_hist_line] 4: **** 595914 [M::ha_hist_line] 5: 0 [M::ha_hist_line] 6: ** 148144 [M::ha_hist_line] 7: 0 [M::ha_hist_line] 8: ** 66810 [M::ha_hist_line] 9: 0 [M::ha_hist_line] 10: ** 40578 [M::ha_hist_line] 11: 0 [M::ha_hist_line] 12: ** 29458 [M::ha_hist_line] 13: 0 [M::ha_hist_line] 14: 27744 [M::ha_hist_line] 15: 0 [M::ha_hist_line] 16: * 31325 [M::ha_hist_line] 17: 0 [M::ha_hist_line] 18: * 40256 [M::ha_hist_line] 19: 0 [M::ha_hist_line] 20: 54770 [M::ha_hist_line] 21: 0 [M::ha_hist_line] 22: **** 80860 [M::ha_hist_line] 23: 0 [M::ha_hist_line] 24: * 116854 [M::ha_hist_line] 25: 0 [M::ha_hist_line] 26: **** 167790 [M::ha_hist_line] 27: 0 [M::ha_hist_line] 28: ** 234606 [M::ha_hist_line] 29: 0 [M::ha_hist_line] 30: ** 320133 [M::ha_hist_line] 31: 0 [M::ha_hist_line] 32: **** 423429 [M::ha_hist_line] 33: 0 [M::ha_hist_line] 34: ***** 544521 [M::ha_hist_line] 35: 0 [M::ha_hist_line] 36: * 672318 [M::ha_hist_line] 37: 0 [M::ha_hist_line] 38: ***** 795892 [M::ha_hist_line] 39: 0 [M::ha_hist_line] 40: ** 913044 [M::ha_hist_line] 41: 0 [M::ha_hist_line] 42: ** 1002193 [M::ha_hist_line] 43: 0 [M::ha_hist_line] 44: ***** 1051412 [M::ha_hist_line] 45: 0 [M::ha_hist_line] 46: **** 1063398 [M::ha_hist_line] rest: ****> 15322262 [M::ha_analyze_count] left: count[44] = 1051412 [M::ha_analyze_count] right: count[48] = 1032368 [M::ha_pt_gen] peak_hom: 48; peak_het: 46 [M::ha_ct_shrink::6019.0856.62] ==> counted 39678480 distinct minimizer k-mers [M::ha_pt_gen::] counting in normal mode [M::yak_count] collected 1548983080 minimizers [M::ha_pt_gen::6834.6137.62] ==> indexed 1548983080 positions, counted 39678480 distinct minimizer k-mers [M::ha_assemble::25017.12146.34@57.031GB] ==> corrected reads for round 1 [M::ha_assemble] # bases: 59472389822; # corrected bases: 131958179; # recorrected bases: 137592 [M::ha_assemble] 
size of buffer: 23.742GB [M::yak_count] collected 1544204218 minimizers [M::ha_pt_gen::25221.37046.33] ==> counted 25340871 distinct minimizer k-mers [M::ha_pt_gen] count[4095] = 0 (for sanity check) [M::ha_analyze_count] lowest: count[5] = 7 [M::ha_analyze_count] highest: count[46] = 1055569 [M::ha_hist_line] 2: ****> 2379116 [M::ha_hist_line] 3: 0 [M::ha_hist_line] 4: * 71601 [M::ha_hist_line] 5: 7 [M::ha_hist_line] 6: * 27105 [M::ha_hist_line] 7: 3 [M::ha_hist_line] 8: 17725 [M::ha_hist_line] 9: 0 [M::ha_hist_line] 10: 13581 [M::ha_hist_line] 11: 0 [M::ha_hist_line] 12: 12635 [M::ha_hist_line] 13: 2 [M::ha_hist_line] 14: 16091 [M::ha_hist_line] 15: 4 [M::ha_hist_line] 16: * 22301 [M::ha_hist_line] 17: 2 [M::ha_hist_line] 18: 31677 [M::ha_hist_line] 19: 0 [M::ha_hist_line] 20: 44561 [M::ha_hist_line] 21: 0 [M::ha_hist_line] 22: ** 68042 [M::ha_hist_line] 23: 0 [M::ha_hist_line] 24: * 99661 [M::ha_hist_line] 25: 0 [M::ha_hist_line] 26: ** 144474 [M::ha_hist_line] 27: 0 [M::ha_hist_line] 28: *** 204502 [M::ha_hist_line] 29: 0 [M::ha_hist_line] 30: * 281348 [M::ha_hist_line] 31: 0 [M::ha_hist_line] 32: **** 377461 [M::ha_hist_line] 33: 0 [M::ha_hist_line] 34: ** 489619 [M::ha_hist_line] 35: 0 [M::ha_hist_line] 36: ** 612689 [M::ha_hist_line] 37: 0 [M::ha_hist_line] 38: ** 738974 [M::ha_hist_line] 39: 0 [M::ha_hist_line] 40: ** 860743 [M::ha_hist_line] 41: 0 [M::ha_hist_line] 42: ***** 963096 [M::ha_hist_line] 43: 0 [M::ha_hist_line] 44: *** 1024048 [M::ha_hist_line] 45: 1 [M::ha_hist_line] 46: **** 1055569 [M::ha_hist_line] rest: ****> 15784233 [M::ha_analyze_count] left: count[44] = 1024048 [M::ha_analyze_count] right: count[48] = 1041507 [M::ha_pt_gen] peak_hom: 48; peak_het: 46 [M::ha_ct_shrink::25221.47846.33] ==> counted 25340871 distinct minimizer k-mers [M::ha_pt_gen::] counting in normal mode [M::yak_count] collected 1544204218 minimizers [M::ha_pt_gen::25488.54746.27] ==> indexed 1544204218 positions, counted 25340871 distinct minimizer k-mers 
[M::ha_assemble::40643.99452.83@70.834GB] ==> corrected reads for round 2 [M::ha_assemble] # bases: 59424295326; # corrected bases: 11920530; # recorrected bases: 19550 [M::ha_assemble] size of buffer: 23.337GB [M::yak_count] collected 1543468066 minimizers [M::ha_pt_gen::40848.00652.79] ==> counted 23525785 distinct minimizer k-mers [M::ha_pt_gen] count[4095] = 0 (for sanity check) [M::ha_analyze_count] lowest: count[5] = 0 [M::ha_analyze_count] highest: count[46] = 1055197 [M::ha_hist_line] 2: * 620945 [M::ha_hist_line] 3: 0 [M::ha_hist_line] 4: ** 38671 [M::ha_hist_line] 5: 0 [M::ha_hist_line] 6: 18597 [M::ha_hist_line] 7: 0 [M::ha_hist_line] 8: 12897 [M::ha_hist_line] 9: 0 [M::ha_hist_line] 10: 10611 [M::ha_hist_line] 11: 0 [M::ha_hist_line] 12: 11234 [M::ha_hist_line] 13: 0 [M::ha_hist_line] 14: 14266 [M::ha_hist_line] 15: 0 [M::ha_hist_line] 16: 21166 [M::ha_hist_line] 17: 0 [M::ha_hist_line] 18: * 30503 [M::ha_hist_line] 19: 0 [M::ha_hist_line] 20: 43138 [M::ha_hist_line] 21: 0 [M::ha_hist_line] 22: ** 66019 [M::ha_hist_line] 23: 0 [M::ha_hist_line] 24: ***** 97450 [M::ha_hist_line] 25: 0 [M::ha_hist_line] 26: * 142374 [M::ha_hist_line] 27: 0 [M::ha_hist_line] 28: *** 201132 [M::ha_hist_line] 29: 0 [M::ha_hist_line] 30: ** 277179 [M::ha_hist_line] 31: 0 [M::ha_hist_line] 32: * 372825 [M::ha_hist_line] 33: 0 [M::ha_hist_line] 34: ** 484074 [M::ha_hist_line] 35: 0 [M::ha_hist_line] 36: ** 606905 [M::ha_hist_line] 37: 0 [M::ha_hist_line] 38: *** 731102 [M::ha_hist_line] 39: 0 [M::ha_hist_line] 40: * 855739 [M::ha_hist_line] 41: 0 [M::ha_hist_line] 42: *** 958855 [M::ha_hist_line] 43: 0 [M::ha_hist_line] 44: *** 1020064 [M::ha_hist_line] 45: 0 [M::ha_hist_line] 46: **** 1055197 [M::ha_hist_line] rest: ****> 15834842 [M::ha_analyze_count] left: count[44] = 1020064 [M::ha_analyze_count] right: count[48] = 1041833 [M::ha_pt_gen] peak_hom: 48; peak_het: 46 [M::ha_ct_shrink::40848.07852.79] ==> counted 23525785 distinct minimizer k-mers [M::ha_pt_gen::] counting in 
normal mode [M::yak_count] collected 1543468066 minimizers [M::ha_pt_gen::41122.39452.70] ==> indexed 1543468066 positions, counted 23525785 distinct minimizer k-mers [M::ha_assemble::56044.87255.68@93.926GB] ==> corrected reads for round 3 [M::ha_assemble] # bases: 59419408164; # corrected bases: 1149103; # recorrected bases: 22400 [M::ha_assemble] size of buffer: 23.074GB [M::yak_count] collected 1543304300 minimizers [M::ha_pt_gen::56246.25555.65] ==> counted 23350698 distinct minimizer k-mers [M::ha_pt_gen] count[4095] = 0 (for sanity check) [M::ha_analyze_count] lowest: count[5] = 0 [M::ha_analyze_count] highest: count[46] = 1055324 [M::ha_hist_line] 2: **** 459162 [M::ha_hist_line] 3: 0 [M::ha_hist_line] 4: * 32874 [M::ha_hist_line] 5: 0 [M::ha_hist_line] 6: 15495 [M::ha_hist_line] 7: 0 [M::ha_hist_line] 8: 11612 [M::ha_hist_line] 9: 0 [M::ha_hist_line] 10: 9861 [M::ha_hist_line] 11: 0 [M::ha_hist_line] 12: 10689 [M::ha_hist_line] 13: 0 [M::ha_hist_line] 14: 13802 [M::ha_hist_line] 15: 0 [M::ha_hist_line] 16: 20791 [M::ha_hist_line] 17: 0 [M::ha_hist_line] 18: 30262 [M::ha_hist_line] 19: 0 [M::ha_hist_line] 20: 42918 [M::ha_hist_line] 21: 0 [M::ha_hist_line] 22: ** 65799 [M::ha_hist_line] 23: 0 [M::ha_hist_line] 24: * 97200 [M::ha_hist_line] 25: 0 [M::ha_hist_line] 26: ***** 141892 [M::ha_hist_line] 27: 0 [M::ha_hist_line] 28: * 200889 [M::ha_hist_line] 29: 0 [M::ha_hist_line] 30: ** 276700 [M::ha_hist_line] 31: 0 [M::ha_hist_line] 32: ***** 372521 [M::ha_hist_line] 33: 0 [M::ha_hist_line] 34: ** 483592 [M::ha_hist_line] 35: 0 [M::ha_hist_line] 36: * 606079 [M::ha_hist_line] 37: 0 [M::ha_hist_line] 38: ***** 730525 [M::ha_hist_line] 39: 0 [M::ha_hist_line] 40: * 855536 [M::ha_hist_line] 41: 0 [M::ha_hist_line] 42: *** 958647 [M::ha_hist_line] 43: 0 [M::ha_hist_line] 44: ***** 1019830 [M::ha_hist_line] 45: 0 [M::ha_hist_line] 46: **** 1055324 [M::ha_hist_line] rest: ****> 15838698 [M::ha_analyze_count] left: count[44] = 1019830 [M::ha_analyze_count] right: 
count[48] = 1042039 [M::ha_pt_gen] peak_hom: 48; peak_het: 46 [M::ha_ct_shrink::56246.32855.65] ==> counted 23350698 distinct minimizer k-mers [M::ha_pt_gen::] counting in normal mode [M::yak_count] collected 1543304300 minimizers [M::ha_pt_gen::56560.38755.53] ==> indexed 1543304300 positions, counted 23350698 distinct minimizer k-mers [M::ha_assemble::58904.181*55.86@98.209GB] ==> found overlaps for the final round [M::ha_print_ovlp_stat] # overlaps: 354351228 [M::ha_print_ovlp_stat] # strong overlaps: 211026242 [M::ha_print_ovlp_stat] # weak overlaps: 143324986 [M::ha_print_ovlp_stat] # exact overlaps: 345244238 [M::ha_print_ovlp_stat] # inexact overlaps: 9106990 [M::ha_print_ovlp_stat] # overlaps without large indels: 353289064 [M::ha_print_ovlp_stat] # reverse overlaps: 91930232 Writing reads to disk... Reads has been written. Writing ma_hit_ts to disk... ma_hit_ts has been written. Writing ma_hit_ts to disk... ma_hit_ts has been written. bin files have been written. [M::purge_dups] homozygous read coverage threshold: 48 [M::purge_dups] purge duplication coverage threshold: 60 Writing raw unitig GFA to disk... Writing processed unitig GFA to disk... [M::purge_dups] homozygous read coverage threshold: 48 [M::purge_dups] purge duplication coverage threshold: 60 [M::mc_solve_core::0.087] ==> Partition [M::adjust_utg_by_primary] primary contig coverage range: [40, infinity] Writing hifi.asm.bp.p_ctg.gfa to disk... [M::adjust_utg_by_trio] primary contig coverage range: [40, infinity] Writing hifi.asm.bp.hap1.p_ctg.gfa to disk... [M::adjust_utg_by_trio] primary contig coverage range: [40, infinity] Writing hifi.asm.bp.hap2.p_ctg.gfa to disk... Inconsistency threshold for low-quality regions in BED files: 70% [M::main] Version: 0.16.1-r375 [M::main] CMD: hifiasm -o hifi.asm -t 64 ../../data/b31.hifi.fq.gz [M::main] Real time: 61162.954 sec; CPU: 3293262.765 sec; Peak RSS: 98.209 GB
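One pattern in the histograms above is that the odd occurrence bins (3, 5, 7, ...) are almost empty while the even bins hold nearly all k-mers. That is what one would expect if every read were present exactly twice, which matches the duplication reported in the reply. The sketch below extracts the `[M::ha_hist_line]` entries from a log and computes the fraction of k-mers in odd bins. The function names and the regular expression are assumptions based on the log format shown here, not part of hifiasm.

```python
import re

# Matches e.g. "[M::ha_hist_line] 4: ***** 6400383"; the "rest:" entries are skipped.
HIST_RE = re.compile(r"\[M::ha_hist_line\]\s+(\d+):\s*[*>]*\s*(\d+)")


def parse_histograms(log_text):
    """Return a list of histograms, each a dict {occurrence: k-mer count}."""
    histograms = []
    current = {}
    last_bin = -1
    for match in HIST_RE.finditer(log_text):
        occurrence, count = int(match.group(1)), int(match.group(2))
        if occurrence <= last_bin:  # bin numbers restarted: a new histogram begins
            histograms.append(current)
            current = {}
        current[occurrence] = count
        last_bin = occurrence
    if current:
        histograms.append(current)
    return histograms


def odd_fraction(hist):
    """Fraction of counted k-mers that fall into odd occurrence bins."""
    total = sum(hist.values())
    odd = sum(count for occurrence, count in hist.items() if occurrence % 2 == 1)
    return odd / total if total else 0.0


if __name__ == "__main__":
    # Excerpt copied from the log above.
    excerpt = (
        "[M::ha_hist_line] 2: ****> 324240347 [M::ha_hist_line] 3: 6978 "
        "[M::ha_hist_line] 4: ***** 6400383 [M::ha_hist_line] 5: 123 "
        "[M::ha_hist_line] rest: ****> 370183741"
    )
    for hist in parse_histograms(excerpt):
        print(f"odd-bin fraction: {odd_fraction(hist):.6f}")
```

An odd-bin fraction close to zero does not prove duplication on its own, but it is a cheap signal that the input reads deserve a closer look before tuning any assembly parameters.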