chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Assembly not mapping to the reference #365

Open dylandebaun opened 1 year ago

dylandebaun commented 1 year ago

Hello, I'm new to de novo assembly. I ran hifiasm with no additional flags:

./hifiasm -o snake -t 32 reads1.fastq.gz reads2.fastq.gz reads3.fastq.gz

The assembly stats look great, and the total length is exactly what I expected for this genome:

- total_bps: 1796687312
- sequence_count: 925
- longest: 121667207; shortest: 12052
- mean: 1942364.6616216216; median: 88281.0
- gc_content: 40.57200655513952
- N10: 116998146; N20: 97897063; N30: 78098521; N40: 56446441; N50: 52050959
- L10: 1; L20: 3; L30: 5; L40: 7; L50: 11

BUSCO completeness was 90-100%. However, when I align the assembly to the reference genome with minimap or nucmer, the alignment is highly discordant and only about half of my genome seems to map. (Note: 10x genomes from this clade map to the reference well.) In the plot below, x = reference and y = hifiasm assembly.
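For readers new to these metrics: Nx is the contig length at which contigs, sorted longest first, first add up to x% of the total assembly length, and Lx is the number of contigs needed to get there. A minimal Python sketch of that definition follows; the function name `nx_lx` is hypothetical and not part of hifiasm or any stats tool.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], x: float) -> Tuple[int, int]:
    """Return (Nx, Lx) for a collection of contig lengths.

    Contigs are sorted longest first; Nx is the length of the contig at which
    the running total first reaches x% of the assembly length, and Lx is the
    1-based count of contigs needed to reach it.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need at least one contig with positive length")
    target = total * x / 100.0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    # Only reachable if x > 100, because the running total ends at `total`.
    raise ValueError("x must be between 0 and 100")


# Toy example: five contigs totalling 300 bp.
example = [40, 100, 20, 80, 60]
print(nx_lx(example, 50))  # (80, 2): 100 + 80 = 180 >= 150
```

Under this 1-based convention, L10 for the stats above would be 2 rather than 1, because the longest contig (121667207 bp) is below 10% of 1796687312 bp. The tool that produced those stats appears to report Lx as a 0-based index, so its L values may be one lower than what other tools print.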

[Image: dot plot of the assembly-to-reference alignment]
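One way to put a number on "only about half of my genome seems to map" is to merge the aligned intervals on each reference sequence and compare the covered bases to that sequence's length. The sketch below does this for a PAF file (the tab-separated format minimap2 writes), assuming the 12 mandatory PAF columns are present; `target_coverage` and `merged_length` are hypothetical helper names, and any MAPQ cutoff you pass is an arbitrary choice.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]


def merged_length(intervals: List[Interval]) -> int:
    """Bases covered by half-open intervals, counting overlaps only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlaps or abuts the current run
        else:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous run
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def target_coverage(paf_lines: Iterable[str], min_mapq: int = 0) -> Dict[str, Tuple[int, int]]:
    """Map each reference (target) name to (covered bases, sequence length)."""
    lengths: Dict[str, int] = {}
    intervals: Dict[str, List[Interval]] = defaultdict(list)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or truncated lines
        tname, tlen = fields[5], int(fields[6])
        tstart, tend = int(fields[7]), int(fields[8])
        mapq = int(fields[11])
        lengths[tname] = tlen
        if mapq >= min_mapq:
            intervals[tname].append((tstart, tend))
    return {name: (merged_length(intervals[name]), lengths[name]) for name in lengths}
```

For example, after running something like `minimap2 -x asm5 reference.fa assembly.fa > aln.paf` (file names are placeholders), pass the open PAF file to `target_coverage`, sum the covered bases, and divide by the total reference length. Reference sequences with no alignment at all never appear in the PAF, so take that total from the reference FASTA (for instance its `.fai` index) rather than from the returned dictionary.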

Here is the output file: [M::ha_analyze_count] lowest: count[5] = 4392984 [M::ha_analyze_count] highest: count[20] = 44022530 [M::ha_hist_line] 2: ** 16587672 [M::ha_hist_line] 3: ** 5044960 [M::ha_hist_line] 4: 4083271 [M::ha_hist_line] 5: ** 4392984 [M::ha_hist_line] 6: **** 5151770 [M::ha_hist_line] 7: * 6400704 [M::ha_hist_line] 8: ** 7956081 [M::ha_hist_line] 9: ** 9827601 [M::ha_hist_line] 10: **** 12170900 [M::ha_hist_line] 11: ** 15049814 [M::ha_hist_line] 12: ** 18588202 [M::ha_hist_line] 13: ***** 22528992 [M::ha_hist_line] 14: * 26734299 [M::ha_hist_line] 15: ** 30825587 [M::ha_hist_line] 16: *** 34841102 [M::ha_hist_line] 17: * 38499757 [M::ha_hist_line] 18: ** 41299181 [M::ha_hist_line] 19: ** 43150315 [M::ha_hist_line] 20: **** 44022530 [M::ha_hist_line] 21: ***** 43717794 [M::ha_hist_line] 22: **** 42445483 [M::ha_hist_line] 23: **** 40329953 [M::ha_hist_line] 24: ** 37755365 [M::ha_hist_line] 25: ***** 34627555 [M::ha_hist_line] 26: * 31439311 [M::ha_hist_line] 27: *** 28498973 [M::ha_hist_line] 28: ** 25741779 [M::ha_hist_line] 29: ** 23674386 [M::ha_hist_line] 30: ** 22093429 [M::ha_hist_line] 31: **** 21083431 [M::ha_hist_line] 32: * 20562405 [M::ha_hist_line] 33: ** 20419109 [M::ha_hist_line] 34: * 20658399 [M::ha_hist_line] 35: **** 21072249 [M::ha_hist_line] 36: *** 21702594 [M::ha_hist_line] 37: * 22439498 [M::ha_hist_line] 38: *** 23116858 [M::ha_hist_line] 39: ** 23685829 [M::ha_hist_line] 40: 24077438 [M::ha_hist_line] 41: 24072725 [M::ha_hist_line] 42: ** 23967255 [M::ha_hist_line] 43: ** 23680234 [M::ha_hist_line] 44: **** 23099273 [M::ha_hist_line] 45: *** 22325035 [M::ha_hist_line] 46: **** 21234637 [M::ha_hist_line] 47: * 20023416 [M::ha_hist_line] 48: ** 18623381 [M::ha_hist_line] 49: **** 17115088 [M::ha_hist_line] 50: 15585146 [M::ha_hist_line] 51: **** 14059680 [M::ha_hist_line] 52: **** 12528858 [M::ha_hist_line] 53: *** 11071515 [M::ha_hist_line] 54: ** 9710840 [M::ha_hist_line] 55: * 8395159 [M::ha_hist_line] 56: **** 7182231 
[M::ha_hist_line] 57: ** 6128506 [M::ha_hist_line] 58: **** 5157541 [M::ha_hist_line] 59: ** 4320555 [M::ha_hist_line] 60: **** 3610435 [M::ha_hist_line] 61: * 2997752 [M::ha_hist_line] 62: **** 2475374 [M::ha_hist_line] 63: * 2036591 [M::ha_hist_line] 64: ** 1673687 [M::ha_hist_line] 65: ** 1388799 [M::ha_hist_line] 66: 1151180 [M::ha_hist_line] 67: 968837 [M::ha_hist_line] 68: 827464 [M::ha_hist_line] 69: 702100 [M::ha_hist_line] 70: 611166 [M::ha_hist_line] 71: 545213 [M::ha_hist_line] 72: 485170 [M::ha_hist_line] 73: 443151 [M::ha_hist_line] 74: 409261 [M::ha_hist_line] 75: 380014 [M::ha_hist_line] 76: 349749 [M::ha_hist_line] 77: 329988 [M::ha_hist_line] 78: 317037 [M::ha_hist_line] 79: 307623 [M::ha_hist_line] 80: 292278 [M::ha_hist_line] 81: 280643 [M::ha_hist_line] 82: 270007 [M::ha_hist_line] 83: 264316 [M::ha_hist_line] 84: 258797 [M::ha_hist_line] 85: 248489 [M::ha_hist_line] 86: 238096 [M::ha_hist_line] 87: 235057 [M::ha_hist_line] 88: * 226305 [M::ha_hist_line] rest: *** 14350742 [M::ha_analyze_count] left: none [M::ha_analyze_count] right: count[40] = 24077438 [M::ha_ft_gen] peak_hom: 40; peak_het: 20 [M::ha_ct_shrink::1002.3059.12] ==> counted 5366593 distinct minimizer k-mers [M::ha_ft_gen::1006.2349.09@33.116GB] ==> filtered out 5366593 k-mers occurring 200 or more times [M::ha_opt_update_cov] updated max_n_chain to 200 [M::yak_count] collected 580440966 minimizers [M::yak_count] collected 646635008 minimizers [M::yak_count] collected 601712161 minimizers [M::ha_pt_gen::1892.883*7.26] ==> counted 96430300 distinct minimizer k-mers [M::ha_pt_gen] count[4095] = 0 (for sanity check) [M::ha_analyze_count] lowest: count[5] = 282931 [M::ha_analyze_count] highest: count[20] = 2097837 [M::ha_hist_line] 1: ****> 37344881 [M::ha_hist_line] 2: ** 1466273 [M::ha_hist_line] 3: ***** 444509 [M::ha_hist_line] 4: ** 300906 [M::ha_hist_line] 5: * 282931 [M::ha_hist_line] 6: * 305804 [M::ha_hist_line] 7: *** 359230 [M::ha_hist_line] 8: * 431980 [M::ha_hist_line] 9: 
***** 516873 [M::ha_hist_line] 10: ** 630700 [M::ha_hist_line] 11: * 770461 [M::ha_hist_line] 12: ***** 935788 [M::ha_hist_line] 13: ** 1124402 [M::ha_hist_line] 14: * 1319665 [M::ha_hist_line] 15: **** 1511500 [M::ha_hist_line] 16: *** 1697161 [M::ha_hist_line] 17: * 1866127 [M::ha_hist_line] 18: *** 1990949 [M::ha_hist_line] 19: * 2068314 [M::ha_hist_line] 20: **** 2097837 [M::ha_hist_line] 21: * 2074828 [M::ha_hist_line] 22: **** 2005644 [M::ha_hist_line] 23: ** 1897804 [M::ha_hist_line] 24: **** 1765431 [M::ha_hist_line] 25: ***** 1607786 [M::ha_hist_line] 26: * 1452078 [M::ha_hist_line] 27: ** 1305603 [M::ha_hist_line] 28: **** 1170118 [M::ha_hist_line] 29: **** 1066870 [M::ha_hist_line] 30: 985630 [M::ha_hist_line] 31: **** 930393 [M::ha_hist_line] 32: * 895961 [M::ha_hist_line] 33: ** 879807 [M::ha_hist_line] 34: ** 881549 [M::ha_hist_line] 35: ** 891143 [M::ha_hist_line] 36: * 910691 [M::ha_hist_line] 37: * 934583 [M::ha_hist_line] 38: ** 960302 [M::ha_hist_line] 39: *** 976759 [M::ha_hist_line] 40: 987677 [M::ha_hist_line] 41: 984428 [M::ha_hist_line] 42: * 976633 [M::ha_hist_line] 43: **** 960893 [M::ha_hist_line] 44: * 938039 [M::ha_hist_line] 45: ** 904074 [M::ha_hist_line] 46: 858123 [M::ha_hist_line] 47: *** 809955 [M::ha_hist_line] 48: **** 752538 [M::ha_hist_line] 49: * 693492 [M::ha_hist_line] 50: ** 632442 [M::ha_hist_line] 51: ***** 572579 [M::ha_hist_line] 52: **** 510748 [M::ha_hist_line] 53: ** 454447 [M::ha_hist_line] 54: * 400189 [M::ha_hist_line] 55: * 348914 [M::ha_hist_line] 56: ** 302158 [M::ha_hist_line] 57: * 262619 [M::ha_hist_line] 58: ** 223722 [M::ha_hist_line] 59: 191990 [M::ha_hist_line] 60: **** 164664 [M::ha_hist_line] 61: * 140797 [M::ha_hist_line] 62: **** 120155 [M::ha_hist_line] 63: * 104074 [M::ha_hist_line] 64: 89453 [M::ha_hist_line] 65: 78962 [M::ha_hist_line] 66: 69749 [M::ha_hist_line] 67: 62028 [M::ha_hist_line] 68: * 56561 [M::ha_hist_line] 69: 51644 [M::ha_hist_line] 70: 47359 [M::ha_hist_line] 71: 44602 
[M::ha_hist_line] 72: 41911 [M::ha_hist_line] 73: 39674 [M::ha_hist_line] 74: 37687 [M::ha_hist_line] 75: 36282 [M::ha_hist_line] 76: 34383 [M::ha_hist_line] 77: 32817 [M::ha_hist_line] 78: 32136 [M::ha_hist_line] 79: * 31491 [M::ha_hist_line] 80: 30217 [M::ha_hist_line] 81: 29268 [M::ha_hist_line] 82: 28296 [M::ha_hist_line] 83: 27635 [M::ha_hist_line] 84: 26949 [M::ha_hist_line] 85: 26475 [M::ha_hist_line] 86: 25510 [M::ha_hist_line] 87: 25299 [M::ha_hist_line] 88: 24455 [M::ha_hist_line] 89: 24115 [M::ha_hist_line] 90: 23683 [M::ha_hist_line] 91: 22591 [M::ha_hist_line] 92: 22341 [M::ha_hist_line] 93: 21904 [M::ha_hist_line] 94: 21475 [M::ha_hist_line] 95: 21108 [M::ha_hist_line] 96: 20402 [M::ha_hist_line] 97: 20301 [M::ha_hist_line] 98: 19561 [M::ha_hist_line] 99: 18995 [M::ha_hist_line] 100: 18294 [M::ha_hist_line] 101: 18121 [M::ha_hist_line] 102: 17473 [M::ha_hist_line] 103: 17230 [M::ha_hist_line] 104: 16872 [M::ha_hist_line] 105: 16353 [M::ha_hist_line] 106: 16125 [M::ha_hist_line] 107: 15585 [M::ha_hist_line] 108: 14971 [M::ha_hist_line] 109: 14803 [M::ha_hist_line] 110: 14742 [M::ha_hist_line] 111: 14130 [M::ha_hist_line] 112: 13949 [M::ha_hist_line] 113: 13475 [M::ha_hist_line] 114: 13281 [M::ha_hist_line] 115: 12979 [M::ha_hist_line] 116: 12787 [M::ha_hist_line] 117: 12437 [M::ha_hist_line] 118: 12099 [M::ha_hist_line] 119: 11637 [M::ha_hist_line] 120: 11535 [M::ha_hist_line] 121: 11307 [M::ha_hist_line] 122: 11018 [M::ha_hist_line] 123: 10954 [M::ha_hist_line] 124: 10879 [M::ha_hist_line] rest: **** 455324 [M::ha_analyze_count] left: none [M::ha_analyze_count] right: count[40] = 987677 [M::ha_pt_gen] peak_hom: 40; peak_het: 20 [M::ha_ct_shrink::1892.9877.26] ==> counted 59085419 distinct minimizer k-mers [M::ha_pt_gen::] counting in normal mode [M::yak_count] collected 1828788135 minimizers [M::yak_count] collected 0 minimizers [M::yak_count] collected 0 minimizers [M::ha_pt_gen::2198.9118.18] ==> indexed 1791443254 positions, counted 59085419 
distinct minimizer k-mers [M::ha_assemble::7847.33224.44@56.296GB] ==> corrected reads for round 1 [M::ha_assemble] # bases: 66475102729; # corrected bases: 163608680; # recorrected bases: 192398 [M::ha_assemble] size of buffer: 18.364GB [M::yak_count] collected 1822126911 minimizers [M::yak_count] collected 0 minimizers [M::yak_count] collected 0 minimizers [M::ha_pt_gen::8099.85724.15] ==> counted 62498334 distinct minimizer k-mers [M::ha_pt_gen] count[4095] = 0 (for sanity check) [M::ha_analyze_count] lowest: count[5] = 210531 [M::ha_analyze_count] highest: count[20] = 2071926 [M::ha_hist_line] 1: ****> 5145767 [M::ha_hist_line] 2: **** 321795 [M::ha_hist_line] 3: **** 168930 [M::ha_hist_line] 4: * 179296 [M::ha_hist_line] 5: ** 210531 [M::ha_hist_line] 6: **** 253848 [M::ha_hist_line] 7: *** 313745 [M::ha_hist_line] 8: * 386569 [M::ha_hist_line] 9: ***** 469814 [M::ha_hist_line] 10: **** 579141 [M::ha_hist_line] 11: ** 707532 [M::ha_hist_line] 12: ** 864616 [M::ha_hist_line] 13: ** 1044781 [M::ha_hist_line] 14: **** 1237988 [M::ha_hist_line] 15: * 1428649 [M::ha_hist_line] 16: ** 1614088 [M::ha_hist_line] 17: ** 1789771 [M::ha_hist_line] 18: ***** 1924992 [M::ha_hist_line] 19: ** 2023115 [M::ha_hist_line] 20: **** 2071926 [M::ha_hist_line] 21: **** 2067783 [M::ha_hist_line] 22: **** 2016016 [M::ha_hist_line] 23: 1922944 [M::ha_hist_line] 24: *** 1805235 [M::ha_hist_line] 25: **** 1652925 [M::ha_hist_line] 26: **** 1501604 [M::ha_hist_line] 27: * 1351895 [M::ha_hist_line] 28: ** 1207493 [M::ha_hist_line] 29: ***** 1095410 [M::ha_hist_line] 30: **** 1002155 [M::ha_hist_line] 31: * 932307 [M::ha_hist_line] 32: * 893269 [M::ha_hist_line] 33: ** 866390 [M::ha_hist_line] 34: ** 862399 [M::ha_hist_line] 35: ** 865081 [M::ha_hist_line] 36: * 883936 [M::ha_hist_line] 37: **** 906873 [M::ha_hist_line] 38: * 930177 [M::ha_hist_line] 39: ** 953188 [M::ha_hist_line] 40: *** 971968 [M::ha_hist_line] 41: 973417 [M::ha_hist_line] 42: 971837 [M::ha_hist_line] 43: * 964951 
[M::ha_hist_line] 44: **** 945698 [M::ha_hist_line] 45: * 923199 [M::ha_hist_line] 46: ** 884912 [M::ha_hist_line] 47: 839276 [M::ha_hist_line] 48: ** 790469 [M::ha_hist_line] 49: ** 732313 [M::ha_hist_line] 50: 675445 [M::ha_hist_line] 51: ** 615614 [M::ha_hist_line] 52: *** 553268 [M::ha_hist_line] 53: **** 495232 [M::ha_hist_line] 54: * 444231 [M::ha_hist_line] 55: *** 389386 [M::ha_hist_line] 56: **** 339480 [M::ha_hist_line] 57: ** 293941 [M::ha_hist_line] 58: **** 254162 [M::ha_hist_line] 59: ** 217158 [M::ha_hist_line] 60: * 187505 [M::ha_hist_line] 61: **** 160266 [M::ha_hist_line] 62: * 138000 [M::ha_hist_line] 63: **** 117038 [M::ha_hist_line] 64: * 100878 [M::ha_hist_line] 65: 88271 [M::ha_hist_line] 66: 77255 [M::ha_hist_line] 67: 68850 [M::ha_hist_line] 68: 61727 [M::ha_hist_line] 69: 55275 [M::ha_hist_line] 70: 50575 [M::ha_hist_line] 71: 46983 [M::ha_hist_line] 72: 44048 [M::ha_hist_line] 73: 41124 [M::ha_hist_line] 74: 39264 [M::ha_hist_line] 75: 37399 [M::ha_hist_line] 76: 35308 [M::ha_hist_line] 77: 33758 [M::ha_hist_line] 78: 32403 [M::ha_hist_line] 79: 31701 [M::ha_hist_line] 80: 30899 [M::ha_hist_line] 81: 29784 [M::ha_hist_line] 82: 28523 [M::ha_hist_line] 83: 28296 [M::ha_hist_line] 84: 27975 [M::ha_hist_line] 85: 26525 [M::ha_hist_line] 86: 26024 [M::ha_hist_line] 87: 25217 [M::ha_hist_line] 88: 24456 [M::ha_hist_line] 89: 24455 [M::ha_hist_line] 90: 23680 [M::ha_hist_line] 91: 23145 [M::ha_hist_line] 92: 22884 [M::ha_hist_line] 93: 22091 [M::ha_hist_line] 94: 21541 [M::ha_hist_line] 95: 21170 [M::ha_hist_line] 96: 20676 [M::ha_hist_line] 97: 20292 [M::ha_hist_line] 98: 20073 [M::ha_hist_line] 99: 19576 [M::ha_hist_line] 100: 19271 [M::ha_hist_line] 101: 18160 [M::ha_hist_line] 102: 17947 [M::ha_hist_line] 103: 17713 [M::ha_hist_line] 104: 17067 [M::ha_hist_line] 105: 16671 [M::ha_hist_line] 106: 16136 [M::ha_hist_line] 107: 15986 [M::ha_hist_line] 108: 15394 [M::ha_hist_line] 109: 15034 [M::ha_hist_line] 110: 14950 [M::ha_hist_line] 111: 
14380 [M::ha_hist_line] 112: 14191 [M::ha_hist_line] 113: 13885 [M::ha_hist_line] 114: 13511 [M::ha_hist_line] 115: 13275 [M::ha_hist_line] 116: 13109 [M::ha_hist_line] 117: 12726 [M::ha_hist_line] 118: 12248 [M::ha_hist_line] 119: 12145 [M::ha_hist_line] 120: 11895 [M::ha_hist_line] 121: 11488 [M::ha_hist_line] 122: 11199 [M::ha_hist_line] 123: 10839 [M::ha_hist_line] 124: 10862 [M::ha_hist_line] 125: * 10747 [M::ha_hist_line] rest: ** 462664 [M::ha_analyze_count] left: none [M::ha_analyze_count] right: count[41] = 973417 [M::ha_pt_gen] peak_hom: 41; peak_het: 20 [M::ha_ct_shrink::8100.04624.15] ==> counted 57352567 distinct minimizer k-mers [M::ha_pt_gen::] counting in normal mode [M::yak_count] collected 1822126911 minimizers [M::yak_count] collected 0 minimizers [M::yak_count] collected 0 minimizers [M::ha_pt_gen::8386.04023.82] ==> indexed 1816981144 positions, counted 57352567 distinct minimizer k-mers [M::ha_assemble::13409.10426.46@71.718GB] ==> corrected reads for round 2 [M::ha_assemble] # bases: 66478568585; # corrected bases: 6008763; # recorrected bases: 165766 [M::ha_assemble] size of buffer: 17.992GB [M::yak_count] collected 1821683968 minimizers [M::yak_count] collected 0 minimizers [M::yak_count] collected 0 minimizers [M::ha_pt_gen::13669.62626.23] ==> counted 61434337 distinct minimizer k-mers [M::ha_pt_gen] count[4095] = 0 (for sanity check) [M::ha_analyze_count] lowest: count[5] = 207543 [M::ha_analyze_count] highest: count[20] = 2071255 [M::ha_hist_line] 1: ****> 4142312 [M::ha_hist_line] 2: ** 286139 [M::ha_hist_line] 3: **** 155681 [M::ha_hist_line] 4: **** 173477 [M::ha_hist_line] 5: ** 207543 [M::ha_hist_line] 6: **** 251422 [M::ha_hist_line] 7: ***** 312622 [M::ha_hist_line] 8: * 385286 [M::ha_hist_line] 9: ***** 468976 [M::ha_hist_line] 10: **** 577150 [M::ha_hist_line] 11: ** 705987 [M::ha_hist_line] 12: ** 863339 [M::ha_hist_line] 13: ** 1042172 [M::ha_hist_line] 14: **** 1236213 [M::ha_hist_line] 15: * 1427514 [M::ha_hist_line] 16: ** 
1612493 [M::ha_hist_line] 17: ** 1788522 [M::ha_hist_line] 18: ***** 1923959 [M::ha_hist_line] 19: ** 2022428 [M::ha_hist_line] 20: **** 2071255 [M::ha_hist_line] 21: **** 2068044 [M::ha_hist_line] 22: **** 2016539 [M::ha_hist_line] 23: 1923768 [M::ha_hist_line] 24: *** 1806040 [M::ha_hist_line] 25: **** 1654503 [M::ha_hist_line] 26: * 1502027 [M::ha_hist_line] 27: ***** 1352887 [M::ha_hist_line] 28: ** 1208118 [M::ha_hist_line] 29: * 1096202 [M::ha_hist_line] 30: **** 1002443 [M::ha_hist_line] 31: ** 932255 [M::ha_hist_line] 32: 893316 [M::ha_hist_line] 33: ** 866598 [M::ha_hist_line] 34: ** 861907 [M::ha_hist_line] 35: ** 864368 [M::ha_hist_line] 36: * 883280 [M::ha_hist_line] 37: **** 906230 [M::ha_hist_line] 38: *** 929322 [M::ha_hist_line] 39: ** 952494 [M::ha_hist_line] 40: 972087 [M::ha_hist_line] 41: 973463 [M::ha_hist_line] 42: 971897 [M::ha_hist_line] 43: 964980 [M::ha_hist_line] 44: ** 946342 [M::ha_hist_line] 45: ** 923968 [M::ha_hist_line] 46: 885190 [M::ha_hist_line] 47: *** 839892 [M::ha_hist_line] 48: ** 791614 [M::ha_hist_line] 49: * 732810 [M::ha_hist_line] 50: *** 676014 [M::ha_hist_line] 51: ** 616945 [M::ha_hist_line] 52: * 554180 [M::ha_hist_line] 53: **** 495751 [M::ha_hist_line] 54: ** 444722 [M::ha_hist_line] 55: 390199 [M::ha_hist_line] 56: **** 340139 [M::ha_hist_line] 57: ** 294856 [M::ha_hist_line] 58: **** 254638 [M::ha_hist_line] 59: ** 217771 [M::ha_hist_line] 60: 188067 [M::ha_hist_line] 61: **** 160792 [M::ha_hist_line] 62: * 138396 [M::ha_hist_line] 63: **** 117316 [M::ha_hist_line] 64: * 100839 [M::ha_hist_line] 65: 88819 [M::ha_hist_line] 66: 77606 [M::ha_hist_line] 67: 68801 [M::ha_hist_line] 68: 61528 [M::ha_hist_line] 69: 55332 [M::ha_hist_line] 70: 50509 [M::ha_hist_line] 71: 47145 [M::ha_hist_line] 72: 44001 [M::ha_hist_line] 73: 41299 [M::ha_hist_line] 74: 39070 [M::ha_hist_line] 75: 37424 [M::ha_hist_line] 76: 35349 [M::ha_hist_line] 77: 33661 [M::ha_hist_line] 78: 32616 [M::ha_hist_line] 79: 31718 [M::ha_hist_line] 80: 
30825 [M::ha_hist_line] 81: 29556 [M::ha_hist_line] 82: 28707 [M::ha_hist_line] 83: 28461 [M::ha_hist_line] 84: 27693 [M::ha_hist_line] 85: 26534 [M::ha_hist_line] 86: 25852 [M::ha_hist_line] 87: 25340 [M::ha_hist_line] 88: 24546 [M::ha_hist_line] 89: 24406 [M::ha_hist_line] 90: 23692 [M::ha_hist_line] 91: 23420 [M::ha_hist_line] 92: 23015 [M::ha_hist_line] 93: 22141 [M::ha_hist_line] 94: 21665 [M::ha_hist_line] 95: 21135 [M::ha_hist_line] 96: 20655 [M::ha_hist_line] 97: 20255 [M::ha_hist_line] 98: 19967 [M::ha_hist_line] 99: 19706 [M::ha_hist_line] 100: 19013 [M::ha_hist_line] 101: 18267 [M::ha_hist_line] 102: 18001 [M::ha_hist_line] 103: 17694 [M::ha_hist_line] 104: 17138 [M::ha_hist_line] 105: 16643 [M::ha_hist_line] 106: 16103 [M::ha_hist_line] 107: 16103 [M::ha_hist_line] 108: 15355 [M::ha_hist_line] 109: 15038 [M::ha_hist_line] 110: 14787 [M::ha_hist_line] 111: 14473 [M::ha_hist_line] 112: 14300 [M::ha_hist_line] 113: 13866 [M::ha_hist_line] 114: 13528 [M::ha_hist_line] 115: 13206 [M::ha_hist_line] 116: 12946 [M::ha_hist_line] 117: 12920 [M::ha_hist_line] 118: 12225 [M::ha_hist_line] 119: 12123 [M::ha_hist_line] 120: 11836 [M::ha_hist_line] 121: 11560 [M::ha_hist_line] 122: 11271 [M::ha_hist_line] 123: 10767 [M::ha_hist_line] 124: 10868 [M::ha_hist_line] 125: * 10697 [M::ha_hist_line] rest: ** 463461 [M::ha_analyze_count] left: none [M::ha_analyze_count] right: count[41] = 973463 [M::ha_pt_gen] peak_hom: 41; peak_het: 20 [M::ha_ct_shrink::13669.78726.23] ==> counted 57292025 distinct minimizer k-mers [M::ha_pt_gen::] counting in normal mode [M::yak_count] collected 1821683968 minimizers [M::yak_count] collected 0 minimizers [M::yak_count] collected 0 minimizers [M::ha_pt_gen::13951.57926.00] ==> indexed 1817541656 positions, counted 57292025 distinct minimizer k-mers [M::ha_assemble::18912.51627.40@93.129GB] ==> corrected reads for round 3 [M::ha_assemble] # bases: 66477756010; # corrected bases: 1095531; # recorrected bases: 143155 [M::ha_assemble] size of 
buffer: 17.946GB [M::yak_count] collected 1821603210 minimizers [M::yak_count] collected 0 minimizers [M::yak_count] collected 0 minimizers [M::ha_pt_gen::19198.23027.19] ==> counted 61292692 distinct minimizer k-mers [M::ha_pt_gen] count[4095] = 0 (for sanity check) [M::ha_analyze_count] lowest: count[5] = 207047 [M::ha_analyze_count] highest: count[20] = 2071230 [M::ha_hist_line] 1: ****> 4013021 [M::ha_hist_line] 2: *** 279507 [M::ha_hist_line] 3: * 152942 [M::ha_hist_line] 4: **** 172028 [M::ha_hist_line] 5: ** 207047 [M::ha_hist_line] 6: **** 250923 [M::ha_hist_line] 7: ***** 312350 [M::ha_hist_line] 8: * 385079 [M::ha_hist_line] 9: ***** 468747 [M::ha_hist_line] 10: **** 576887 [M::ha_hist_line] 11: ** 705863 [M::ha_hist_line] 12: ** 863146 [M::ha_hist_line] 13: ** 1042114 [M::ha_hist_line] 14: **** 1235795 [M::ha_hist_line] 15: * 1427617 [M::ha_hist_line] 16: ** 1612132 [M::ha_hist_line] 17: ** 1788551 [M::ha_hist_line] 18: ***** 1923715 [M::ha_hist_line] 19: ** 2022315 [M::ha_hist_line] 20: **** 2071230 [M::ha_hist_line] 21: **** 2068235 [M::ha_hist_line] 22: **** 2016679 [M::ha_hist_line] 23: 1923772 [M::ha_hist_line] 24: *** 1805841 [M::ha_hist_line] 25: **** 1654745 [M::ha_hist_line] 26: * 1502040 [M::ha_hist_line] 27: ***** 1353208 [M::ha_hist_line] 28: ** 1208178 [M::ha_hist_line] 29: * 1096041 [M::ha_hist_line] 30: **** 1002651 [M::ha_hist_line] 31: ** 932098 [M::ha_hist_line] 32: 893354 [M::ha_hist_line] 33: ** 866528 [M::ha_hist_line] 34: ** 861904 [M::ha_hist_line] 35: ** 864249 [M::ha_hist_line] 36: * 883216 [M::ha_hist_line] 37: **** 906084 [M::ha_hist_line] 38: *** 929421 [M::ha_hist_line] 39: ** 952261 [M::ha_hist_line] 40: 972135 [M::ha_hist_line] 41: 973636 [M::ha_hist_line] 42: 971679 [M::ha_hist_line] 43: 965060 [M::ha_hist_line] 44: ** 946426 [M::ha_hist_line] 45: ** 923895 [M::ha_hist_line] 46: 885296 [M::ha_hist_line] 47: *** 839999 [M::ha_hist_line] 48: ** 791807 [M::ha_hist_line] 49: * 732779 [M::ha_hist_line] 50: *** 676238 
[M::ha_hist_line] 51: ** 616960 [M::ha_hist_line] 52: * 554468 [M::ha_hist_line] 53: **** 495718 [M::ha_hist_line] 54: ** 444699 [M::ha_hist_line] 55: 390222 [M::ha_hist_line] 56: **** 340196 [M::ha_hist_line] 57: ** 295063 [M::ha_hist_line] 58: **** 254705 [M::ha_hist_line] 59: ** 217688 [M::ha_hist_line] 60: 188210 [M::ha_hist_line] 61: **** 160816 [M::ha_hist_line] 62: * 138400 [M::ha_hist_line] 63: **** 117292 [M::ha_hist_line] 64: * 100901 [M::ha_hist_line] 65: 88873 [M::ha_hist_line] 66: 77611 [M::ha_hist_line] 67: 68895 [M::ha_hist_line] 68: 61518 [M::ha_hist_line] 69: * 55259 [M::ha_hist_line] 70: 50517 [M::ha_hist_line] 71: 47167 [M::ha_hist_line] 72: 43936 [M::ha_hist_line] 73: 41327 [M::ha_hist_line] 74: 39081 [M::ha_hist_line] 75: 37425 [M::ha_hist_line] 76: 35342 [M::ha_hist_line] 77: 33731 [M::ha_hist_line] 78: 32612 [M::ha_hist_line] 79: 31668 [M::ha_hist_line] 80: 30885 [M::ha_hist_line] 81: 29656 [M::ha_hist_line] 82: 28583 [M::ha_hist_line] 83: 28454 [M::ha_hist_line] 84: 27614 [M::ha_hist_line] 85: 26549 [M::ha_hist_line] 86: 25893 [M::ha_hist_line] 87: 25330 [M::ha_hist_line] 88: 24628 [M::ha_hist_line] 89: 24363 [M::ha_hist_line] 90: 23666 [M::ha_hist_line] 91: 23423 [M::ha_hist_line] 92: 23012 [M::ha_hist_line] 93: 22187 [M::ha_hist_line] 94: 21594 [M::ha_hist_line] 95: 21178 [M::ha_hist_line] 96: 20607 [M::ha_hist_line] 97: 20231 [M::ha_hist_line] 98: 19955 [M::ha_hist_line] 99: 19714 [M::ha_hist_line] 100: 19091 [M::ha_hist_line] 101: 18248 [M::ha_hist_line] 102: 18010 [M::ha_hist_line] 103: 17665 [M::ha_hist_line] 104: 17179 [M::ha_hist_line] 105: 16623 [M::ha_hist_line] 106: 16097 [M::ha_hist_line] 107: 16158 [M::ha_hist_line] 108: 15369 [M::ha_hist_line] 109: 14943 [M::ha_hist_line] 110: 14809 [M::ha_hist_line] 111: 14509 [M::ha_hist_line] 112: 14235 [M::ha_hist_line] 113: 13887 [M::ha_hist_line] 114: 13480 [M::ha_hist_line] 115: 13280 [M::ha_hist_line] 116: 12943 [M::ha_hist_line] 117: 12943 [M::ha_hist_line] 118: 12214 [M::ha_hist_line] 
119: 12115 [M::ha_hist_line] 120: 11786 [M::ha_hist_line] 121: 11559 [M::ha_hist_line] 122: 11305 [M::ha_hist_line] 123: 10803 [M::ha_hist_line] 124: 10863 [M::ha_hist_line] 125: 10716 [M::ha_hist_line] rest: **** 463576 [M::ha_analyze_count] left: none [M::ha_analyze_count] right: count[41] = 973636 [M::ha_pt_gen] peak_hom: 41; peak_het: 20 [M::ha_ct_shrink::19198.40827.19] ==> counted 57279671 distinct minimizer k-mers [M::ha_pt_gen::] counting in normal mode [M::yak_count] collected 1821603210 minimizers [M::yak_count] collected 0 minimizers [M::yak_count] collected 0 minimizers [M::ha_pt_gen::19485.05127.00] ==> indexed 1817590189 positions, counted 57279671 distinct minimizer k-mers [M::ha_assemble::20903.107*27.22@97.348GB] ==> found overlaps for the final round [M::ha_print_ovlp_stat] # overlaps: 213025690 [M::ha_print_ovlp_stat] # strong overlaps: 138359083 [M::ha_print_ovlp_stat] # weak overlaps: 74666607 [M::ha_print_ovlp_stat] # exact overlaps: 202016457 [M::ha_print_ovlp_stat] # inexact overlaps: 11009233 [M::ha_print_ovlp_stat] # overlaps without large indels: 212188203 [M::ha_print_ovlp_stat] # reverse overlaps: 87475405 Writing reads to disk... Reads has been written. Writing ma_hit_ts to disk... ma_hit_ts has been written. Writing ma_hit_ts to disk... ma_hit_ts has been written. bin files have been written. [M::purge_dups] homozygous read coverage threshold: 40 [M::purge_dups] purge duplication coverage threshold: 51 Writing raw unitig GFA to disk... Writing processed unitig GFA to disk... [M::purge_dups] homozygous read coverage threshold: 40 [M::purge_dups] purge duplication coverage threshold: 51 [M::mc_solve_core::0.373] ==> Partition [M::adjust_utg_by_primary] primary contig coverage range: [34, infinity] Writing Thamnosophis_epistibes_RAN34459.bp.p_ctg.gfa to disk... [M::adjust_utg_by_trio] primary contig coverage range: [34, infinity] Writing Thamnosophis_epistibes_RAN34459.bp.hap1.p_ctg.gfa to disk... 
[M::adjust_utg_by_trio] primary contig coverage range: [34, infinity] Writing Thamnosophis_epistibes_RAN34459.bp.hap2.p_ctg.gfa to disk... Inconsistency threshold for low-quality regions in BED files: 70% [M::main] Version: 0.16.1-r375 [M::main] CMD: ./hifiasm -o Thamnosophis_epistibes_RAN34459 -t 32 /home/ddebaun/mendel-nas1/pacbio/m64190e_220920_181426.hifi_reads.fastq.gz /home/ddebaun/mendel-nas1/pacbio/m64190e_220924_052532.hifi_reads.fastq.gz /home/ddebaun/mendel-nas1/pacbio/m64190e_220925_162333.hifi_reads.fastq.gz [M::main] Real time: 21618.725 sec; CPU: 570416.272 sec; Peak RSS: 97.348 GB
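The log shows hifiasm writing its contigs as GFA: the primary contigs in `Thamnosophis_epistibes_RAN34459.bp.p_ctg.gfa` and the two haplotypes in `Thamnosophis_epistibes_RAN34459.bp.hap1.p_ctg.gfa` and `Thamnosophis_epistibes_RAN34459.bp.hap2.p_ctg.gfa`. Aligners such as minimap and nucmer expect FASTA, so the contigs have to be extracted first. In GFA, each contig is an `S` (segment) line whose second and third tab-separated fields are the name and the sequence. A minimal Python sketch of that conversion, assuming the sequence is stored inline rather than as `*`; `gfa_segments_to_fasta` is a hypothetical name.

```python
from typing import Iterable, Iterator


def gfa_segments_to_fasta(gfa_lines: Iterable[str], width: int = 80) -> Iterator[str]:
    """Yield FASTA lines (header, then wrapped sequence) for each GFA S record."""
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip non-segment records, e.g. H, L or A lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            raise ValueError(f"malformed segment line: {line!r}")
        name, seq = fields[1], fields[2]
        if seq == "*":
            raise ValueError(f"segment {name} has no sequence stored in the GFA")
        yield f">{name}"
        for i in range(0, len(seq), width):
            yield seq[i:i + width]
```

To use it, iterate over an open `.p_ctg.gfa` file and write each yielded line followed by a newline to a `.fa` file.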

Any idea what could be causing this? Do I need to add some flags, run purge_dups, or trim the reads with cutadapt before running hifiasm? Any input would be greatly appreciated!

chhylp123 commented 1 year ago

I think the first step is to make sure the hifiasm assembly is correct. If you have Hi-C reads, it would be better to scaffold with them, which would also help catch potential misassemblies. Sometimes an assembly is simply very different from the reference genome.

dylandebaun commented 1 year ago

Unfortunately, I do not have Hi-C reads for this sample. Running QUAST against the reference reports only one misassembly, 22 kb in length. My colleague has the same issue (his HiFi assembly is very different from the reference) even though his reference genome is from the same genus.

chhylp123 commented 1 year ago

I think we should first make sure there is no chromosome-level misassembly. That should be relatively easy to spot by looking at the alignment and the contig lengths. If there is no chromosome-level misassembly, I guess the results should be fine.
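Following the suggestion to check for chromosome-level misassemblies using the alignment, one rough screen is to total each contig's aligned bases per reference sequence and flag contigs where a sizeable share lands on a second reference chromosome. The sketch below assumes a PAF alignment of the contigs (queries) against the reference (targets); the function name and the default thresholds (MAPQ 20, and at least 20% and 1 Mb on the second-best chromosome) are illustrative assumptions, not values recommended by hifiasm.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def chimeric_candidates(
    paf_lines: Iterable[str],
    min_mapq: int = 20,
    min_secondary_frac: float = 0.2,
    min_secondary_bp: int = 1_000_000,
) -> List[Tuple[str, str, str, float]]:
    """Flag contigs whose aligned bases are split between reference sequences.

    Returns (contig, best reference, second-best reference, share of the
    contig's aligned bases on the second-best reference), sorted by contig.
    """
    aligned: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12 or int(fields[11]) < min_mapq:
            continue  # skip malformed or low-MAPQ records
        contig, ref = fields[0], fields[5]
        # Aligned span on the contig; overlapping alignments are counted twice.
        aligned[contig][ref] += int(fields[3]) - int(fields[2])
    flagged = []
    for contig, per_ref in aligned.items():
        ranked = sorted(per_ref.items(), key=lambda item: item[1], reverse=True)
        total = sum(bp for _, bp in ranked)
        if len(ranked) < 2 or total == 0:
            continue  # all alignments on one reference sequence
        second_ref, second_bp = ranked[1]
        if second_bp >= min_secondary_bp and second_bp / total >= min_secondary_frac:
            flagged.append((contig, ranked[0][0], second_ref, second_bp / total))
    return sorted(flagged)
```

A flagged contig is a candidate to inspect in the dot plot, not proof of a misjoin: a genuine rearrangement between the sample and a related reference produces the same signal.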