chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Skewed histogram in multiple species #511

Closed juliayork closed 1 year ago

juliayork commented 1 year ago

Hello. We are sequencing some fish genomes and getting odd k-mer histograms. They are skewed and essentially just decrease, with no clear peak; see below for an example. We've now seen this in multiple species. Some of them assemble OK, while others crash hifiasm. Any thoughts on what might be happening? Thanks.

[M::ha_hist_line] 1: ****> 9510455
[M::ha_hist_line] 2: ****> 3123317
[M::ha_hist_line] 3: ****> 1382673
[M::ha_hist_line] 4: ****> 984780
[M::ha_hist_line] 5: ****> 839012
[M::ha_hist_line] 6: ****> 770095
[M::ha_hist_line] 7: ****> 714273
[M::ha_hist_line] 8: ****> 671271
[M::ha_hist_line] 9: ****> 636354
[M::ha_hist_line] 10: ****> 615123
[M::ha_hist_line] 11: ****> 574063
[M::ha_hist_line] 12: ****> 548675
[M::ha_hist_line] 13: ****> 514586
[M::ha_hist_line] 14: ****> 483769
[M::ha_hist_line] 15: ****> 446808
[M::ha_hist_line] 16: ****> 419135
[M::ha_hist_line] 17: ****> 392639
[M::ha_hist_line] 18: ****> 368844
[M::ha_hist_line] 19: ****> 349377
[M::ha_hist_line] 20: ****> 334796
[M::ha_hist_line] 21: ****> 319365
[M::ha_hist_line] 22: ****> 308777
[M::ha_hist_line] 23: ****> 299369
[M::ha_hist_line] 24: ****> 289658
[M::ha_hist_line] 25: ****> 275924
[M::ha_hist_line] 26: ****> 263907
[M::ha_hist_line] 27: ****> 239723
[M::ha_hist_line] 28: ****> 218143
[M::ha_hist_line] 29: ****> 194708
[M::ha_hist_line] 30: ****> 169668
[M::ha_hist_line] 31: ****> 146260
[M::ha_hist_line] 32: ****> 125626
[M::ha_hist_line] 33: ****> 107179
[M::ha_hist_line] 34: ****> 91255
[M::ha_hist_line] 35: ****> 73909
[M::ha_hist_line] 36: ****> 60900
[M::ha_hist_line] 37: ****> 48023
[M::ha_hist_line] 38: ****> 39803
[M::ha_hist_line] 39: ****> 33902
[M::ha_hist_line] 40: ****> 28526
[M::ha_hist_line] 41: ****> 23853
[M::ha_hist_line] 42: ****> 21474
[M::ha_hist_line] 43: ****> 19075
[M::ha_hist_line] 44: ****> 17773
[M::ha_hist_line] 45: ****> 16092
[M::ha_hist_line] 46: ****> 15033
[M::ha_hist_line] 47: ****> 14448
[M::ha_hist_line] 48: ****> 13749
[M::ha_hist_line] 49: ****> 13013
[M::ha_hist_line] 50: ****> 12620
[M::ha_hist_line] 51: ****> 11759
[M::ha_hist_line] 52: ****> 11513
[M::ha_hist_line] 53: ****> 10833
[M::ha_hist_line] 54: ****> 10759
[M::ha_hist_line] 55: ****> 10695
[M::ha_hist_line] 56: ****> 10418
[M::ha_hist_line] 57: ****> 9682
[M::ha_hist_line] 58: ****> 9459
[M::ha_hist_line] 59: **** 8787
[M::ha_hist_line] 60: **** 8800
[M::ha_hist_line] 61: **** 8110
[M::ha_hist_line] 62: **** 7860
[M::ha_hist_line] 63: 7486
[M::ha_hist_line] 64: *** 7293
[M::ha_hist_line] 65: ** 6843
[M::ha_hist_line] 66: **** 6674
[M::ha_hist_line] 67: **** 6328
[M::ha_hist_line] 68: * 6428
[M::ha_hist_line] 69: **** 5981
[M::ha_hist_line] 70: **** 6009
[M::ha_hist_line] 71: ** 5718
[M::ha_hist_line] 72: 5535
[M::ha_hist_line] 73: * 5358
[M::ha_hist_line] 74: **** 5295
[M::ha_hist_line] 75: ** 5096
[M::ha_hist_line] 76: * 5049
[M::ha_hist_line] 77: * 4842
[M::ha_hist_line] 78: **** 4732
[M::ha_hist_line] 79: **** 4615
[M::ha_hist_line] 80: **** 4538
[M::ha_hist_line] 81: * 4627
[M::ha_hist_line] 82: * 4350
[M::ha_hist_line] 83: **** 4209
[M::ha_hist_line] 84: * 4315
[M::ha_hist_line] 85: **** 4181
[M::ha_hist_line] 86: * 3977
[M::ha_hist_line] 87: **** 3850
[M::ha_hist_line] 88: * 3935
[M::ha_hist_line] 89: ** 3663
[M::ha_hist_line] 90: * 3648
[M::ha_hist_line] 91: ** 3545
[M::ha_hist_line] 92: * 3456
[M::ha_hist_line] 93: * 3402
[M::ha_hist_line] 94: ** 3365
[M::ha_hist_line] 95: ** 3304
[M::ha_hist_line] 96: ** 3361
[M::ha_hist_line] 97: ** 3014
[M::ha_hist_line] 98: * 3073
[M::ha_hist_line] 99: ** 2992
[M::ha_hist_line] 100: * 2913
[M::ha_hist_line] 101: **** 2990
[M::ha_hist_line] 102: * 2765
[M::ha_hist_line] 103: * 2760
[M::ha_hist_line] 104: **** 2773
[M::ha_hist_line] 105: * 2687
[M::ha_hist_line] 106: ** 2639
[M::ha_hist_line] 107: * 2548
[M::ha_hist_line] 108: **** 2503
[M::ha_hist_line] 109: * 2562
[M::ha_hist_line] 110: ** 2515
[M::ha_hist_line] 111: 2403
[M::ha_hist_line] 112: **** 2438
[M::ha_hist_line] 113: * 2361
[M::ha_hist_line] 114: **** 2281
[M::ha_hist_line] 115: ** 2324
[M::ha_hist_line] 116: ** 2291
[M::ha_hist_line] 117: **** 2122
[M::ha_hist_line] 118: * 2200
[M::ha_hist_line] 119: **** 2104
[M::ha_hist_line] 120: * 2161
[M::ha_hist_line] 121: **** 2117
[M::ha_hist_line] 122: * 2180
[M::ha_hist_line] 123: **** 2097
[M::ha_hist_line] 124: * 2066
[M::ha_hist_line] 125: **** 1962
[M::ha_hist_line] 126: ** 1946
[M::ha_hist_line] 127: ** 1933
[M::ha_hist_line] 128: ** 1930
[M::ha_hist_line] 129: ** 1944
[M::ha_hist_line] 130: ** 1908
[M::ha_hist_line] 131: ** 1904
[M::ha_hist_line] 132: * 1860
[M::ha_hist_line] 133: **** 1771
[M::ha_hist_line] 134: ** 1828
[M::ha_hist_line] 135: 1698
[M::ha_hist_line] 136: **** 1736
[M::ha_hist_line] 137: **** 1766
[M::ha_hist_line] 138: * 1695
[M::ha_hist_line] 139: ** 1560
[M::ha_hist_line] 140: ** 1621
[M::ha_hist_line] 141: ** 1618
[M::ha_hist_line] 142: * 1649
[M::ha_hist_line] 143: ** 1551
[M::ha_hist_line] 144: * 1535
[M::ha_hist_line] 145: ** 1576
[M::ha_hist_line] 146: **** 1394
[M::ha_hist_line] 147: * 1494
[M::ha_hist_line] 148: **** 1446
[M::ha_hist_line] 149: * 1524
[M::ha_hist_line] 150: ** 1563
[M::ha_hist_line] 151: * 1476
[M::ha_hist_line] 152: * 1472
[M::ha_hist_line] 153: ** 1422
[M::ha_hist_line] 154: **** 1396
[M::ha_hist_line] 155: * 1353
[M::ha_hist_line] 156: * 1329
[M::ha_hist_line] 157: **** 1402
[M::ha_hist_line] 158: **** 1386
[M::ha_hist_line] 159: ** 1255
[M::ha_hist_line] 160: ** 1266
[M::ha_hist_line] 161: ** 1239
[M::ha_hist_line] 162: * 1334
[M::ha_hist_line] 163: ** 1248
[M::ha_hist_line] 164: * 1326
[M::ha_hist_line] 165: **** 1237
[M::ha_hist_line] 166: ** 1229
[M::ha_hist_line] 167: * 1145
[M::ha_hist_line] 168: ** 1215
[M::ha_hist_line] 169: **** 1078
[M::ha_hist_line] 170: 1175
[M::ha_hist_line] 171: 1108
[M::ha_hist_line] 172: * 1157
[M::ha_hist_line] 173: **** 1081
[M::ha_hist_line] 174: * 1107
[M::ha_hist_line] 175: **** 1099
[M::ha_hist_line] 176: **** 1052
[M::ha_hist_line] 177: **** 1027
[M::ha_hist_line] 178: **** 1055
[M::ha_hist_line] 179: **** 1038
[M::ha_hist_line] 180: **** 1079
[M::ha_hist_line] 181: **** 1100
[M::ha_hist_line] 182: **** 1071
[M::ha_hist_line] 183: ** 919
[M::ha_hist_line] 184: **** 1018
[M::ha_hist_line] 185: 947
[M::ha_hist_line] 186: 1012
[M::ha_hist_line] 187: 959
[M::ha_hist_line] 188: 948
[M::ha_hist_line] 189: * 968
[M::ha_hist_line] 190: ** 882
[M::ha_hist_line] 191: ** 909
[M::ha_hist_line] 192: * 966
[M::ha_hist_line] 193: ** 923
[M::ha_hist_line] 194: ** 869
[M::ha_hist_line] 195: * 833
[M::ha_hist_line] 196: ** 894
[M::ha_hist_line] 197: * 827
[M::ha_hist_line] 198: ** 852
[M::ha_hist_line] 199: * 817
[M::ha_hist_line] 200: ** 875
[M::ha_hist_line] 201: * 829
[M::ha_hist_line] 202: ** 868
[M::ha_hist_line] 203: ** 841
[M::ha_hist_line] 204: 826
[M::ha_hist_line] 205: 801
[M::ha_hist_line] 206: 770
[M::ha_hist_line] 207: 786
[M::ha_hist_line] 208: 828
[M::ha_hist_line] 209: 749
[M::ha_hist_line] 210: 778
[M::ha_hist_line] 211: 828
[M::ha_hist_line] 212: 774
[M::ha_hist_line] 213: 788
[M::ha_hist_line] 214: * 764
[M::ha_hist_line] 215: **** 742
[M::ha_hist_line] 216: **** 737
[M::ha_hist_line] 217: * 781
[M::ha_hist_line] 218: 758
[M::ha_hist_line] 219: 749
[M::ha_hist_line] 220: **** 673
[M::ha_hist_line] 221: **** 697
[M::ha_hist_line] 222: * 641
[M::ha_hist_line] 223: **** 705
[M::ha_hist_line] 224: **** 707
[M::ha_hist_line] 225: * 643
[M::ha_hist_line] 226: **** 661
[M::ha_hist_line] 227: **** 661
[M::ha_hist_line] 228: 636
[M::ha_hist_line] 229: 611
[M::ha_hist_line] 230: 638
[M::ha_hist_line] 231: 616
[M::ha_hist_line] 232: 621
[M::ha_hist_line] 233: 603
[M::ha_hist_line] 234: 617
[M::ha_hist_line] 235: 603
[M::ha_hist_line] 236: 602
[M::ha_hist_line] 237: 588
[M::ha_hist_line] 238: 575
[M::ha_hist_line] 239: 596
[M::ha_hist_line] 240: * 586
[M::ha_hist_line] 241: ** 556
[M::ha_hist_line] 242: * 602
[M::ha_hist_line] 243: ** 550
[M::ha_hist_line] 244: ** 509
[M::ha_hist_line] 245: ** 533
[M::ha_hist_line] 246: ** 569
[M::ha_hist_line] 247: ** 551
[M::ha_hist_line] 248: ** 558
[M::ha_hist_line] 249: ** 557
[M::ha_hist_line] 250: ** 531
[M::ha_hist_line] 251: ** 514
[M::ha_hist_line] 252: ** 513
[M::ha_hist_line] 253: * 576
[M::ha_hist_line] 254: **** 502
[M::ha_hist_line] 255: ** 523
[M::ha_hist_line] 256: ** 486
[M::ha_hist_line] 257: ** 493
[M::ha_hist_line] 258: ** 505
[M::ha_hist_line] 259: ** 532
[M::ha_hist_line] 260: ** 505
[M::ha_hist_line] 261: * 479
[M::ha_hist_line] 262: ** 489
[M::ha_hist_line] 263: ** 493
[M::ha_hist_line] 264: * 475
[M::ha_hist_line] 265: ** 505
[M::ha_hist_line] 266: * 481
[M::ha_hist_line] 267: ** 485
[M::ha_hist_line] 268: ** 523
[M::ha_hist_line] 269: ** 500
[M::ha_hist_line] 270: ** 490
[M::ha_hist_line] 271: ** 494
[M::ha_hist_line] 272: 427
[M::ha_hist_line] 273: 421
[M::ha_hist_line] 274: 445
[M::ha_hist_line] 275: 466
[M::ha_hist_line] 276: 456
[M::ha_hist_line] 277: 473
[M::ha_hist_line] 278: 470
[M::ha_hist_line] 279: 478
[M::ha_hist_line] 280: 467
[M::ha_hist_line] 281: 431
[M::ha_hist_line] 282: 448
[M::ha_hist_line] 283: 434
[M::ha_hist_line] 284: 430
[M::ha_hist_line] 285: 446
[M::ha_hist_line] 286: 445
[M::ha_hist_line] 287: 431
[M::ha_hist_line] 288: 459
[M::ha_hist_line] 289: 432
[M::ha_hist_line] 290: 398
[M::ha_hist_line] 291: 404
[M::ha_hist_line] 292: 389
[M::ha_hist_line] 293: 376
[M::ha_hist_line] 294: 385
[M::ha_hist_line] 295: 323
[M::ha_hist_line] 296: 366
[M::ha_hist_line] 297: 346
[M::ha_hist_line] 298: 366
[M::ha_hist_line] 299: 358
[M::ha_hist_line] 300: 373
[M::ha_hist_line] 301: 367
[M::ha_hist_line] 302: 375
[M::ha_hist_line] 303: 362
[M::ha_hist_line] 304: 372
[M::ha_hist_line] 305: 325
[M::ha_hist_line] 306: 369
[M::ha_hist_line] 307: 377
[M::ha_hist_line] 308: * 407
[M::ha_hist_line] 309: ** 340
[M::ha_hist_line] 310: 352
[M::ha_hist_line] 311: 361
[M::ha_hist_line] 312: 342
[M::ha_hist_line] 313: 355
[M::ha_hist_line] 314: 328
[M::ha_hist_line] 315: 313
[M::ha_hist_line] 316: *** 338
[M::ha_hist_line] 317: 308
[M::ha_hist_line] 318: 347
[M::ha_hist_line] 319: 313
[M::ha_hist_line] 320: 320
[M::ha_hist_line] 321: 322
[M::ha_hist_line] 322: 307
[M::ha_hist_line] 323: 297
[M::ha_hist_line] 324: ** 313
[M::ha_hist_line] 325: * 305
[M::ha_hist_line] 326: ** 310
[M::ha_hist_line] 327: * 304
[M::ha_hist_line] 328: ** 319
[M::ha_hist_line] 329: ** 276
[M::ha_hist_line] 330: 289
[M::ha_hist_line] 331: 292
[M::ha_hist_line] 332: 256
[M::ha_hist_line] 333: 322
[M::ha_hist_line] 334: 314
[M::ha_hist_line] 335: * 299
[M::ha_hist_line] 336: *** 309
[M::ha_hist_line] 337: 308
[M::ha_hist_line] 338: * 280
[M::ha_hist_line] 339: *** 328
[M::ha_hist_line] 340: 301
[M::ha_hist_line] 341: 300
[M::ha_hist_line] 342: 300
[M::ha_hist_line] 343: 265
[M::ha_hist_line] 344: 258
[M::ha_hist_line] 345: 300
[M::ha_hist_line] 346: 300
[M::ha_hist_line] 347: 296
[M::ha_hist_line] 348: 282
[M::ha_hist_line] 349: 268
[M::ha_hist_line] 350: 279
[M::ha_hist_line] 351: 277
[M::ha_hist_line] 352: 264
[M::ha_hist_line] 353: 282
[M::ha_hist_line] 354: 272
[M::ha_hist_line] 355: 297
[M::ha_hist_line] 356: 294
[M::ha_hist_line] 357: 280
[M::ha_hist_line] 358: 291
[M::ha_hist_line] 359: 283
[M::ha_hist_line] 360: 269
[M::ha_hist_line] 361: 292
[M::ha_hist_line] 362: 283
[M::ha_hist_line] 363: 257
[M::ha_hist_line] 364: 248
[M::ha_hist_line] 365: 245
[M::ha_hist_line] 366: 232
[M::ha_hist_line] 367: 220
[M::ha_hist_line] 368: 243
[M::ha_hist_line] 369: 220
[M::ha_hist_line] 370: 232
[M::ha_hist_line] 371: 206
[M::ha_hist_line] 372: * 226
[M::ha_hist_line] 373: 199
[M::ha_hist_line] 374: * 215
[M::ha_hist_line] 375: 240
[M::ha_hist_line] 376: 241
[M::ha_hist_line] 377: 216
[M::ha_hist_line] 378: 209
[M::ha_hist_line] 379: 226
[M::ha_hist_line] 380: 161
[M::ha_hist_line] 381: 204
[M::ha_hist_line] 382: 200
[M::ha_hist_line] 383: ** 239
[M::ha_hist_line] 384: 239
[M::ha_hist_line] 385: 223
[M::ha_hist_line] 386: 217
[M::ha_hist_line] 387: 204
[M::ha_hist_line] 388: 231
[M::ha_hist_line] 389: 196
[M::ha_hist_line] 390: 211
[M::ha_hist_line] 391: 188
[M::ha_hist_line] 392: 197
[M::ha_hist_line] 393: 182
[M::ha_hist_line] 394: 168
[M::ha_hist_line] 395: 143
[M::ha_hist_line] 396: 115
[M::ha_hist_line] 397: 112
[M::ha_hist_line] 398: 106
[M::ha_hist_line] 399: 70
[M::ha_hist_line] 400: * 50
[M::ha_hist_line] rest: ***** 3115
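
For anyone who wants to check a log like the one above programmatically, here is a minimal Python sketch. It is not hifiasm's own peak-finding code. The regular expression is only an assumption based on the `[M::ha_hist_line]` lines quoted in this post, and `parse_hist`, `find_peak`, and the `min_rise` threshold are hypothetical names and values chosen for illustration.

```python
import re

# Assumed line shape, taken from the log pasted above:
#   [M::ha_hist_line] <occurrence>: <asterisks, optional '>'> <count>
# The "rest:" line has no numeric occurrence and is skipped on purpose.
HIST_RE = re.compile(r"\[M::ha_hist_line\]\s+(\d+):\s*\**>?\s*(\d+)")


def parse_hist(log_text):
    """Return {k-mer occurrence: number of distinct k-mers} from log text."""
    return {int(occ): int(cnt) for occ, cnt in HIST_RE.findall(log_text)}


def find_peak(hist, min_rise=1.5):
    """Look for a valley followed by a real rise in the histogram.

    Returns (valley, peak) occurrences, or None if the counts never climb
    to at least min_rise times the lowest count seen so far. The threshold
    is arbitrary; it only exists to ignore small bin-to-bin noise.
    """
    if not hist:
        return None
    xs = sorted(hist)
    valley = xs[0]
    for x in xs[1:]:
        if hist[x] < hist[valley]:
            valley = x  # still going downhill
        elif hist[x] >= min_rise * hist[valley]:
            # Counts rose clearly above the valley: take the tallest bin
            # after the valley as the coverage peak.
            peak = max((k for k in xs if k > valley), key=hist.get)
            return valley, peak
    return None


# A histogram with an error tail, a valley at 4 and a peak at 7:
example = {1: 1000, 2: 300, 3: 100, 4: 50, 5: 80, 6: 200, 7: 400, 8: 300, 9: 100}
result = find_peak(example)  # (4, 7)
```

On a histogram whose counts only ever decrease, `find_peak` returns `None`, which is the "no coverage peak" situation described in this issue.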

chhylp123 commented 1 year ago

The FAQ here might be helpful: https://hifiasm.readthedocs.io/en/latest/faq.html#why-does-hifiasm-stuck-or-crash

olekto commented 1 year ago

What kind of fish is this? We see similar issues with codfish. The problem is that these fishes have a lot of short tandem repeats (microsatellites), which PacBio sequencing does not handle well. What you get is regions of good coverage and regions of really poor or no coverage, so basically incomplete coverage of the genome. It will be hard to assemble such a dataset. Increasing coverage might help (but we haven't explored this fully), and Revio might also help (it uses a different consensus algorithm, so it might recover more bases than Sequel).
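
As a rough way to see how much of a read is made up of short tandem repeats, something like the following Python sketch could be used. It is an illustration only, not a method from this thread: `str_fraction` is a hypothetical helper, and the period limit (1 to 6 bp) and minimum run length (12 bp) are arbitrary assumptions. A dedicated tool such as Tandem Repeats Finder would be more appropriate for real analysis.

```python
def str_fraction(seq, max_period=6, min_len=12):
    """Fraction of bases in seq lying in perfect tandem repeats.

    A base run counts as a repeat if, for some period p <= max_period,
    it repeats with that exact period over at least min_len bases.
    Imperfect repeats are not detected by this simple scan.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    masked = [False] * n
    for p in range(1, max_period + 1):
        run_start = None  # start of the current stretch with period p
        for i in range(p, n + 1):
            if i < n and seq[i] == seq[i - p]:
                if run_start is None:
                    run_start = i - p
            else:
                # Stretch ended (mismatch or end of sequence).
                if run_start is not None and i - run_start >= min_len:
                    for j in range(run_start, i):
                        masked[j] = True
                run_start = None
    return sum(masked) / n


# 12 non-repetitive bases followed by a 20-base (CA)n microsatellite.
frac = str_fraction("ACGTTGCAAGCT" + "CA" * 10)
```

Running this over a sample of reads and looking at the distribution of the returned fractions would give a crude impression of how repeat-rich a dataset is.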

juliayork commented 1 year ago

It is indeed a codfish! Okay, thanks for the tips. I'm going to send you an email shortly. If you don't mind, maybe we can stay in contact about this issue.

olekto commented 1 year ago

Sure, @juliayork, send me an e-mail at o.k.torresen@ibv.uio.no, and we can chat further.