chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Does het_peak=-1 have any effect on assembly? #563

Status: Closed (xuzhougeng closed this issue 10 months ago)

xuzhougeng commented 10 months ago

Because the species has a small genome, my HiFi data set is very deep. I noticed that during assembly het_peak (printed as peak_het in the log) has a value in the first pass and is -1 in every later round. Does this have any effect on the assembly?

Here is the log from my assembly run.

[M::ha_analyze_count] lowest: count[21] = 14238
[M::ha_analyze_count] highest: count[151] = 2093738
...
[M::ha_analyze_count] left: count[75] = 1864731
[M::ha_analyze_count] right: none
[M::ha_ft_gen] peak_hom: 151; peak_het: 75
[M::ha_ct_shrink::909.192*8.89] ==> counted 1103048 distinct minimizer k-mers
[M::ha_ft_gen::910.017*8.89@18.916GB] ==> filtered out 1103048 k-mers occurring 755 or more times
[M::ha_opt_update_cov] updated max_n_chain to 755
[M::yak_count] collected 751585251 minimizers
[M::ha_pt_gen::1040.458*9.92] ==> counted 18468452 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
---
[M::ha_analyze_count] lowest: count[27] = 1433
[M::ha_analyze_count] highest: count[150] = 83180
....
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: none
[M::ha_pt_gen] peak_hom: 150; peak_het: -1
[M::ha_ct_shrink::1040.510*9.92] ==> counted 6660252 distinct minimizer k-mers
[M::ha_pt_gen::] counting in normal mode
[M::yak_count] collected 751585251 minimizers
[M::ha_pt_gen::1158.933*11.04] ==> indexed 739777051 positions, counted 6660252 distinct minimizer k-mers
---
[M::ha_assemble::5466.528*77.39@44.607GB] ==> corrected reads for round 1
[M::ha_assemble] # bases: 31117113933; # corrected bases: 62165656; # recorrected bases: 77882
[M::ha_assemble] size of buffer: 9.765GB
[M::yak_count] collected 746784247 minimizers
[M::ha_pt_gen::5553.133*76.57] ==> counted 6652545 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[22] = 435
[M::ha_analyze_count] highest: count[153] = 83012
...
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: none
[M::ha_pt_gen] peak_hom: 153; peak_het: -1
[M::ha_ct_shrink::5553.174*76.57] ==> counted 5570263 distinct minimizer k-mers
[M::ha_pt_gen::] counting in normal mode
[M::yak_count] collected 746784247 minimizers
[M::ha_pt_gen::5661.944*75.52] ==> indexed 745701965 positions, counted 5570263 distinct minimizer k-mers
---
[M::ha_assemble::9680.063*83.67@45.453GB] ==> corrected reads for round 2
[M::ha_assemble] # bases: 31106975531; # corrected bases: 2291962; # recorrected bases: 3387
[M::ha_assemble] size of buffer: 7.820GB
[M::yak_count] collected 746365875 minimizers
[M::ha_pt_gen::9770.062*83.12] ==> counted 6077208 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[13] = 641
[M::ha_analyze_count] highest: count[153] = 82819
...
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: none
[M::ha_pt_gen] peak_hom: 153; peak_het: -1
[M::ha_ct_shrink::9770.104*83.12] ==> counted 5513316 distinct minimizer k-mers
[M::ha_pt_gen::] counting in normal mode
[M::yak_count] collected 746365875 minimizers
[M::ha_pt_gen::9878.399*82.45] ==> indexed 745801983 positions, counted 5513316 distinct minimizer k-mers
---
[M::ha_assemble::13504.940*85.91@68.216GB] ==> corrected reads for round 3
[M::ha_assemble] # bases: 31106284814; # corrected bases: 188803; # recorrected bases: 3056
[M::ha_assemble] size of buffer: 7.614GB
[M::yak_count] collected 746282438 minimizers
[M::ha_pt_gen::13590.102*85.53] ==> counted 6029200 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[9] = 1078
[M::ha_analyze_count] highest: count[153] = 82817
...
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: none
[M::ha_pt_gen] peak_hom: 153; peak_het: -1
[M::ha_ct_shrink::13590.148*85.53] ==> counted 5498658 distinct minimizer k-mers
[M::ha_pt_gen::] counting in normal mode
[M::yak_count] collected 746282438 minimizers
[M::ha_pt_gen::13694.920*85.05] ==> indexed 745751896 positions, counted 5498658 distinct minimizer k-mers

The parts replaced by ... above are the k-mer count histogram, which shows very clear heterozygous and homozygous peaks.

[M::ha_hist_line]     1: ****************************************************************************************************> 530542
[M::ha_hist_line]     2: **************************************** 33369
[M::ha_hist_line]     3: ************** 11956
[M::ha_hist_line]     4: ******** 6970
[M::ha_hist_line]     5: ***** 4123
[M::ha_hist_line]     6: *** 2622
[M::ha_hist_line]     7: ** 1930
[M::ha_hist_line]     8: ** 1462
[M::ha_hist_line]     9: * 1078
[M::ha_hist_line]    10: * 1092
[M::ha_hist_line]    11: * 751
[M::ha_hist_line]    12: * 694
[M::ha_hist_line]    13: * 571
[M::ha_hist_line]    14: * 573
[M::ha_hist_line]    15: * 508
[M::ha_hist_line]    16: * 415
[M::ha_hist_line]    17: * 434
[M::ha_hist_line]    18:  395
[M::ha_hist_line]    19:  393
[M::ha_hist_line]    20:  394
[M::ha_hist_line]    21:  349
[M::ha_hist_line]    22:  347
[M::ha_hist_line]    23:  392
[M::ha_hist_line]    24:  410
[M::ha_hist_line]    25: * 466
[M::ha_hist_line]    26: * 494
[M::ha_hist_line]    27: * 517
[M::ha_hist_line]    28: * 622
[M::ha_hist_line]    29: * 614
[M::ha_hist_line]    30: * 815
[M::ha_hist_line]    31: * 714
[M::ha_hist_line]    32: * 725
[M::ha_hist_line]    33: * 690
[M::ha_hist_line]    34: * 831
[M::ha_hist_line]    35: * 857
[M::ha_hist_line]    36: * 900
[M::ha_hist_line]    37: * 937
[M::ha_hist_line]    38: * 1124
[M::ha_hist_line]    39: ** 1255
[M::ha_hist_line]    40: ** 1390
[M::ha_hist_line]    41: ** 1566
[M::ha_hist_line]    42: ** 1469
[M::ha_hist_line]    43: ** 1478
[M::ha_hist_line]    44: ** 1575
[M::ha_hist_line]    45: ** 1840
[M::ha_hist_line]    46: ** 2000
[M::ha_hist_line]    47: *** 2524
[M::ha_hist_line]    48: **** 3002
[M::ha_hist_line]    49: ***** 3882
[M::ha_hist_line]    50: ***** 4382
[M::ha_hist_line]    51: ******* 5836
[M::ha_hist_line]    52: ******** 6484
[M::ha_hist_line]    53: ********* 7834
[M::ha_hist_line]    54: *********** 8765
[M::ha_hist_line]    55: ************ 10348
[M::ha_hist_line]    56: *************** 12293
[M::ha_hist_line]    57: ****************** 14731
[M::ha_hist_line]    58: ********************* 17034
[M::ha_hist_line]    59: ************************* 20497
[M::ha_hist_line]    60: ***************************** 24002
[M::ha_hist_line]    61: *********************************** 28642
[M::ha_hist_line]    62: **************************************** 33535
[M::ha_hist_line]    63: *********************************************** 38650
[M::ha_hist_line]    64: *************************************************** 42559
[M::ha_hist_line]    65: ********************************************************* 47280
[M::ha_hist_line]    66: **************************************************************** 52631
[M::ha_hist_line]    67: ********************************************************************* 57210
[M::ha_hist_line]    68: **************************************************************************** 63154
[M::ha_hist_line]    69: ********************************************************************************** 67840
[M::ha_hist_line]    70: *************************************************************************************** 71642
[M::ha_hist_line]    71: ****************************************************************************************** 74389
[M::ha_hist_line]    72: ********************************************************************************************* 76769
[M::ha_hist_line]    73: *********************************************************************************************** 78906
[M::ha_hist_line]    74: *************************************************************************************************** 81688
[M::ha_hist_line]    75: *************************************************************************************************** 82062
[M::ha_hist_line]    76: ************************************************************************************************** 81499
[M::ha_hist_line]    77: ************************************************************************************************* 80525
[M::ha_hist_line]    78: ********************************************************************************************* 77183
[M::ha_hist_line]    79: ****************************************************************************************** 74231
[M::ha_hist_line]    80: ************************************************************************************* 70788
[M::ha_hist_line]    81: ********************************************************************************* 67207
[M::ha_hist_line]    82: *************************************************************************** 62337
[M::ha_hist_line]    83: ********************************************************************** 57657
[M::ha_hist_line]    84: **************************************************************** 52802
[M::ha_hist_line]    85: ********************************************************** 48143
[M::ha_hist_line]    86: **************************************************** 43348
[M::ha_hist_line]    87: ********************************************** 37833
[M::ha_hist_line]    88: **************************************** 32803
[M::ha_hist_line]    89: ********************************** 28328
[M::ha_hist_line]    90: ***************************** 23762
[M::ha_hist_line]    91: ************************ 20096
[M::ha_hist_line]    92: ********************* 17031
[M::ha_hist_line]    93: ****************** 14583
[M::ha_hist_line]    94: *************** 12505
[M::ha_hist_line]    95: ************* 10428
[M::ha_hist_line]    96: *********** 8937
[M::ha_hist_line]    97: ********* 7579
[M::ha_hist_line]    98: ******** 6318
[M::ha_hist_line]    99: ******* 5419
[M::ha_hist_line]   100: ****** 4883
[M::ha_hist_line]   101: ***** 4016
[M::ha_hist_line]   102: **** 3487
[M::ha_hist_line]   103: **** 3291
[M::ha_hist_line]   104: **** 3617
[M::ha_hist_line]   105: **** 3233
[M::ha_hist_line]   106: **** 3506
[M::ha_hist_line]   107: **** 3578
[M::ha_hist_line]   108: **** 3397
[M::ha_hist_line]   109: **** 3604
[M::ha_hist_line]   110: ***** 3737
[M::ha_hist_line]   111: ***** 4113
[M::ha_hist_line]   112: ***** 4422
[M::ha_hist_line]   113: ****** 4740
[M::ha_hist_line]   114: ****** 4973
[M::ha_hist_line]   115: ******* 5696
[M::ha_hist_line]   116: ******* 6174
[M::ha_hist_line]   117: ******** 6960
[M::ha_hist_line]   118: ********** 8138
[M::ha_hist_line]   119: *********** 9097
[M::ha_hist_line]   120: ************ 10027
[M::ha_hist_line]   121: ************** 11493
[M::ha_hist_line]   122: **************** 12967
[M::ha_hist_line]   123: ***************** 14400
[M::ha_hist_line]   124: ******************* 16147
[M::ha_hist_line]   125: ********************* 17702
[M::ha_hist_line]   126: ************************ 20197
[M::ha_hist_line]   127: *************************** 22385
[M::ha_hist_line]   128: ****************************** 25019
[M::ha_hist_line]   129: ********************************* 27265
[M::ha_hist_line]   130: ************************************ 29733
[M::ha_hist_line]   131: **************************************** 33526
[M::ha_hist_line]   132: ******************************************* 35527
[M::ha_hist_line]   133: ********************************************** 37993
[M::ha_hist_line]   134: *************************************************** 41984
[M::ha_hist_line]   135: ***************************************************** 44188
[M::ha_hist_line]   136: ********************************************************* 47002
[M::ha_hist_line]   137: ************************************************************* 50855
[M::ha_hist_line]   138: ****************************************************************** 54287
[M::ha_hist_line]   139: ********************************************************************* 57506
[M::ha_hist_line]   140: ************************************************************************** 61284
[M::ha_hist_line]   141: ******************************************************************************* 65194
[M::ha_hist_line]   142: ********************************************************************************** 67896
[M::ha_hist_line]   143: ************************************************************************************* 70240
[M::ha_hist_line]   144: ***************************************************************************************** 73381
[M::ha_hist_line]   145: ******************************************************************************************* 75642
[M::ha_hist_line]   146: ********************************************************************************************* 76638
[M::ha_hist_line]   147: ************************************************************************************************ 79327
[M::ha_hist_line]   148: ************************************************************************************************** 80799
[M::ha_hist_line]   149: ************************************************************************************************** 81230
[M::ha_hist_line]   150: **************************************************************************************************** 82422
[M::ha_hist_line]   151: **************************************************************************************************** 82585
[M::ha_hist_line]   152: *************************************************************************************************** 82209
[M::ha_hist_line]   153: **************************************************************************************************** 82817
[M::ha_hist_line]   154: *************************************************************************************************** 82217
[M::ha_hist_line]   155: ************************************************************************************************* 80444
[M::ha_hist_line]   156: ********************************************************************************************** 77873
[M::ha_hist_line]   157: ******************************************************************************************** 76027
[M::ha_hist_line]   158: **************************************************************************************** 72511
[M::ha_hist_line]   159: ************************************************************************************* 70641
[M::ha_hist_line]   160: ********************************************************************************** 67689
[M::ha_hist_line]   161: ****************************************************************************** 64309
[M::ha_hist_line]   162: ************************************************************************* 60750
[M::ha_hist_line]   163: ********************************************************************* 56800
[M::ha_hist_line]   164: ***************************************************************** 53610
[M::ha_hist_line]   165: ************************************************************ 49614
[M::ha_hist_line]   166: ******************************************************** 46165
[M::ha_hist_line]   167: *************************************************** 42395
[M::ha_hist_line]   168: *********************************************** 38690
[M::ha_hist_line]   169: ******************************************* 35829
[M::ha_hist_line]   170: **************************************** 32989
[M::ha_hist_line]   171: *********************************** 29007
[M::ha_hist_line]   172: ******************************** 26333
[M::ha_hist_line]   173: ***************************** 23838
[M::ha_hist_line]   174: ************************** 21441
[M::ha_hist_line]   175: *********************** 18660
[M::ha_hist_line]   176: ********************* 16988
[M::ha_hist_line]   177: ****************** 14978
[M::ha_hist_line]   178: **************** 13416
[M::ha_hist_line]   179: *************** 12282
[M::ha_hist_line]   180: ************ 10325
[M::ha_hist_line]   181: *********** 9314
[M::ha_hist_line]   182: ********** 8002
[M::ha_hist_line]   183: ******** 6838
[M::ha_hist_line]   184: ******* 5915
[M::ha_hist_line]   185: ****** 5211
[M::ha_hist_line]   186: ***** 4414
[M::ha_hist_line]   187: ***** 3810
[M::ha_hist_line]   188: **** 3467
[M::ha_hist_line]   189: **** 2966
[M::ha_hist_line]   190: *** 2654
[M::ha_hist_line]   191: *** 2435
[M::ha_hist_line]   192: ** 2059
[M::ha_hist_line]   193: ** 1826
[M::ha_hist_line]   194: ** 1679
[M::ha_hist_line]   195: ** 1628
[M::ha_hist_line]   196: ** 1555
[M::ha_hist_line]   197: ** 1497
[M::ha_hist_line]   198: ** 1387
[M::ha_hist_line]   199: ** 1461
[M::ha_hist_line]   200: ** 1487
[M::ha_hist_line]   201: ** 1574
[M::ha_hist_line]   202: ** 1521
[M::ha_hist_line]   203: ** 1366
[M::ha_hist_line]   204: ** 1483
[M::ha_hist_line]   205: ** 1711
[M::ha_hist_line]   206: ** 1766
[M::ha_hist_line]   207: ** 1807
[M::ha_hist_line]   208: ** 1744
[M::ha_hist_line]   209: ** 2004
[M::ha_hist_line]   210: ** 2013
[M::ha_hist_line]   211: ** 1977
[M::ha_hist_line]   212: ** 2055
[M::ha_hist_line]   213: ** 2071
[M::ha_hist_line]   214: ** 1956
[M::ha_hist_line]   215: *** 2103
[M::ha_hist_line]   216: *** 2165
[M::ha_hist_line]   217: *** 2146
[M::ha_hist_line]   218: *** 2197
[M::ha_hist_line]   219: *** 2198
[M::ha_hist_line]   220: *** 2286
[M::ha_hist_line]   221: *** 2309
[M::ha_hist_line]   222: *** 2325
[M::ha_hist_line]   223: *** 2374
[M::ha_hist_line]   224: *** 2316
[M::ha_hist_line]   225: *** 2314
[M::ha_hist_line]   226: *** 2255
[M::ha_hist_line]   227: *** 2368
[M::ha_hist_line]   228: *** 2448
[M::ha_hist_line]   229: *** 2315
[M::ha_hist_line]   230: *** 2242
[M::ha_hist_line]   231: *** 2444
[M::ha_hist_line]   232: *** 2362
[M::ha_hist_line]   233: *** 2370
[M::ha_hist_line]   234: *** 2157
[M::ha_hist_line]   235: ** 2061
[M::ha_hist_line]   236: ** 1967
[M::ha_hist_line]   237: ** 1892
[M::ha_hist_line]   238: ** 1907
[M::ha_hist_line]   239: ** 1822
[M::ha_hist_line]   240: ** 1750
[M::ha_hist_line]   241: ** 1685
[M::ha_hist_line]   242: ** 1776
...ignored
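
For anyone who wants to inspect such a histogram outside the log, here is a minimal Python sketch that pulls the [M::ha_hist_line] rows into a dictionary mapping count to number of k-mers. The function name parse_hist and the regular expression are my own assumptions based only on the line format shown above; they are not part of hifiasm.

```python
import re

# Assumed line shape, taken from the log above:
#   [M::ha_hist_line]   153: ******** 82817
# The bar of '*' may be empty and may end with '>' when it is clipped.
HIST_RE = re.compile(r"\[M::ha_hist_line\]\s+(\d+):\s*\**>?\s*(\d+)")


def parse_hist(log_text):
    """Return {count: number_of_kmers} from ha_hist_line rows (hypothetical helper).

    A full log prints one histogram per round, so for a whole log file
    later rounds overwrite earlier ones in the returned dict.
    """
    hist = {}
    for line in log_text.splitlines():
        m = HIST_RE.search(line)
        if m:
            hist[int(m.group(1))] = int(m.group(2))
    return hist


sample = "\n".join([
    "[M::ha_hist_line]     1: ****> 530542",
    "[M::ha_hist_line]    18:  395",
    "[M::ha_hist_line]   153: **** 82817",
    "[M::ha_pt_gen] peak_hom: 153; peak_het: -1",
])
hist = parse_hist(sample)
```

Lines that do not match the pattern, such as the peak_hom summary, are skipped.
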
xuzhougeng commented 10 months ago

After reading the source of ha_analyze_count, I understand why the heterozygous peak was not found.

As I read it, ha_analyze_count searches backward from the highest count for a position whose value is larger than its neighbors on both sides.

In this case, counts 150, 151, and 152 (82422, 82585, and 82209 k-mers) form a minor local peak at 151, just before the maximum at 153 (82817). That small bump prevents the real heterozygous peak around 75 from being identified.
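
To make the failure mode concrete, here is a minimal Python sketch of the kind of search described above. It is not the hifiasm code: the function name find_left_peak, the rule of keeping the tallest local maximum, and the 0.95 valley ratio are all assumptions for illustration. The counts are copied from the histogram above at a few selected coverages, so list positions are not coverage values.

```python
def find_left_peak(cnt, valley_ratio=0.95):
    """Return the index of a secondary peak left of the global maximum, or -1.

    Illustrative sketch only; the selection rule and valley_ratio are assumed.
    """
    max_i = max(range(len(cnt)), key=cnt.__getitem__)  # first global maximum
    best_i = -1
    # Walk backward from the main peak and keep the tallest local maximum.
    for i in range(max_i - 1, 0, -1):
        if cnt[i] >= cnt[i - 1] and cnt[i] >= cnt[i + 1]:
            if best_i < 0 or cnt[i] > cnt[best_i]:
                best_i = i
    if best_i < 0:
        return -1
    # Reject the candidate if the valley between it and the main peak is shallow.
    valley = min(cnt[best_i + 1:max_i], default=cnt[max_i])
    if valley > valley_ratio * cnt[best_i]:
        return -1
    return best_i


# Selected rows from the posted histogram: coverage -> number of k-mers.
coverage = [9, 55, 65, 75, 85, 103, 130, 150, 151, 152, 153, 154, 170]
counts = [1078, 10348, 47280, 82062, 48143, 3291, 29733,
          82422, 82585, 82209, 82817, 82217, 32989]

noisy = find_left_peak(counts)  # -1: the bump at 151 wins, then fails the valley test

# Hypothetical edit: raise the value at coverage 152 so that 151 is no longer
# a local maximum. The value 82700 is invented for this demonstration.
smoothed = list(counts)
smoothed[9] = 82700
clean = find_left_peak(smoothed)  # 3, i.e. coverage[3] == 75
```

With the posted values, the bump at 151 is taller than the peak at 75 (82585 vs 82062), so it becomes the candidate and is then discarded because it is not separated from 153 by a real valley. Under these assumptions, that is consistent with the "left: none" lines in the log.
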

chhylp123 commented 10 months ago

Thanks. We may try to fix this issue, but het_peak often does not affect the final results.

xuzhougeng commented 10 months ago

Thank you for your reply.