chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

hi-c issue - hap1 is too large, hap2 too small #85

Open tcb72 opened 3 years ago

tcb72 commented 3 years ago

Hello,

We are assembling a diploid alga with an uncollapsed diploid genome size of 200-210 Mbp. Heterozygosity is relatively high, estimated at 1.5%-2%, and the genome is quite complex, with many large tandem repeats. We have approximately 18x (diploid coverage) HiFi data and very high coverage Hi-C data.

Running hifiasm with the parameters -l2 -k21 --high-het yields a primary contig assembly of 111.225 Mbp in 81 contigs and an alternate contig assembly of 91.132 Mbp in 380 contigs. The primary is still somewhat duplicated, but the results are generally reasonable, and the duplicated regions are visible after running Juicer/3D-DNA/Juicebox and can be removed.
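
A quick arithmetic check of the sizes quoted above, as a minimal sketch: the variable names are illustrative, and the 200-210 Mbp range is the expected uncollapsed diploid size stated earlier in the post.

```python
# Sizes reported for the run without Hi-C, in Mbp.
primary_mbp = 111.225
alternate_mbp = 91.132

# Expected uncollapsed diploid genome size range from the post, in Mbp.
expected_low, expected_high = 200.0, 210.0

total_mbp = primary_mbp + alternate_mbp
within_expected = expected_low <= total_mbp <= expected_high

print(f"primary + alternate = {total_mbp:.3f} Mbp")
print(f"within expected diploid range: {within_expected}")
```

The combined primary and alternate size falls inside the expected range, which is consistent with the post's description of this run as generally reasonable.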

I am excited to see Hi-C support implemented in hifiasm. We tried it, but our results are a bit strange. Running with exactly the same parameters as above plus the Hi-C parameters (including --enzyme GATC, which we weren't sure was necessary), we get a hap1 size of 198.769 Mbp in 174 contigs, a hap2 size of 51.748 Mbp in 70 contigs, and an r_utg size of 203.042 Mbp in 594 contigs. Clearly, hap1 is far too large and hap2 far too small to be correct. Here's the log for that run:

[M::ha_analyze_count] lowest: count[5] = 88381
[M::ha_analyze_count] highest: count[20] = 3984195
[M::ha_hist_line]     2: ******************** 810800
[M::ha_hist_line]     3: **** 178491
[M::ha_hist_line]     4: *** 105666
[M::ha_hist_line]     5: ** 88381
[M::ha_hist_line]     6: *** 116581
[M::ha_hist_line]     7: **** 171392
[M::ha_hist_line]     8: ******* 264784
[M::ha_hist_line]     9: ********** 416418
[M::ha_hist_line]    10: **************** 618357
[M::ha_hist_line]    11: *********************** 903732
[M::ha_hist_line]    12: ******************************* 1238998
[M::ha_hist_line]    13: ****************************************** 1669148
[M::ha_hist_line]    14: ***************************************************** 2125684
[M::ha_hist_line]    15: ****************************************************************** 2620314
[M::ha_hist_line]    16: ***************************************************************************** 3084371
[M::ha_hist_line]    17: *************************************************************************************** 3480375
[M::ha_hist_line]    18: *********************************************************************************************** 3777559
[M::ha_hist_line]    19: **************************************************************************************************** 3981099
[M::ha_hist_line]    20: **************************************************************************************************** 3984195
[M::ha_hist_line]    21: ************************************************************************************************** 3900034
[M::ha_hist_line]    22: ******************************************************************************************** 3670163
[M::ha_hist_line]    23: ************************************************************************************ 3338962
[M::ha_hist_line]    24: *************************************************************************** 3004270
[M::ha_hist_line]    25: ****************************************************************** 2633350
[M::ha_hist_line]    26: ********************************************************** 2292121
[M::ha_hist_line]    27: ************************************************* 1967305
[M::ha_hist_line]    28: ******************************************* 1712520
[M::ha_hist_line]    29: ************************************** 1503004
[M::ha_hist_line]    30: ********************************** 1358797
[M::ha_hist_line]    31: ******************************** 1287401
[M::ha_hist_line]    32: ******************************* 1238839
[M::ha_hist_line]    33: ******************************* 1226816
[M::ha_hist_line]    34: ******************************** 1256275
[M::ha_hist_line]    35: ******************************** 1286056
[M::ha_hist_line]    36: ********************************* 1324396
[M::ha_hist_line]    37: ********************************** 1350643
[M::ha_hist_line]    38: *********************************** 1393192
[M::ha_hist_line]    39: ************************************ 1417390
[M::ha_hist_line]    40: *********************************** 1413525
[M::ha_hist_line]    41: *********************************** 1396914
[M::ha_hist_line]    42: ********************************** 1363428
[M::ha_hist_line]    43: ********************************* 1309081
[M::ha_hist_line]    44: ******************************* 1235301
[M::ha_hist_line]    45: ***************************** 1156834
[M::ha_hist_line]    46: *************************** 1076963
[M::ha_hist_line]    47: ************************* 977181
[M::ha_hist_line]    48: ********************** 882624
[M::ha_hist_line]    49: ******************** 787805
[M::ha_hist_line]    50: ****************** 708887
[M::ha_hist_line]    51: **************** 628788
[M::ha_hist_line]    52: ************** 554376
[M::ha_hist_line]    53: ************ 487113
[M::ha_hist_line]    54: *********** 437474
[M::ha_hist_line]    55: ********** 386778
[M::ha_hist_line]    56: ********* 346295
[M::ha_hist_line]    57: ******** 308258
[M::ha_hist_line]    58: ******* 279822
[M::ha_hist_line]    59: ****** 251586
[M::ha_hist_line]    60: ****** 232039
[M::ha_hist_line]    61: ***** 213679
[M::ha_hist_line]    62: ***** 199824
[M::ha_hist_line]    63: ***** 192996
[M::ha_hist_line]    64: ***** 180274
[M::ha_hist_line]    65: **** 169957
[M::ha_hist_line]    66: **** 161008
[M::ha_hist_line]    67: **** 153980
[M::ha_hist_line]    68: **** 145958
[M::ha_hist_line]    69: **** 140694
[M::ha_hist_line]    70: *** 134497
[M::ha_hist_line]    71: *** 127366
[M::ha_hist_line]    72: *** 122431
[M::ha_hist_line]    73: *** 117529
[M::ha_hist_line]    74: *** 113408
[M::ha_hist_line]    75: *** 108222
[M::ha_hist_line]    76: *** 104569
[M::ha_hist_line]    77: *** 100767
[M::ha_hist_line]    78: ** 96812
[M::ha_hist_line]    79: ** 93233
[M::ha_hist_line]    80: ** 91247
[M::ha_hist_line]    81: ** 88878
[M::ha_hist_line]    82: ** 85023
[M::ha_hist_line]    83: ** 82009
[M::ha_hist_line]    84: ** 79596
[M::ha_hist_line]    85: ** 77372
[M::ha_hist_line]    86: ** 73723
[M::ha_hist_line]    87: ** 70670
[M::ha_hist_line]    88: ** 68091
[M::ha_hist_line]    89: ** 65152
[M::ha_hist_line]    90: ** 63608
[M::ha_hist_line]    91: ** 61678
[M::ha_hist_line]    92: * 59627
[M::ha_hist_line]    93: * 56514
[M::ha_hist_line]    94: * 53221
[M::ha_hist_line]    95: * 52027
[M::ha_hist_line]    96: * 49814
[M::ha_hist_line]    97: * 48544
[M::ha_hist_line]    98: * 46759
[M::ha_hist_line]    99: * 44694
[M::ha_hist_line]   100: * 43418
[M::ha_hist_line]   101: * 41271
[M::ha_hist_line]   102: * 40089
[M::ha_hist_line]   103: * 39788
[M::ha_hist_line]   104: * 39024
[M::ha_hist_line]   105: * 37004
[M::ha_hist_line]   106: * 36596
[M::ha_hist_line]   107: * 35392
[M::ha_hist_line]   108: * 34125
[M::ha_hist_line]   109: * 33316
[M::ha_hist_line]   110: * 33083
[M::ha_hist_line]   111: * 31695
[M::ha_hist_line]   112: * 30534
[M::ha_hist_line]   113: * 29537
[M::ha_hist_line]   114: * 29147
[M::ha_hist_line]   115: * 27388
[M::ha_hist_line]   116: * 27312
[M::ha_hist_line]   117: * 26513
[M::ha_hist_line]   118: * 26069
[M::ha_hist_line]   119: * 25604
[M::ha_hist_line]   120: * 25129
[M::ha_hist_line]   121: * 25024
[M::ha_hist_line]   122: * 24560
[M::ha_hist_line]   123: * 23695
[M::ha_hist_line]   124: * 22825
[M::ha_hist_line]   125: * 21991
[M::ha_hist_line]   126: * 21795
[M::ha_hist_line]   127: * 21158
[M::ha_hist_line]   128: * 20178
[M::ha_hist_line]  rest: *************************************** 1569755
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: count[39] = 1417390
[M::ha_ft_gen] peak_hom: 39; peak_het: 20
[M::ha_ft_gen::388.382*5.54@17.649GB] ==> filtered out 836327 k-mers occurring 195 or more times
[M::ha_opt_update_cov] updated max_n_chain to 195
[M::ha_pt_gen::464.919*6.31] ==> counted 5919762 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[5] = 6485
[M::ha_analyze_count] highest: count[19] = 218071
[M::ha_hist_line]     1: ****************************************************************************************************> 1442037
[M::ha_hist_line]     2: ****************************************** 91099
[M::ha_hist_line]     3: ********* 19933
[M::ha_hist_line]     4: **** 9230
[M::ha_hist_line]     5: *** 6485
[M::ha_hist_line]     6: **** 7907
[M::ha_hist_line]     7: ***** 10829
[M::ha_hist_line]     8: ******** 16448
[M::ha_hist_line]     9: ************ 25093
[M::ha_hist_line]    10: ***************** 37013
[M::ha_hist_line]    11: ************************* 53664
[M::ha_hist_line]    12: ********************************* 72972
[M::ha_hist_line]    13: ******************************************** 97001
[M::ha_hist_line]    14: ******************************************************** 122453
[M::ha_hist_line]    15: ********************************************************************* 149383
[M::ha_hist_line]    16: ******************************************************************************** 173706
[M::ha_hist_line]    17: ***************************************************************************************** 194125
[M::ha_hist_line]    18: ************************************************************************************************ 208300
[M::ha_hist_line]    19: **************************************************************************************************** 218071
[M::ha_hist_line]    20: *************************************************************************************************** 215245
[M::ha_hist_line]    21: ************************************************************************************************ 208264
[M::ha_hist_line]    22: ***************************************************************************************** 193946
[M::ha_hist_line]    23: ******************************************************************************** 175243
[M::ha_hist_line]    24: *********************************************************************** 155922
[M::ha_hist_line]    25: ************************************************************** 135735
[M::ha_hist_line]    26: ***************************************************** 116047
[M::ha_hist_line]    27: ********************************************** 99269
[M::ha_hist_line]    28: *************************************** 85292
[M::ha_hist_line]    29: ********************************** 74561
[M::ha_hist_line]    30: ******************************* 66689
[M::ha_hist_line]    31: **************************** 61917
[M::ha_hist_line]    32: *************************** 59141
[M::ha_hist_line]    33: *************************** 57866
[M::ha_hist_line]    34: *************************** 58206
[M::ha_hist_line]    35: *************************** 58955
[M::ha_hist_line]    36: **************************** 60489
[M::ha_hist_line]    37: **************************** 60821
[M::ha_hist_line]    38: **************************** 62122
[M::ha_hist_line]    39: ***************************** 62656
[M::ha_hist_line]    40: ***************************** 62608
[M::ha_hist_line]    41: **************************** 60804
[M::ha_hist_line]    42: *************************** 59416
[M::ha_hist_line]    43: ************************** 56095
[M::ha_hist_line]    44: ************************ 52198
[M::ha_hist_line]    45: *********************** 49524
[M::ha_hist_line]    46: ********************* 44865
[M::ha_hist_line]    47: ******************* 40740
[M::ha_hist_line]    48: ***************** 36496
[M::ha_hist_line]    49: *************** 32648
[M::ha_hist_line]    50: ************* 28940
[M::ha_hist_line]    51: ************ 25652
[M::ha_hist_line]    52: ********** 22607
[M::ha_hist_line]    53: ********* 19984
[M::ha_hist_line]    54: ******** 17821
[M::ha_hist_line]    55: ******* 15749
[M::ha_hist_line]    56: ****** 14020
[M::ha_hist_line]    57: ****** 12446
[M::ha_hist_line]    58: ***** 11199
[M::ha_hist_line]    59: ***** 10057
[M::ha_hist_line]    60: **** 9094
[M::ha_hist_line]    61: **** 8643
[M::ha_hist_line]    62: **** 8028
[M::ha_hist_line]    63: **** 7645
[M::ha_hist_line]    64: *** 7178
[M::ha_hist_line]    65: *** 6803
[M::ha_hist_line]    66: *** 6590
[M::ha_hist_line]    67: *** 6079
[M::ha_hist_line]    68: *** 5748
[M::ha_hist_line]    69: *** 5615
[M::ha_hist_line]    70: ** 5301
[M::ha_hist_line]    71: ** 5021
[M::ha_hist_line]    72: ** 4771
[M::ha_hist_line]    73: ** 4489
[M::ha_hist_line]    74: ** 4397
[M::ha_hist_line]    75: ** 4140
[M::ha_hist_line]    76: ** 3971
[M::ha_hist_line]    77: ** 3916
[M::ha_hist_line]    78: ** 3684
[M::ha_hist_line]    79: ** 3560
[M::ha_hist_line]    80: ** 3552
[M::ha_hist_line]    81: ** 3356
[M::ha_hist_line]    82: * 3209
[M::ha_hist_line]    83: * 3192
[M::ha_hist_line]    84: * 3037
[M::ha_hist_line]    85: * 2868
[M::ha_hist_line]    86: * 2822
[M::ha_hist_line]    87: * 2658
[M::ha_hist_line]    88: * 2423
[M::ha_hist_line]    89: * 2493
[M::ha_hist_line]    90: * 2509
[M::ha_hist_line]    91: * 2316
[M::ha_hist_line]    92: * 2269
[M::ha_hist_line]    93: * 2162
[M::ha_hist_line]    94: * 1929
[M::ha_hist_line]    95: * 1908
[M::ha_hist_line]    96: * 1812
[M::ha_hist_line]    97: * 1852
[M::ha_hist_line]    98: * 1737
[M::ha_hist_line]    99: * 1719
[M::ha_hist_line]   100: * 1688
[M::ha_hist_line]   101: * 1552
[M::ha_hist_line]   102: * 1491
[M::ha_hist_line]   103: * 1519
[M::ha_hist_line]   104: * 1450
[M::ha_hist_line]   105: * 1413
[M::ha_hist_line]   106: * 1404
[M::ha_hist_line]   107: * 1411
[M::ha_hist_line]   108: * 1217
[M::ha_hist_line]   109: * 1275
[M::ha_hist_line]   110: * 1189
[M::ha_hist_line]   111: * 1166
[M::ha_hist_line]   112: * 1128
[M::ha_hist_line]   113: * 1158
[M::ha_hist_line]   114: * 1145
[M::ha_hist_line]   115: * 1109
[M::ha_hist_line]  rest: **************** 35947
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: count[39] = 62656
[M::ha_pt_gen] peak_hom: 39; peak_het: 19
[M::ha_pt_gen::505.475*7.22] ==> indexed 128080142 positions
[M::ha_assemble::3186.794*22.57@17.649GB] ==> corrected reads for round 1
[M::ha_assemble] # bases: 4215442044; # corrected bases: 11635534; # recorrected bases: 18836
[M::ha_assemble] size of buffer: 9.166GB
[M::ha_pt_gen::3219.650*22.55] ==> counted 4659918 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[5] = 4922
[M::ha_analyze_count] highest: count[19] = 215706
[M::ha_hist_line]     1: ****************************************************************************************************> 274622
[M::ha_hist_line]     2: ********* 19517
[M::ha_hist_line]     3: *** 6983
[M::ha_hist_line]     4: ** 5319
[M::ha_hist_line]     5: ** 4922
[M::ha_hist_line]     6: *** 6653
[M::ha_hist_line]     7: ***** 9922
[M::ha_hist_line]     8: ******* 15036
[M::ha_hist_line]     9: *********** 23757
[M::ha_hist_line]    10: **************** 34585
[M::ha_hist_line]    11: ************************ 51062
[M::ha_hist_line]    12: ******************************** 68896
[M::ha_hist_line]    13: ******************************************* 93191
[M::ha_hist_line]    14: ****************************************************** 117377
[M::ha_hist_line]    15: ******************************************************************* 143680
[M::ha_hist_line]    16: ****************************************************************************** 169033
[M::ha_hist_line]    17: **************************************************************************************** 189844
[M::ha_hist_line]    18: *********************************************************************************************** 205492
[M::ha_hist_line]    19: **************************************************************************************************** 215706
[M::ha_hist_line]    20: **************************************************************************************************** 214696
[M::ha_hist_line]    21: ************************************************************************************************* 209095
[M::ha_hist_line]    22: ******************************************************************************************* 196817
[M::ha_hist_line]    23: ********************************************************************************** 177812
[M::ha_hist_line]    24: ************************************************************************** 159100
[M::ha_hist_line]    25: **************************************************************** 138901
[M::ha_hist_line]    26: ******************************************************* 119331
[M::ha_hist_line]    27: *********************************************** 101962
[M::ha_hist_line]    28: **************************************** 87340
[M::ha_hist_line]    29: *********************************** 75992
[M::ha_hist_line]    30: ******************************* 67587
[M::ha_hist_line]    31: ***************************** 62121
[M::ha_hist_line]    32: *************************** 58820
[M::ha_hist_line]    33: *************************** 57272
[M::ha_hist_line]    34: *************************** 57505
[M::ha_hist_line]    35: *************************** 57802
[M::ha_hist_line]    36: **************************** 59541
[M::ha_hist_line]    37: **************************** 59971
[M::ha_hist_line]    38: **************************** 61255
[M::ha_hist_line]    39: ***************************** 62292
[M::ha_hist_line]    40: ***************************** 62582
[M::ha_hist_line]    41: **************************** 61073
[M::ha_hist_line]    42: **************************** 59654
[M::ha_hist_line]    43: *************************** 57330
[M::ha_hist_line]    44: ************************* 53319
[M::ha_hist_line]    45: ************************ 51034
[M::ha_hist_line]    46: ********************* 46349
[M::ha_hist_line]    47: ******************* 41984
[M::ha_hist_line]    48: ****************** 37985
[M::ha_hist_line]    49: **************** 34299
[M::ha_hist_line]    50: ************** 30318
[M::ha_hist_line]    51: ************ 26814
[M::ha_hist_line]    52: *********** 23743
[M::ha_hist_line]    53: ********** 20938
[M::ha_hist_line]    54: ********* 18714
[M::ha_hist_line]    55: ******** 16495
[M::ha_hist_line]    56: ******* 14732
[M::ha_hist_line]    57: ****** 13048
[M::ha_hist_line]    58: ***** 11827
[M::ha_hist_line]    59: ***** 10402
[M::ha_hist_line]    60: **** 9653
[M::ha_hist_line]    61: **** 8779
[M::ha_hist_line]    62: **** 8116
[M::ha_hist_line]    63: **** 7954
[M::ha_hist_line]    64: *** 7343
[M::ha_hist_line]    65: *** 6949
[M::ha_hist_line]    66: *** 6587
[M::ha_hist_line]    67: *** 6209
[M::ha_hist_line]    68: *** 6070
[M::ha_hist_line]    69: *** 5504
[M::ha_hist_line]    70: *** 5526
[M::ha_hist_line]    71: ** 5071
[M::ha_hist_line]    72: ** 4880
[M::ha_hist_line]    73: ** 4603
[M::ha_hist_line]    74: ** 4489
[M::ha_hist_line]    75: ** 4199
[M::ha_hist_line]    76: ** 4022
[M::ha_hist_line]    77: ** 3862
[M::ha_hist_line]    78: ** 3790
[M::ha_hist_line]    79: ** 3691
[M::ha_hist_line]    80: ** 3468
[M::ha_hist_line]    81: ** 3486
[M::ha_hist_line]    82: ** 3297
[M::ha_hist_line]    83: * 3172
[M::ha_hist_line]    84: * 3035
[M::ha_hist_line]    85: * 2958
[M::ha_hist_line]    86: * 2963
[M::ha_hist_line]    87: * 2717
[M::ha_hist_line]    88: * 2554
[M::ha_hist_line]    89: * 2526
[M::ha_hist_line]    90: * 2467
[M::ha_hist_line]    91: * 2338
[M::ha_hist_line]    92: * 2262
[M::ha_hist_line]    93: * 2201
[M::ha_hist_line]    94: * 2092
[M::ha_hist_line]    95: * 1991
[M::ha_hist_line]    96: * 1839
[M::ha_hist_line]    97: * 1825
[M::ha_hist_line]    98: * 1771
[M::ha_hist_line]    99: * 1696
[M::ha_hist_line]   100: * 1670
[M::ha_hist_line]   101: * 1566
[M::ha_hist_line]   102: * 1606
[M::ha_hist_line]   103: * 1487
[M::ha_hist_line]   104: * 1447
[M::ha_hist_line]   105: * 1417
[M::ha_hist_line]   106: * 1429
[M::ha_hist_line]   107: * 1427
[M::ha_hist_line]   108: * 1292
[M::ha_hist_line]   109: * 1246
[M::ha_hist_line]   110: * 1238
[M::ha_hist_line]   111: * 1195
[M::ha_hist_line]   112: * 1159
[M::ha_hist_line]   113: * 1161
[M::ha_hist_line]   114: * 1140
[M::ha_hist_line]  rest: ****************** 38034
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: count[40] = 62582
[M::ha_pt_gen] peak_hom: 40; peak_het: 19
[M::ha_pt_gen::3261.758*22.48] ==> indexed 129016067 positions
[M::ha_assemble::5820.750*23.65@20.097GB] ==> corrected reads for round 2
[M::ha_assemble] # bases: 4209292072; # corrected bases: 203468; # recorrected bases: 4291
[M::ha_assemble] size of buffer: 9.011GB
[M::ha_pt_gen::5855.157*23.62] ==> counted 4641484 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[5] = 4836
[M::ha_analyze_count] highest: count[19] = 215760
[M::ha_hist_line]     1: ****************************************************************************************************> 258596
[M::ha_hist_line]     2: ******** 17808
[M::ha_hist_line]     3: *** 6505
[M::ha_hist_line]     4: ** 5217
[M::ha_hist_line]     5: ** 4836
[M::ha_hist_line]     6: *** 6660
[M::ha_hist_line]     7: ***** 9902
[M::ha_hist_line]     8: ******* 15012
[M::ha_hist_line]     9: *********** 23774
[M::ha_hist_line]    10: **************** 34502
[M::ha_hist_line]    11: ************************ 51030
[M::ha_hist_line]    12: ******************************** 68929
[M::ha_hist_line]    13: ******************************************* 93050
[M::ha_hist_line]    14: ****************************************************** 117344
[M::ha_hist_line]    15: ******************************************************************* 143736
[M::ha_hist_line]    16: ****************************************************************************** 168964
[M::ha_hist_line]    17: **************************************************************************************** 189785
[M::ha_hist_line]    18: *********************************************************************************************** 205184
[M::ha_hist_line]    19: **************************************************************************************************** 215760
[M::ha_hist_line]    20: **************************************************************************************************** 214745
[M::ha_hist_line]    21: ************************************************************************************************* 209088
[M::ha_hist_line]    22: ******************************************************************************************* 196864
[M::ha_hist_line]    23: ********************************************************************************** 177823
[M::ha_hist_line]    24: ************************************************************************** 159232
[M::ha_hist_line]    25: **************************************************************** 138844
[M::ha_hist_line]    26: ******************************************************* 119384
[M::ha_hist_line]    27: *********************************************** 101923
[M::ha_hist_line]    28: **************************************** 87350
[M::ha_hist_line]    29: *********************************** 76036
[M::ha_hist_line]    30: ******************************* 67670
[M::ha_hist_line]    31: ***************************** 62104
[M::ha_hist_line]    32: *************************** 58771
[M::ha_hist_line]    33: *************************** 57317
[M::ha_hist_line]    34: *************************** 57510
[M::ha_hist_line]    35: *************************** 57732
[M::ha_hist_line]    36: **************************** 59586
[M::ha_hist_line]    37: **************************** 59931
[M::ha_hist_line]    38: **************************** 61284
[M::ha_hist_line]    39: ***************************** 62376
[M::ha_hist_line]    40: ***************************** 62531
[M::ha_hist_line]    41: **************************** 61077
[M::ha_hist_line]    42: **************************** 59677
[M::ha_hist_line]    43: *************************** 57360
[M::ha_hist_line]    44: ************************* 53319
[M::ha_hist_line]    45: ************************ 51029
[M::ha_hist_line]    46: ********************** 46410
[M::ha_hist_line]    47: ******************* 41984
[M::ha_hist_line]    48: ****************** 37998
[M::ha_hist_line]    49: **************** 34279
[M::ha_hist_line]    50: ************** 30322
[M::ha_hist_line]    51: ************ 26873
[M::ha_hist_line]    52: *********** 23719
[M::ha_hist_line]    53: ********** 20968
[M::ha_hist_line]    54: ********* 18731
[M::ha_hist_line]    55: ******** 16498
[M::ha_hist_line]    56: ******* 14735
[M::ha_hist_line]    57: ****** 13048
[M::ha_hist_line]    58: ***** 11853
[M::ha_hist_line]    59: ***** 10396
[M::ha_hist_line]    60: **** 9665
[M::ha_hist_line]    61: **** 8771
[M::ha_hist_line]    62: **** 8108
[M::ha_hist_line]    63: **** 7967
[M::ha_hist_line]    64: *** 7341
[M::ha_hist_line]    65: *** 6937
[M::ha_hist_line]    66: *** 6609
[M::ha_hist_line]    67: *** 6207
[M::ha_hist_line]    68: *** 6073
[M::ha_hist_line]    69: *** 5499
[M::ha_hist_line]    70: *** 5523
[M::ha_hist_line]    71: ** 5071
[M::ha_hist_line]    72: ** 4865
[M::ha_hist_line]    73: ** 4615
[M::ha_hist_line]    74: ** 4476
[M::ha_hist_line]    75: ** 4227
[M::ha_hist_line]    76: ** 4021
[M::ha_hist_line]    77: ** 3861
[M::ha_hist_line]    78: ** 3780
[M::ha_hist_line]    79: ** 3679
[M::ha_hist_line]    80: ** 3467
[M::ha_hist_line]    81: ** 3510
[M::ha_hist_line]    82: ** 3281
[M::ha_hist_line]    83: * 3171
[M::ha_hist_line]    84: * 3050
[M::ha_hist_line]    85: * 2952
[M::ha_hist_line]    86: * 2942
[M::ha_hist_line]    87: * 2752
[M::ha_hist_line]    88: * 2546
[M::ha_hist_line]    89: * 2536
[M::ha_hist_line]    90: * 2457
[M::ha_hist_line]    91: * 2367
[M::ha_hist_line]    92: * 2242
[M::ha_hist_line]    93: * 2186
[M::ha_hist_line]    94: * 2119
[M::ha_hist_line]    95: * 1980
[M::ha_hist_line]    96: * 1838
[M::ha_hist_line]    97: * 1831
[M::ha_hist_line]    98: * 1774
[M::ha_hist_line]    99: * 1675
[M::ha_hist_line]   100: * 1689
[M::ha_hist_line]   101: * 1571
[M::ha_hist_line]   102: * 1593
[M::ha_hist_line]   103: * 1494
[M::ha_hist_line]   104: * 1442
[M::ha_hist_line]   105: * 1416
[M::ha_hist_line]   106: * 1435
[M::ha_hist_line]   107: * 1423
[M::ha_hist_line]   108: * 1291
[M::ha_hist_line]   109: * 1240
[M::ha_hist_line]   110: * 1233
[M::ha_hist_line]   111: * 1202
[M::ha_hist_line]   112: * 1163
[M::ha_hist_line]   113: * 1168
[M::ha_hist_line]   114: * 1142
[M::ha_hist_line]  rest: ****************** 38040
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: count[40] = 62531
[M::ha_pt_gen] peak_hom: 40; peak_het: 19
[M::ha_pt_gen::5896.456*23.58] ==> indexed 129024514 positions
[M::ha_assemble::8409.835*24.14@20.984GB] ==> corrected reads for round 3
[M::ha_assemble] # bases: 4209179025; # corrected bases: 37046; # recorrected bases: 2683
[M::ha_assemble] size of buffer: 8.987GB
[M::ha_pt_gen::8444.013*24.13] ==> counted 4639277 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[5] = 4824
[M::ha_analyze_count] highest: count[19] = 215710
[M::ha_hist_line]     1: ****************************************************************************************************> 256638
[M::ha_hist_line]     2: ******** 17715
[M::ha_hist_line]     3: *** 6388
[M::ha_hist_line]     4: ** 5193
[M::ha_hist_line]     5: ** 4824
[M::ha_hist_line]     6: *** 6640
[M::ha_hist_line]     7: ***** 9913
[M::ha_hist_line]     8: ******* 14998
[M::ha_hist_line]     9: *********** 23794
[M::ha_hist_line]    10: **************** 34523
[M::ha_hist_line]    11: ************************ 51014
[M::ha_hist_line]    12: ******************************** 68868
[M::ha_hist_line]    13: ******************************************* 93048
[M::ha_hist_line]    14: ****************************************************** 117393
[M::ha_hist_line]    15: ******************************************************************* 143778
[M::ha_hist_line]    16: ****************************************************************************** 168918
[M::ha_hist_line]    17: **************************************************************************************** 189687
[M::ha_hist_line]    18: *********************************************************************************************** 205196
[M::ha_hist_line]    19: **************************************************************************************************** 215710
[M::ha_hist_line]    20: **************************************************************************************************** 214818
[M::ha_hist_line]    21: ************************************************************************************************* 209166
[M::ha_hist_line]    22: ******************************************************************************************* 196811
[M::ha_hist_line]    23: ********************************************************************************** 177809
[M::ha_hist_line]    24: ************************************************************************** 159341
[M::ha_hist_line]    25: **************************************************************** 138848
[M::ha_hist_line]    26: ******************************************************* 119292
[M::ha_hist_line]    27: *********************************************** 101896
[M::ha_hist_line]    28: **************************************** 87358
[M::ha_hist_line]    29: *********************************** 76038
[M::ha_hist_line]    30: ******************************* 67651
[M::ha_hist_line]    31: ***************************** 62152
[M::ha_hist_line]    32: *************************** 58770
[M::ha_hist_line]    33: *************************** 57329
[M::ha_hist_line]    34: *************************** 57519
[M::ha_hist_line]    35: *************************** 57737
[M::ha_hist_line]    36: **************************** 59564
[M::ha_hist_line]    37: **************************** 59935
[M::ha_hist_line]    38: **************************** 61263
[M::ha_hist_line]    39: ***************************** 62380
[M::ha_hist_line]    40: ***************************** 62551
[M::ha_hist_line]    41: **************************** 61080
[M::ha_hist_line]    42: **************************** 59685
[M::ha_hist_line]    43: *************************** 57352
[M::ha_hist_line]    44: ************************* 53315
[M::ha_hist_line]    45: ************************ 51040
[M::ha_hist_line]    46: ********************** 46409
[M::ha_hist_line]    47: ******************* 41992
[M::ha_hist_line]    48: ****************** 37991
[M::ha_hist_line]    49: **************** 34270
[M::ha_hist_line]    50: ************** 30338
[M::ha_hist_line]    51: ************ 26866
[M::ha_hist_line]    52: *********** 23725
[M::ha_hist_line]    53: ********** 20954
[M::ha_hist_line]    54: ********* 18742
[M::ha_hist_line]    55: ******** 16498
[M::ha_hist_line]    56: ******* 14732
[M::ha_hist_line]    57: ****** 13047
[M::ha_hist_line]    58: ***** 11860
[M::ha_hist_line]    59: ***** 10400
[M::ha_hist_line]    60: **** 9674
[M::ha_hist_line]    61: **** 8760
[M::ha_hist_line]    62: **** 8101
[M::ha_hist_line]    63: **** 7969
[M::ha_hist_line]    64: *** 7340
[M::ha_hist_line]    65: *** 6942
[M::ha_hist_line]    66: *** 6607
[M::ha_hist_line]    67: *** 6209
[M::ha_hist_line]    68: *** 6066
[M::ha_hist_line]    69: *** 5513
[M::ha_hist_line]    70: *** 5521
[M::ha_hist_line]    71: ** 5065
[M::ha_hist_line]    72: ** 4864
[M::ha_hist_line]    73: ** 4619
[M::ha_hist_line]    74: ** 4482
[M::ha_hist_line]    75: ** 4227
[M::ha_hist_line]    76: ** 4012
[M::ha_hist_line]    77: ** 3871
[M::ha_hist_line]    78: ** 3773
[M::ha_hist_line]    79: ** 3683
[M::ha_hist_line]    80: ** 3458
[M::ha_hist_line]    81: ** 3503
[M::ha_hist_line]    82: ** 3282
[M::ha_hist_line]    83: * 3180
[M::ha_hist_line]    84: * 3054
[M::ha_hist_line]    85: * 2959
[M::ha_hist_line]    86: * 2941
[M::ha_hist_line]    87: * 2750
[M::ha_hist_line]    88: * 2542
[M::ha_hist_line]    89: * 2531
[M::ha_hist_line]    90: * 2451
[M::ha_hist_line]    91: * 2380
[M::ha_hist_line]    92: * 2240
[M::ha_hist_line]    93: * 2188
[M::ha_hist_line]    94: * 2112
[M::ha_hist_line]    95: * 1981
[M::ha_hist_line]    96: * 1836
[M::ha_hist_line]    97: * 1835
[M::ha_hist_line]    98: * 1777
[M::ha_hist_line]    99: * 1671
[M::ha_hist_line]   100: * 1695
[M::ha_hist_line]   101: * 1572
[M::ha_hist_line]   102: * 1590
[M::ha_hist_line]   103: * 1493
[M::ha_hist_line]   104: * 1449
[M::ha_hist_line]   105: * 1412
[M::ha_hist_line]   106: * 1429
[M::ha_hist_line]   107: * 1418
[M::ha_hist_line]   108: * 1293
[M::ha_hist_line]   109: * 1250
[M::ha_hist_line]   110: * 1224
[M::ha_hist_line]   111: * 1201
[M::ha_hist_line]   112: * 1164
[M::ha_hist_line]   113: * 1176
[M::ha_hist_line]   114: * 1136
[M::ha_hist_line]  rest: ****************** 38044
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: count[40] = 62551
[M::ha_pt_gen] peak_hom: 40; peak_het: 19
[M::ha_pt_gen::8485.939*24.10] ==> indexed 129025443 positions
[M::ha_assemble::14947.321*25.24@21.230GB] ==> found overlaps for the final round
[M::ha_print_ovlp_stat] # overlaps: 20278576
[M::ha_print_ovlp_stat] # strong overlaps: 8805482
[M::ha_print_ovlp_stat] # weak overlaps: 11473094
[M::ha_print_ovlp_stat] # exact overlaps: 20038202
[M::ha_print_ovlp_stat] # inexact overlaps: 240374
[M::ha_print_ovlp_stat] # overlaps without large indels: 20274325
[M::ha_print_ovlp_stat] # reverse overlaps: 9872588
Writing reads to disk...
Reads has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
bin files have been written.
[M::purge_dups] purge duplication coverage threshold: 50
[M::purge_dups] purge duplication coverage threshold: 50
[M::adjust_utg_by_primary] primary contig coverage range: [32, infinity]
[M::purge_dups] purge duplication coverage threshold: 50
[M::purge_dups] purge duplication coverage threshold: 50
[M::purge_dups] purge duplication coverage threshold: 50
[M::purge_dups] purge duplication coverage threshold: 50
[M::purge_dups] purge duplication coverage threshold: 50
[M::adjust_utg_by_primary] primary contig coverage range: [32, infinity]
[M::purge_dups] purge duplication coverage threshold: 50
[M::build_unitig_index::18.143] ==> Counting
[M::build_unitig_index::7.263] ==> Memory allocating
[M::build_unitig_index::20.585] ==> Filling pos
[M::build_unitig_index::0.281] ==> Sorting pos
[M::build_unitig_index::46.277] ==> HiC index has been built
[M::write_hc_pt_index] Index has been written.
[M::dedup_hits::11.850] ==> Dedup
[M::all_pair_shortest_path::0.003]
[M::fill_utg_distance_multi::0.074]
[M::collect_hc_links::12.811] ==> Hi-C linkages have been counted
[M::link_phase_group::0.000]
[M::fill_utg_distance_multi::0.002]
[M::link_phase_group::0.000]
Writing raw unitig GFA to disk...
Writing scenedesmus_obliquus_hic_l2_k21_high_het_with_enzyme.hic.hap1.p_ctg.gfa to disk...
[M::purge_dups] purge duplication coverage threshold: 50
[M::adjust_utg_by_trio] primary contig coverage range: [32, infinity]
Writing scenedesmus_obliquus_hic_l2_k21_high_het_with_enzyme.hic.hap2.p_ctg.gfa to disk...
[M::purge_dups] purge duplication coverage threshold: 50
[M::adjust_utg_by_trio] primary contig coverage range: [32, infinity]
Inconsistency threshold for low-quality regions in BED files: 70%
[M::main] Version: 0.14-r312
[M::main] CMD: hifiasm -o scenedesmus_obliquus_hic_l2_k21_high_het_with_enzyme -l2 --high-het -k 21 -t32 --h1 new_hic_data/s_obliquus_S3HiC_R1_clean.fastq.gz --h2 new_hic_data/s_obliquus_S3HiC_R2_clean.fastq.gz --enzyme GATC hifi_reads/combined_reads.fasta
[M::main] Real time: 16081.748 sec; CPU: 383060.664 sec; Peak RSS: 26.309 GB

Other runs we did:

hic params, --enzyme GATC, k21, no high het: hap1 size of 192.982 Mbp, hap2 size of 85.366 Mbp
hic params, --enzyme GATC, k23, no high het: hap1 size of 194.981 Mbp, hap2 size of 76.830 Mbp
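
To compare runs like these, one option is to total the segment lengths in each haplotype GFA and look at how the phased sequence is split. The following is a minimal sketch, assuming each `S` line carries either the sequence itself or an `LN:i:` length tag; the helper names `gfa_total_bp` and `hap_balance` are hypothetical and not part of hifiasm.

```python
import sys


def gfa_total_bp(path):
    """Sum segment lengths over all S lines of a GFA file."""
    total = 0
    with open(path) as fh:
        for line in fh:
            if not line.startswith("S\t"):
                continue
            fields = line.rstrip("\n").split("\t")
            seq = fields[2] if len(fields) > 2 else "*"
            if seq != "*":
                total += len(seq)
                continue
            # Sequence omitted: fall back to the optional LN:i:<length> tag.
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    total += int(tag[5:])
                    break
    return total


def hap_balance(hap1_bp, hap2_bp):
    """Return the fraction of phased sequence assigned to hap1."""
    return hap1_bp / (hap1_bp + hap2_bp)


# Usage: python gfa_balance.py <hap1.p_ctg.gfa> <hap2.p_ctg.gfa>
if __name__ == "__main__" and len(sys.argv) == 3:
    h1, h2 = gfa_total_bp(sys.argv[1]), gfa_total_bp(sys.argv[2])
    print(f"hap1={h1} bp hap2={h2} bp hap1 fraction={hap_balance(h1, h2):.3f}")
```

For the first Hi-C run (198.769 Mbp and 51.748 Mbp), `hap_balance` gives roughly 0.79, whereas a balanced diploid split would be close to 0.5.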

Let me know if I can provide any more information to help you out. Other than that, I appreciate the development of hifiasm -- it's been fantastic for diploid genomes.

Best,

Tom

chhylp123 commented 3 years ago

For hifiasm, there are two known issues: 1) purge_dups may not purge sufficiently, 2) Hi-C phasing is unbalanced. I have already fixed issue 1) and am still tuning our algorithm for Hi-C phasing. Hopefully I can fix issue 2) soon. If you think 1) is important, I can also push it to the GitHub repo right now.

andyjslee commented 3 years ago

I am also using the Hi-C mode along with HiFi sequencing data, and I am getting a segmentation fault. I initially thought the error was caused by unusual memory consumption, but could it be related to the issue above? Also, hifiasm produces no specific logs with further details on the segmentation fault.

tcb72 commented 3 years ago

For hifiasm, there are two known issues: 1) purge_dups may not purge sufficiently, 2) Hi-C phasing is unbalanced. I have already fixed issue 1) and am still tuning our algorithm for Hi-C phasing. Hopefully I can fix issue 2) soon. If you think 1) is important, I can also push it to the GitHub repo right now.

That would be helpful, appreciate it.

xinghua1001 commented 3 years ago

For hifiasm, there are two known issues: 1) purge_dups may not purge sufficiently, 2) Hi-C phasing is unbalanced. I have already fixed issue 1) and am still tuning our algorithm for Hi-C phasing. Hopefully I can fix issue 2) soon. If you think 1) is important, I can also push it to the GitHub repo right now.

Hi Haoyu, I also found that hifiasm's purging is not sufficient. If I use the standalone purge_dups to purge the hifiasm primary assembly, many small contigs are purged and the N50 improves. Could you please update it on GitHub or send it to xinghuali94@qq.com directly? I'm in a bit of a hurry to use it. Many thanks!

chhylp123 commented 3 years ago

I will update the new version in a few hours, sorry for the delay.

chhylp123 commented 3 years ago

@tcb72 @xinghua1001 Please try the option '-l3' with GitHub HEAD (0.14-r313). '-s' can be used to further adjust the results. Note that the option '--high-het' has been removed for now, since ordinary bin files without '--high-het' + '-l3' already work on my side. If you have bin files generated with '--high-het', '-l3' may also work. I hope '-l3' fixes the purging problems for your samples.
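
For anyone scripting the rerun, here is a minimal sketch of how such a command line could be assembled. It uses only the `-o`, `-t`, `-l` and `-s` options already mentioned in this thread; the function name `hifiasm_purge_cmd`, the output prefix `asm_l3` and the `-s` value of 0.5 are placeholders for illustration, not recommended settings.

```python
import shlex


def hifiasm_purge_cmd(reads, prefix, threads=32, purge_level=3, similarity=None):
    """Build a hifiasm argument list for a HiFi-only run with a given purge level."""
    cmd = ["hifiasm", "-o", prefix, "-t", str(threads), f"-l{purge_level}"]
    if similarity is not None:
        # -s: similarity threshold used when purging duplicate haplotigs
        # (value is illustrative; tune it per sample).
        cmd += ["-s", str(similarity)]
    cmd.append(reads)
    return cmd


# Placeholder prefix and -s value; the reads path is the one from the log above.
cmd = hifiasm_purge_cmd("hifi_reads/combined_reads.fasta", "asm_l3", similarity=0.5)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once hifiasm is on the `PATH`.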

tcb72 commented 3 years ago

@chhylp123 Thanks! Running now from scratch w/o HiC parameters -- will let you know in few hours (so weird to say "few hours"... HiFi reads are the best.)

We got some new HiFi data in so now we have approximately 35x coverage per haplotype. Running same parameters as last time (-l2, --high-het), I got an assembly size of 120.997 Mbp in 93 scaffolds, and Hi-C revealed multiple massive misjoins (see below... that HiC matrix was produced using hifiasm p_ctg + purge dups, which reduced the assembly to 110 Mbp in 36 scaffolds, and still has clear duplication even after purge dups) So I'll compare the -l3 assembly to the above.

[Screenshot: Hi-C contact matrix of the hifiasm p_ctg + purge_dups assembly]

Should I test with Hi-C too, or will this commit not make a difference with Hi-C data?

Best,

Tom

tcb72 commented 3 years ago

primary contig stats using -l3:

Total scaffolds/contigs: 96
Size: 104.984 Mbp
Max scaffold: 16.952 Mbp
N50: irrelevant right now because of misjoins

Looks like it got rid of a lot of the duplication, but there are still some large misjoins. I know I can fix them in Juicebox, but I was wondering why this could be happening and whether there are parameters I can tweak to fix it.

chhylp123 commented 3 years ago

There are three parameters that might be helpful: --b-cov, --h-cov, --m-rate. These three parameters break contigs at potential misassemblies. But I guess manually breaking contigs using Hi-C would be more accurate, since the Hi-C heatmap does not show too many misassemblies.
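
To make the idea behind these options concrete, here is a toy sketch of threshold-based break detection. It is an illustration only, not hifiasm's implementation: the per-position inputs, the function name `candidate_breaks`, the default values, and the assumption that the coverage tests are combined with the inconsistent-read ratio by a logical AND are all hypothetical.

```python
def candidate_breaks(coverage, inconsistent_ratio, b_cov=0, h_cov=-1, m_rate=0.75):
    """Flag positions with out-of-range coverage AND a high inconsistent-read ratio.

    Assumptions (not taken from hifiasm's source): b_cov=0 disables the
    low-coverage test, h_cov=-1 disables the high-coverage test, and the
    coverage tests are combined with the m_rate test by AND.
    """
    breaks = []
    for pos, (cov, ratio) in enumerate(zip(coverage, inconsistent_ratio)):
        too_low = b_cov > 0 and cov < b_cov
        too_high = h_cov >= 0 and cov > h_cov
        if (too_low or too_high) and ratio > m_rate:
            breaks.append(pos)
    return breaks


# Made-up per-position read coverage and inconsistent-read ratios.
cov = [35, 34, 6, 33, 90, 36]
ratio = [0.1, 0.2, 0.9, 0.1, 0.8, 0.95]
print(candidate_breaks(cov, ratio, b_cov=10, h_cov=70))
```

In this made-up example, positions 2 (coverage too low) and 4 (coverage too high) are flagged, while position 5 is not, because its coverage is within range despite the high inconsistency ratio.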

By the way, is it possible that you can share the data with us? I'm also confused why sometimes hifiasm introduces misassemblies. Thank you in advance.

chhylp123 commented 3 years ago

Maybe not. Anyway, we will release a new version with an updated Hi-C module soon; please give me a few days.

I am also using the Hi-C mode along with HiFi sequencing data. I am getting segmentation fault error. I initially thought that the error had to do with the unusual memory consumption, but could my error be related to the issue above? Lastly, there are no specific logs produced by hifiasm for further details on the segmentation fault.

shilpagarg commented 3 years ago

Alternatively, you may check out https://github.com/shilpagarg/DipAsm/issues/16, plus apply standalone purge_dups on pstools phased scaffolds. In our experiments, there should not be any issue of mis-joins.

tcb72 commented 3 years ago

@shilpagarg I've checked out DipAsm/pstools before, unfortunately cannot install it bc our cluster currently doesn't support Docker containers. However, I asked them to install DeepVariant for a different project, and a lot of the other informaticians here want Docker support too, so they're trying to implement it. Hopefully I can try it out soon. My only concern is DeepVariant's performance on non-human samples -- any ideas?

lh3 commented 3 years ago

@tcb72 Do you have a screenshot of the r_utg graph in Bandage? What is the N50 of the unitigs in the r_utg graph?

tcb72 commented 3 years ago

@lh3 [Image: Bandage screenshot of the hifiasm -l3 r_utg graph]

N50 of r_utg file is 1.605 Mbp with max scaffold of 5.515 Mbp.
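
For reference, N50 is the length L such that sequences of length L or longer account for at least half of the total assembly size. A minimal sketch of the calculation, assuming the unitig lengths have already been collected (for example from the `S` lines of the r_utg GFA); the function name `n50` and the example lengths are made up for illustration:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running total reaches half is the N50.
        if running >= half:
            return length
    return 0


# Toy lengths: total is 15, half is 7.5, and 5 + 4 = 9 crosses it at length 4.
print(n50([5, 4, 3, 2, 1]))
```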

lh3 commented 3 years ago

Thanks. Haplotypes have mostly been separated at the unitig level. Hi-C should work, but we need more time to improve it for non-human species.