chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads

Read error correction does not reduce the number of kmers present once, twice or three times #652

Open chklopp opened 4 months ago

chklopp commented 4 months ago

I am trying to assemble HERRO error-corrected reads with hifiasm 0.19.8.

However, the number of k-mers seen only a few times does not decrease as expected across the correction rounds.

Initial histogram in the log:

[M::ha_hist_line]     2: ****************************************************************************************************> 107716610
[M::ha_hist_line]     3: ****************************************************************************************************> 30996780
[M::ha_hist_line]     4: ************************************************************** 15593515
[M::ha_hist_line]     5: ********************************************* 11280672
[M::ha_hist_line]     6: ****************************************** 10378745
[M::ha_hist_line]     7: ******************************************** 10969244
[M::ha_hist_line]     8: ************************************************** 12440832
[M::ha_hist_line]     9: ********************************************************* 14313356
[M::ha_hist_line]    10: ****************************************************************** 16548878
[M::ha_hist_line]    11: *************************************************************************** 18834001
[M::ha_hist_line]    12: ************************************************************************************ 20983530
[M::ha_hist_line]    13: ******************************************************************************************* 22728212
[M::ha_hist_line]    14: ************************************************************************************************ 24067655
[M::ha_hist_line]    15: **************************************************************************************************** 24853609
[M::ha_hist_line]    16: **************************************************************************************************** 24957299
[M::ha_hist_line]    17: *************************************************************************************************** 24619025
[M::ha_hist_line]    18: *********************************************************************************************** 23715443
[M::ha_hist_line]    19: ****************************************************************************************** 22573874
[M::ha_hist_line]    20: ************************************************************************************* 21273382
[M::ha_hist_line]    21: ******************************************************************************** 19908321
[M::ha_hist_line]    22: *************************************************************************** 18789819
[M::ha_hist_line]    23: ************************************************************************ 18034450
[M::ha_hist_line]    24: ********************************************************************** 17581256
[M::ha_hist_line]    25: ********************************************************************** 17553086
[M::ha_hist_line]    26: ************************************************************************ 17887681
[M::ha_hist_line]    27: ************************************************************************** 18591686
[M::ha_hist_line]    28: ****************************************************************************** 19396013
[M::ha_hist_line]    29: ********************************************************************************* 20301932
[M::ha_hist_line]    30: ************************************************************************************ 21087688
[M::ha_hist_line]    31: *************************************************************************************** 21817734
[M::ha_hist_line]    32: ***************************************************************************************** 22298434
[M::ha_hist_line]    33: ****************************************************************************************** 22557746
[M::ha_hist_line]    34: ****************************************************************************************** 22440107
[M::ha_hist_line]    35: ***************************************************************************************** 22130525
[M::ha_hist_line]    36: ************************************************************************************** 21459325
[M::ha_hist_line]    37: ********************************************************************************** 20509144
[M::ha_hist_line]    38: ****************************************************************************** 19399337
[M::ha_hist_line]    39: ************************************************************************ 17962454
[M::ha_hist_line]    40: ***************************************************************** 16334878
[M::ha_hist_line]    41: *********************************************************** 14619679
[M::ha_hist_line]    42: *************************************************** 12793736
[M::ha_hist_line]    43: ******************************************** 11035127
[M::ha_hist_line]    44: ************************************** 9361763
[M::ha_hist_line]    45: ******************************* 7757802
[M::ha_hist_line]    46: ************************** 6387930
[M::ha_hist_line]    47: ********************* 5197173
[M::ha_hist_line]    48: ***************** 4122166

Second histogram

[M::ha_hist_line]     1: ****************************************************************************************************> 79362122
[M::ha_hist_line]     2: ****************************************************************************************************> 5824705
[M::ha_hist_line]     3: ****************************************************************************************************> 1842811
[M::ha_hist_line]     4: **************************************************************************************** 934285
[M::ha_hist_line]     5: ************************************************************* 646881
[M::ha_hist_line]     6: ***************************************************** 561993
[M::ha_hist_line]     7: ***************************************************** 563891
[M::ha_hist_line]     8: ********************************************************** 610696
[M::ha_hist_line]     9: **************************************************************** 67923

Third histogram

[M::ha_hist_line]     1: ****************************************************************************************************> 65809002
[M::ha_hist_line]     2: ****************************************************************************************************> 4505333
[M::ha_hist_line]     3: ****************************************************************************************************> 1459641
[M::ha_hist_line]     4: *************************************************************************** 774497
[M::ha_hist_line]     5: ******************************************************* 566043
[M::ha_hist_line]     6: ************************************************* 508811
[M::ha_hist_line]     7: *************************************************** 523826
[M::ha_hist_line]     8: ******************************************************** 572254

Fourth histogram

[M::ha_hist_line]     1: ****************************************************************************************************> 56509704
[M::ha_hist_line]     2: ****************************************************************************************************> 3725834
[M::ha_hist_line]     3: ****************************************************************************************************> 1243762
[M::ha_hist_line]     4: ******************************************************************** 688236
[M::ha_hist_line]     5: **************************************************** 520982
[M::ha_hist_line]     6: *********************************************** 479553
[M::ha_hist_line]     7: ************************************************** 500632
[M::ha_hist_line]     8: ****************************************************** 549716

Fifth histogram

[M::ha_hist_line]     1: ****************************************************************************************************> 50554387
[M::ha_hist_line]     2: ****************************************************************************************************> 3283946
[M::ha_hist_line]     3: ****************************************************************************************************> 1127797
[M::ha_hist_line]     4: **************************************************************** 642271
[M::ha_hist_line]     5: ************************************************** 496973
[M::ha_hist_line]     6: ********************************************** 463586
[M::ha_hist_line]     7: ************************************************* 486543
[M::ha_hist_line]     8: ****************************************************** 534726

The resulting assembly metrics are poor: the assembly is small and fragmented. The coverage reported in the GFA files is also very low.

With HiFi reads, the last histogram has very few k-mers seen only once left. Which parameter could I tweak to improve this?
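
To compare the rounds numerically rather than by eye, the histogram lines can be parsed out of the log. The following is a minimal sketch, not part of hifiasm: it assumes the log lines look like the `[M::ha_hist_line]` lines pasted above, and that a new histogram starts whenever the occurrence value stops increasing. The helper names `parse_histograms` and `low_count_kmers` are hypothetical, and the sample uses only the first three bins of the second and third histograms from this issue (with the star bars shortened).

```python
import re

# Matches lines such as "[M::ha_hist_line]     2: ****> 107716610".
# The leading "[" is optional so that a clipped log line still parses.
HIST_LINE = re.compile(r"\[?M::ha_hist_line\]\s+(\d+):\s+\**>?\s*(\d+)\s*$")


def parse_histograms(log_text):
    """Return a list of histograms, each a dict {occurrence: k-mer count}.

    Assumption: a new histogram begins whenever the occurrence value does
    not increase relative to the previous histogram line.
    """
    histograms = []
    current = {}
    last_occ = None
    for line in log_text.splitlines():
        match = HIST_LINE.search(line.strip())
        if not match:
            continue  # skip non-histogram lines
        occ, count = int(match.group(1)), int(match.group(2))
        if last_occ is not None and occ <= last_occ:
            histograms.append(current)
            current = {}
        current[occ] = count
        last_occ = occ
    if current:
        histograms.append(current)
    return histograms


def low_count_kmers(hist, max_occ=3):
    """Sum the k-mers seen at most max_occ times in one histogram."""
    return sum(count for occ, count in hist.items() if occ <= max_occ)


if __name__ == "__main__":
    # First three bins of the second and third histograms from this issue.
    sample = """
[M::ha_hist_line]     1: ****> 79362122
[M::ha_hist_line]     2: ****> 5824705
[M::ha_hist_line]     3: ****> 1842811
[M::ha_hist_line]     1: ****> 65809002
[M::ha_hist_line]     2: ****> 4505333
[M::ha_hist_line]     3: ****> 1459641
"""
    for i, hist in enumerate(parse_histograms(sample), start=1):
        print(f"histogram {i}: k-mers seen <= 3 times = {low_count_kmers(hist)}")
```

Running this over the full log gives one total per round, which makes it easier to see how quickly the low-occurrence k-mers shrink from one round to the next.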

tallnuttrbgv commented 1 month ago

I'm a fellow user, so I don't have a definitive answer. Do you have histograms from before and after HERRO? I assume the ones above are all after HERRO? Were the reads HiFi, or HiFi plus HERRO-corrected ONT reads? If they were HiFi, then perhaps HERRO did not make any improvement because they were already very accurate. I suggest looking at the read mappings for answers, since errors are generally clearly visible there.