chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Suggestions on Large Genome Assembly #319

Open haiyun-fan opened 2 years ago

haiyun-fan commented 2 years ago

Dear authors, we have a genome of about 5G, and we have about 134G of HiFi data. We want to get a good preliminary assembly with hifiasm using the default parameters: hifiasm -o xx -t 142 hifidata. However, a week has passed and the program is still in the k-mer analysis stage. Do you have any recommended parameters for large genomes? I look forward to your reply and help!
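For scale, and assuming "5G" and "134G" both mean gigabases (the post does not say), the data corresponds to roughly 27x HiFi coverage. A minimal Python sketch of that arithmetic; `estimated_coverage` is a hypothetical helper, not part of hifiasm.

```python
def estimated_coverage(total_read_bases: float, genome_size: float) -> float:
    """Expected sequencing depth (x) = total read bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


# Figures from the post, ASSUMING "134G" and "5G" are both gigabases.
hifi_bases = 134e9
genome_size = 5e9
depth = estimated_coverage(hifi_bases, genome_size)
print(f"expected HiFi depth: {depth:.1f}x")  # expected HiFi depth: 26.8x
```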

Here is the tail of the log:
[M::ha_hist_line] rest: ****> 69072
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: none
[M::ha_pt_gen] peak_hom: 180; peak_het: -1
[M::ha_pt_gen::12436.691*14.28] ==> indexed 3603072461 positions

chhylp123 commented 2 years ago

Could you please show the whole log file? Thanks a lot.

haiyun-fan commented 2 years ago

Thanks a lot! Here is the log:

[M::ha_analyze_count] lowest: count[203] = 154269
[M::ha_analyze_count] highest: count[4095] = 807568
[M::ha_hist_line] 2: ****> 1021944427
[M::ha_hist_line] 3: ****> 354448672
[M::ha_hist_line] 4: ****> 189648304
[M::ha_hist_line] 5: ****> 126729238
[M::ha_hist_line] 6: ****> 97914893
[M::ha_hist_line] 7: ****> 83650926
[M::ha_hist_line] 8: ****> 75093363
[M::ha_hist_line] 9: ****> 69406939
[M::ha_hist_line] 10: ****> 65233416
[M::ha_hist_line] 11: ****> 61254129
[M::ha_hist_line] 12: ****> 58283811
[M::ha_hist_line] 13: ****> 55494917
[M::ha_hist_line] 14: ****> 53086730
[M::ha_hist_line] 15: ****> 51221542
[M::ha_hist_line] 16: ****> 49450933
[M::ha_hist_line] 17: ****> 47908648
[M::ha_hist_line] 18: ****> 46613079
[M::ha_hist_line] 19: ****> 45393525
[M::ha_hist_line] 20: ****> 44071534
[M::ha_hist_line] 21: ****> 42956681
[M::ha_hist_line] 22: ****> 41788201
[M::ha_hist_line] 23: ****> 40843446
[M::ha_hist_line] 24: ****> 39658823
...
[M::ha_hist_line] 4079: 462
[M::ha_hist_line] 4080: 437
[M::ha_hist_line] 4081: 491
[M::ha_hist_line] 4082: 445
[M::ha_hist_line] 4083: 440
[M::ha_hist_line] 4084: 454
[M::ha_hist_line] 4085: 472
[M::ha_hist_line] 4086: 449
[M::ha_hist_line] 4087: 469
[M::ha_hist_line] 4088: 450
[M::ha_hist_line] 4089: 434
[M::ha_hist_line] 4090: 457
[M::ha_hist_line] 4091: 439
[M::ha_hist_line] 4092: 440
[M::ha_hist_line] 4093: 433
[M::ha_hist_line] 4094: 468
[M::ha_hist_line] 4095: **** 807568
[M::ha_hist_line] rest: 0
[M::ha_analyze_count] left: count[204] = 154683
[M::ha_analyze_count] right: none
[M::ha_ft_gen] peak_hom: 4095; peak_het: 204
[M::ha_ft_gen::9492.775*11.11@81.697GB] ==> filtered out 808036 k-mers occurring 4094 or more times
[M::ha_opt_update_cov] updated max_n_chain to 20475
[M::ha_pt_gen::11605.115*12.16] ==> counted 521111167 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[179] = 10422
[M::ha_analyze_count] highest: count[180] = 10540
[M::ha_hist_line] 1: ****> 365777518
[M::ha_hist_line] 2: ****> 39765949
[M::ha_hist_line] 3: ****> 14175841
[M::ha_hist_line] 4: ****> 7673936
[M::ha_hist_line] 5: ****> 5184283
[M::ha_hist_line] 6: ****> 4028496
...
[M::ha_hist_line] 1588: 54
[M::ha_hist_line] 1589: 80
[M::ha_hist_line] 1590: 81
[M::ha_hist_line] 1591: 69
[M::ha_hist_line] 1592: 81
[M::ha_hist_line] 1593: 64
[M::ha_hist_line] 1594: 83
[M::ha_hist_line] 1595: 91
[M::ha_hist_line] rest: ****> 69072
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: none
[M::ha_pt_gen] peak_hom: 180; peak_het: -1
[M::ha_pt_gen::12436.691*14.28] ==> indexed 3603072461 positions
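Because the histogram arrives as pasted text, it can help to turn the [M::ha_hist_line] entries back into numbers before plotting or comparing them. A minimal Python sketch, assuming the entry format seen in this log (a bin number or "rest", an optional bar of '*' possibly ending in '>', then a count). The regex is inferred from the paste, not taken from hifiasm's source, and parse_histograms is a hypothetical helper. The sample lines are copied from the second histogram in the log.

```python
from __future__ import annotations

import re
from typing import Optional

# One histogram entry as it appears in the log, for example
#   [M::ha_hist_line] 2: ****> 1021944427
#   [M::ha_hist_line] 4079: 462
#   [M::ha_hist_line] rest: ****> 69072
# The run of '*' (optionally ending in '>') is the text bar; only the number is kept.
# ASSUMPTION: format inferred from this log excerpt only.
HIST_RE = re.compile(r"\[M::ha_hist_line\]\s+(\d+|rest):\s+(?:\*+>?\s+)?(\d+)")


def parse_histograms(log_text: str) -> list[tuple[dict[int, int], Optional[int]]]:
    """Return one (bins, rest) pair per histogram; a 'rest' entry closes a histogram."""
    histograms: list[tuple[dict[int, int], Optional[int]]] = []
    bins: dict[int, int] = {}
    for key, value in HIST_RE.findall(log_text):
        if key == "rest":
            histograms.append((bins, int(value)))
            bins = {}
        else:
            bins[int(key)] = int(value)
    if bins:  # a histogram whose 'rest' line was cut off
        histograms.append((bins, None))
    return histograms


sample = """\
[M::ha_hist_line] 1: ****> 365777518
[M::ha_hist_line] 2: ****> 39765949
[M::ha_hist_line] 1594: 83
[M::ha_hist_line] 1595: 91
[M::ha_hist_line] rest: ****> 69072
[M::ha_analyze_count] left: none
"""

for bins, rest in parse_histograms(sample):
    print(len(bins), min(bins), max(bins), rest)  # 4 1 1595 69072
```

Entries are matched independently of line breaks, so the same function should also work on a log pasted as one long line.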

haiyun-fan commented 2 years ago

I don't know how to handle this situation; any suggestions would be much appreciated!

chhylp123 commented 2 years ago

Could you please upload the whole log file? One possibility is that the input HiFi reads are not clean enough (see: https://hifiasm.readthedocs.io/en/latest/faq.html#why-does-hifiasm-stuck-or-crash). The k-mer plot printed by hifiasm can help us debug this quickly.
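One rough way to read a k-mer plot: the log's own [M::ha_analyze_count] "lowest" and "highest" lines suggest looking for a valley after the low-count slope and then the largest bin after that valley. The Python sketch below loosely imitates that idea on made-up toy data; it is an illustration under that assumption, not hifiasm's actual peak-calling code, and valley_then_peak and looks_suspicious are hypothetical names. In the posted log, the first histogram's "highest" entry is count[4095], its last printed bin, which resembles the toy pileup case.

```python
from __future__ import annotations

from typing import Optional


def valley_then_peak(hist: dict[int, int]) -> tuple[int, Optional[int]]:
    """Return (valley_bin, peak_bin) for a k-mer count histogram.

    Starting at the smallest bin, walk right while counts do not increase
    (the low-count slope); the first rise marks the valley. The peak is the
    bin with the largest count after the valley, or None if counts never rise.
    """
    if not hist:
        raise ValueError("empty histogram")
    bins = sorted(hist)
    i = 0
    while i + 1 < len(bins) and hist[bins[i + 1]] <= hist[bins[i]]:
        i += 1
    valley = bins[i]
    after = bins[i + 1:]
    if not after:
        return valley, None
    return valley, max(after, key=lambda b: hist[b])


def looks_suspicious(hist: dict[int, int]) -> bool:
    """Flag histograms with no peak after the valley, or whose peak is the top bin."""
    _, peak = valley_then_peak(hist)
    return peak is None or peak == max(hist)


# Toy histograms (made-up numbers, NOT from the posted log).
healthy = {1: 900, 2: 300, 3: 120, 4: 80, 5: 95, 6: 140, 7: 160, 8: 130, 9: 70, 10: 20}
pileup = {1: 900, 2: 300, 3: 120, 4: 80, 5: 60, 6: 50, 7: 40, 8: 30, 9: 25, 10: 400}

print(valley_then_peak(healthy), looks_suspicious(healthy))  # (4, 7) False
print(valley_then_peak(pileup), looks_suspicious(pileup))    # (9, 10) True
```

A flag from a heuristic like this only says the plot deserves a closer look; the FAQ linked above is the place to go for what to do about it.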

haiyun-fan commented 2 years ago

Here is the full log:

log.txt

chhylp123 commented 2 years ago

The k-mer plot looks abnormal; please see the FAQ here: https://hifiasm.readthedocs.io/en/latest/faq.html#why-does-hifiasm-stuck-or-crash.