chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

ERROR-r-break ERROR-read #395

Open JanMiao opened 1 year ago

JanMiao commented 1 year ago

Hi, I am currently running hifiasm with Hi-C data integrated, and I encountered the message "ERROR-r-break" in the log. I am not sure what this error means or under what circumstances it occurs. The log of my run is attached below.

[M::ha_hist_line]    73: * 15731
[M::ha_hist_line]  rest: ***************** 512891
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: count[27] = 3041502
[M::ha_pt_gen] peak_hom: 27; peak_het: 14
[M::ha_ct_shrink::88974.195*23.68] ==> counted 80358263 distinct minimizer k-mers
[M::ha_pt_gen::] counting in normal mode
[M::yak_count] collected 1837050717 minimizers
[M::ha_pt_gen::89778.069*23.61] ==> indexed 1836232655 positions, counted 80358263 distinct minimizer k-mers
[M::ha_assemble::92769.494*23.82@91.066GB] ==> found overlaps for the final round
[M::ha_print_ovlp_stat] # overlaps: 115252214
[M::ha_print_ovlp_stat] # strong overlaps: 71954654
[M::ha_print_ovlp_stat] # weak overlaps: 43297560
[M::ha_print_ovlp_stat] # exact overlaps: 110551400
[M::ha_print_ovlp_stat] # inexact overlaps: 4700814
[M::ha_print_ovlp_stat] # overlaps without large indels: 114918503
[M::ha_print_ovlp_stat] # reverse overlaps: 71273517
Writing reads to disk... 
Reads has been written.
Writing ma_hit_ts to disk... 
ma_hit_ts has been written.
Writing ma_hit_ts to disk... 
ma_hit_ts has been written.
bin files have been written.
[M::purge_dups] homozygous read coverage threshold: 26
[M::purge_dups] purge duplication coverage threshold: 33
Writing raw unitig GFA to disk... 
Writing processed unitig GFA to disk... 
[M::purge_dups] homozygous read coverage threshold: 26
[M::purge_dups] purge duplication coverage threshold: 33
[M::mc_solve_core::1.952] ==> Partition
[M::adjust_utg_by_primary] primary contig coverage range: [22, infinity]
Writing DH.asm.hic.p_ctg.gfa to disk... 
[M::build_unitig_index::442.824] ==> Counting
[M::build_unitig_index::205.189] ==> Memory allocating
[M::build_unitig_index::503.230] ==> Filling pos
[M::build_unitig_index::4.525] ==> Sorting pos
[M::build_unitig_index::1155.775] ==> HiC index has been built
[M::write_hc_pt_index] Index has been written.
[M::alignment_worker_pipeline::8291.876] ==> Qualification
[M::dedup_hits::88.658] ==> Dedup
[M::mc_solve_core::1.522] ==> Partition
ERROR-r-break
ERROR-read
[M::dedup_hits::71.809] ==> Dedup
[M::stat] # misjoined unitigs: 165 (N50: 1536059); # corrected unitigs: 330 (N50: 978841)
[M::adjust_weight_kv_u_trans_advance::358.181] 
[M::mb_solve_core::479.010] ==> Partition
[M::mc_solve_core::454.795] ==> Partition
[M::adjust_weight_kv_u_trans_advance::1945.818] 
[M::mb_solve_core::599.984] ==> Partition
[M::mc_solve_core::464.673] ==> Partition
[M::adjust_weight_kv_u_trans_advance::1877.591] 
[M::mb_solve_core::643.440] ==> Partition
[M::mc_solve_core::535.942] ==> Partition
[M::stat] # heterozygous bases: 4545533000; # homozygous bases: 417492582
[M::reduce_hamming_error::4.073] # inserted edges: 12202, # fixed bubbles: 107
[M::adjust_utg_by_trio] primary contig coverage range: [22, infinity]
Writing DH.asm.hic.hap1.p_ctg.gfa to disk... 
[M::adjust_utg_by_trio] primary contig coverage range: [22, infinity]
Writing DH.asm.hic.hap2.p_ctg.gfa to disk... 
Inconsistency threshold for low-quality regions in BED files: 70%
[M::main] Version: 0.16.1-r375
[M::main] CMD: hifiasm -o DH.asm -t32 --h1 HIC_1.fq.gz --h2 HIC_2.fq.gz MZ.fastq.gz
[M::main] Real time: 118378.108 sec; CPU: 2328290.028 sec; Peak RSS: 171.895 GB

chhylp123 commented 1 year ago

This is a warning, not a fatal error. You can probably just ignore it as long as the assembly looks OK.
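
One quick way to check that the run still completed despite these messages is to scan the log for the `ERROR` lines, the GFA files that were written, and the final `[M::main] Real time:` line. The snippet below is only a minimal sketch based on the log format shown in this issue; the function name `summarize_log` is hypothetical, it is not part of hifiasm, and other hifiasm versions may format their logs differently.

```python
import sys


def summarize_log(lines):
    """Summarize a hifiasm log (format assumed from the log in this issue)."""
    # Messages such as "ERROR-r-break" and "ERROR-read" start with "ERROR".
    errors = [ln.strip() for ln in lines if ln.startswith("ERROR")]
    # Lines such as "Writing DH.asm.hic.hap1.p_ctg.gfa to disk..." name output files.
    gfa_outputs = [
        ln.strip() for ln in lines if ln.startswith("Writing ") and ".gfa" in ln
    ]
    # The run summary line is printed at the very end of a completed run.
    finished = any(ln.startswith("[M::main] Real time:") for ln in lines)
    return {"errors": errors, "gfa_outputs": gfa_outputs, "finished": finished}


if __name__ == "__main__":
    # Usage: python summarize_log.py hifiasm.log
    with open(sys.argv[1]) as handle:
        print(summarize_log(handle.readlines()))
```

If `finished` is true and the expected `hap1`/`hap2` GFA files are listed, the messages did not stop the run. Whether the assembly itself looks OK still has to be judged separately.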

JanMiao commented 1 year ago

Thank you for your prompt reply. I appreciate your help with resolving the error. May I ask for your advice on how to determine the quality of the assembly result? I want to ensure that the results I obtain are reliable and accurate.

JanMiao commented 1 year ago

Do I need to preprocess the Hi-C data, for example with quality control and duplicate removal, before running hifiasm?

chhylp123 commented 1 year ago

Well, simply looking at N50/QV should be fine.
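
For the N50 part, a rough sketch of computing it directly from one of the GFA files hifiasm writes (for example `DH.asm.hic.hap1.p_ctg.gfa` from the log above) might look like the following. It assumes segment (`S`) lines carry either the sequence itself or a `LN:i:` length tag, as in standard GFA; the function names are hypothetical and this is not code from hifiasm. QV is not covered here, since it needs a separate k-mer based tool.

```python
import sys


def contig_lengths_from_gfa(lines):
    """Collect contig lengths from GFA segment (S) lines."""
    lengths = []
    for line in lines:
        if not line.startswith("S\t"):
            continue  # skip links, read-alignment lines, headers, etc.
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        if seq != "*":
            lengths.append(len(seq))
            continue
        # Sequence omitted: fall back to the optional LN:i:<length> tag.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                lengths.append(int(tag[5:]))
                break
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Usage: python n50.py DH.asm.hic.hap1.p_ctg.gfa
    with open(sys.argv[1]) as handle:
        lengths = contig_lengths_from_gfa(handle)
    print(f"contigs: {len(lengths)}, total: {sum(lengths)}, N50: {n50(lengths)}")
```

Running this on both the `hap1` and `hap2` files gives a quick comparison of contiguity between the two haplotypes, along with the total assembled length of each.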