chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

possible regression in using HiC reads in --h1 --h2 #613

Open jbh-cas opened 7 months ago

jbh-cas commented 7 months ago

With hifiasm_0.19.8-r603, a run using --h1/--h2 stops at a Partition step after the hic.p_ctg GFA is written.

The last version we had run that worked for us was hifiasm_0.19.5-r590, so we reran with that version, and the h1 and h2 GFAs were created successfully. Relevant log extracts are below. (We have not run the versions in between to narrow this down, but could if you would like. Also, the input Hi-C was not exactly the same, since we tried different lanes, but additional tests showed the same stop at the Partition step with r603.)

Note that r603 logs a single "[M::dedup_hits::0.000] ==> Dedup" line, whereas r590 logs two: "[M::dedup_hits::2.269] ==> Dedup" and "[M::dedup_hits::1.126] ==> Dedup".

Thanks for any insight, Jim Henderson

version 0.19.8-r603 log extract

hifiasm 0.19.8-r603
...
Writing hifiasm.asm.hic.p_ctg.gfa to disk... 
[M::ha_opt_update_cov] updated max_n_chain to 200
[M::gen_trans_base_count_comp::504.346] ==> Qualification
[M::build_unitig_index::336.160] ==> Counting
[M::build_unitig_index::0.000] ==> Memory allocating
[M::build_unitig_index::345.438] ==> Filling pos
[M::build_unitig_index::0.000] ==> Sorting pos
[M::build_unitig_index::681.598] ==> HiC index has been built
[M::write_hc_pt_index] Index has been written.
[M::alignment_worker_pipeline::705.241] ==> Qualification
[M::dedup_hits::0.000] ==> Dedup
[M::adjust_weight_kv_u_trans_advance::0.000] 
[M::mc_solve:: # edges: 0]
[M::mb_solve_core::0.006] ==> Partition
[M::mc_solve_core_adv::0.008] ==> Partition

The hifiasm program just stops here; no error is shown.

version 0.19.5-r590 log extract

hifiasm 0.19.5-r590
...
Writing hifiasm.asm.hic.p_ctg.gfa to disk... 
[M::ha_opt_update_cov] updated max_n_chain to 200
[M::gen_trans_base_count_comp::568.114] ==> Qualification
[M::build_unitig_index::239.886] ==> Counting
[M::build_unitig_index::59.715] ==> Memory allocating
[M::build_unitig_index::183.955] ==> Filling pos
[M::build_unitig_index::1.304] ==> Sorting pos
[M::build_unitig_index::484.863] ==> HiC index has been built
[M::write_hc_pt_index] Index has been written.
[M::alignment_worker_pipeline::414.203] ==> Qualification
[M::dedup_hits::2.269] ==> Dedup
[M::dedup_hits::1.126] ==> Dedup
[M::stat] # misjoined unitigs: 28 (N50: 1516338); # corrected unitigs: 56 (N50: 938380)
[M::adjust_weight_kv_u_trans_advance::4.329] 
[M::mc_solve:: # edges: 7272466]
[M::mb_solve_core::19.670] ==> Partition
[M::mc_solve_core_adv::71.107] ==> Partition
[M::adjust_weight_kv_u_trans_advance::6.831] 
[M::mc_solve:: # edges: 7283428]
[M::mb_solve_core::22.105] ==> Partition
[M::mc_solve_core_adv::28.921] ==> Partition
[M::adjust_weight_kv_u_trans_advance::6.789] 
[M::mc_solve:: # edges: 7283434]
[M::mb_solve_core::21.063] ==> Partition
[M::mc_solve_core_adv::25.306] ==> Partition
[M::stat] # heterozygous bases: 6726685291; # homozygous bases: 300550520
[M::reduce_hamming_error_adv::7.843] # inserted edges: 83806, # fixed bubbles: 423
[M::adjust_utg_by_trio] primary contig coverage range: [34, infinity]
[M::recall_arcs] # transitive arcs::262
[M::recall_arcs] # new arcs::387894, # old arcs::248568
[M::clean_trio_untig_graph] # adjusted arcs::0
[M::adjust_utg_by_trio] primary contig coverage range: [34, infinity]
[M::recall_arcs] # transitive arcs::428
[M::recall_arcs] # new arcs::395048, # old arcs::252238
[M::clean_trio_untig_graph] # adjusted arcs::0
[M::output_trio_graph_joint] dedup_base::11654549, miss_base::0
Writing hifiasm.asm.hic.hap1.p_ctg.gfa to disk... 
Writing hifiasm.asm.hic.hap2.p_ctg.gfa to disk... 
Inconsistency threshold for low-quality regions in BED files: 70%
[M::main] Version: 0.19.5-r590
[M::main] CMD: hifiasm_0.19.5-r590 --write-ec --write-paf -t 64 --h1 input/Nfusc_a1009_L4_R1_clean.fq.gz --h2 input/Nfusc_a1009_L4_R2_clean.fq.gz input/hifiasm.asm.ec.fa
[M::main] Real time: 48887.154 sec; CPU: 2376941.041 sec; Peak RSS: 231.264 GB
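A minimal sketch for comparing the two log extracts programmatically. It assumes the "[M::<stage>::...]" line prefix is stable across versions (an assumption, not documented here), counts how often each stage appears, and pulls out the mc_solve edge counts. The helper names summarize and compare are hypothetical; the embedded excerpts are copied from the logs above.

```python
import re
from typing import Dict, List, Tuple

# Assumed line shape: "[M::<stage>::<timing or text>]". Lines such as
# "[M::write_hc_pt_index] ..." have no second "::" and are ignored.
STAGE_RE = re.compile(r"^\[M::(\w+)::")
EDGES_RE = re.compile(r"# edges: (\d+)")

# Excerpts copied from the two logs in this issue.
R590 = """\
[M::dedup_hits::2.269] ==> Dedup
[M::dedup_hits::1.126] ==> Dedup
[M::adjust_weight_kv_u_trans_advance::4.329]
[M::mc_solve:: # edges: 7272466]
[M::mb_solve_core::19.670] ==> Partition
[M::mc_solve_core_adv::71.107] ==> Partition
"""
R603 = """\
[M::dedup_hits::0.000] ==> Dedup
[M::adjust_weight_kv_u_trans_advance::0.000]
[M::mc_solve:: # edges: 0]
[M::mb_solve_core::0.006] ==> Partition
[M::mc_solve_core_adv::0.008] ==> Partition
"""


def summarize(log_text: str) -> Tuple[Dict[str, int], List[int]]:
    """Count occurrences of each logged stage and collect mc_solve edge counts."""
    stages: Dict[str, int] = {}
    edges: List[int] = []
    for line in log_text.splitlines():
        m = STAGE_RE.match(line.strip())
        if m:
            stages[m.group(1)] = stages.get(m.group(1), 0) + 1
        e = EDGES_RE.search(line)
        if e:
            edges.append(int(e.group(1)))
    return stages, edges


def compare(old_log: str, new_log: str) -> None:
    """Print per-stage counts side by side and flag an empty partition graph."""
    old_stages, old_edges = summarize(old_log)
    new_stages, new_edges = summarize(new_log)
    for stage in sorted(set(old_stages) | set(new_stages)):
        a, b = old_stages.get(stage, 0), new_stages.get(stage, 0)
        mark = "" if a == b else "  <-- differs"
        print(f"{stage:36} old={a} new={b}{mark}")
    print("mc_solve edges:", old_edges, "->", new_edges)
    if new_edges and all(n == 0 for n in new_edges):
        print("note: newer run reports 0 edges before Partition")


if __name__ == "__main__":
    compare(R590, R603)
```

On these excerpts the comparison flags the single dedup_hits line in r603 and an mc_solve graph with 0 edges, versus 7272466 in r590. Reading that as "no usable Hi-C links reached the Partition step" is an interpretation, not something the logs confirm.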
jbh-cas commented 7 months ago

I reran hifiasm 0.19.6-r595 and 0.19.7-r598 with the same HiFi and Hi-C inputs used with 0.19.5-r590, and the h1 and h2 GFA files were created with both versions.

As a reminder, 0.19.8-r603 stops after two Partition steps and does not create the h1 or h2 GFA files, as shown in the log above.

I don't have r599 through r602 built. Any ideas why the program just ends after the Partition steps?

Thanks very much.
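With r598 known good and r603 known bad, narrowing down r599 through r602 is a binary search: testing the midpoint first finds the first failing revision in at most three runs instead of four. A minimal sketch, assuming every rNNN in that range is a buildable revision; bisect_revisions and the is_good callback are hypothetical, and a real is_good would build the revision, run it on the same inputs, and check whether the hap1/hap2 GFAs were written. If the revisions map to git commits, git bisect automates the same search.

```python
from typing import Callable, List, Tuple


def bisect_revisions(good: int, bad: int,
                     is_good: Callable[[int], bool]) -> Tuple[int, List[int]]:
    """Binary-search for the first failing revision between a known-good
    and a known-bad revision number.

    Returns (first_bad, revisions_tested). Assumes every integer revision
    between `good` and `bad` can be built and tested.
    """
    if bad <= good:
        raise ValueError("bad revision must come after good revision")
    tested: List[int] = []
    while bad - good > 1:
        mid = (good + bad) // 2  # midpoint of the untested range
        tested.append(mid)
        if is_good(mid):
            good = mid  # regression was introduced after mid
        else:
            bad = mid   # regression is at mid or earlier
    return bad, tested


def simulated_is_good(rev: int) -> bool:
    """Hypothetical stand-in: pretend the regression first appears in r601."""
    return rev < 601


if __name__ == "__main__":
    first_bad, runs = bisect_revisions(598, 603, simulated_is_good)
    print(f"first failing revision: r{first_bad} after testing {runs}")
```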

AndreaGuarracino commented 7 months ago

I have a similar issue with the current master (commit 1ac574adc78fbdaed2d2dcd49d5ea3deed7478de), but in my case the run ends with signal 11 (a segmentation fault):

...
[M::ha_print_ovlp_stat] # overlaps without large indels: 540614033
[M::ha_print_ovlp_stat] # reverse overlaps: 94144072
[M::ha_opt_update_cov_min] updated max_n_chain to 225
Writing reads to disk... 
Reads has been written.
Writing ma_hit_ts to disk... 
ma_hit_ts has been written.
Writing ma_hit_ts to disk... 
ma_hit_ts has been written.
bin files have been written.
[M::purge_dups] homozygous read coverage threshold: 44
[M::purge_dups] purge duplication coverage threshold: 56
[M::ug_ext_gfa::] # tips::37
Writing raw unitig GFA to disk... 
Writing processed unitig GFA to disk... 
[M::adjust_utg_by_primary] primary contig coverage range: [37, infinity]
Writing DBA2J.hic.p_ctg.gfa to disk... 
[M::ha_opt_update_cov] updated max_n_chain to 225
[M::gen_trans_base_count_comp::939.770] ==> Qualification
[M::build_unitig_index::58.395] ==> Counting
[M::build_unitig_index::28.325] ==> Memory allocating
[M::build_unitig_index::81.649] ==> Filling pos
[M::build_unitig_index::0.268] ==> Sorting pos
[M::build_unitig_index::168.642] ==> HiC index has been built
[M::write_hc_pt_index] Index has been written.
[M::alignment_worker_pipeline::1472.315] ==> Qualification
[M::dedup_hits::15.710] ==> Dedup
[M::dedup_hits::7.796] ==> Dedup
[M::stat] # misjoined unitigs: 1 (N50: 693811); # corrected unitigs: 2 (N50: 527464)
[M::adjust_weight_kv_u_trans_advance::0.920] 
[M::mc_solve:: # edges: 2259920]
[M::mb_solve_core::2.778] ==> Partition
Command terminated by signal 11
        Command being timed: "hifiasm -o DBA2J -t 96 -l0 --h1 DTG-HIC-408_R1_001.fastq.gz,DTG-HIC-410_R1_001.fastq.gz --h2 DTG-HIC-408_R2_001.fastq.gz,DTG-HIC-410_R2_001.fastq.gz TBG-4829_m84078_231116_130625_s1.hifi_reads.default.fastq.gz MouseStrainD2_TBG_4829_1.hifi_reads.fastq.gz"
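On Linux, signal 11 is SIGSEGV. When hifiasm is launched from a wrapper script, checking how the process exited also helps with runs like the r603 one above that stopped with no error shown: Python's subprocess reports a child killed by signal N as return code -N (shells such as bash report it as 128+N, so 139 for signal 11). A minimal sketch; describe_exit and run_and_report are hypothetical helper names, not part of hifiasm.

```python
import signal
import subprocess
import sys
from typing import Sequence


def describe_exit(returncode: int) -> str:
    """Describe a subprocess return code.

    On POSIX, subprocess reports a child killed by signal N as -N.
    """
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name} (signal {sig.value})"
    if returncode == 0:
        return "exited normally"
    return f"exited with status {returncode}"


def run_and_report(cmd: Sequence[str]) -> str:
    """Run a command (e.g. a hifiasm invocation) and describe how it ended."""
    proc = subprocess.run(cmd)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Demo without hifiasm: a child Python process that sends itself SIGSEGV.
    crash = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_and_report(crash))  # on Linux: killed by SIGSEGV (signal 11)
```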
AndreaGuarracino commented 6 months ago

To be able to work with Hi-C data, I have to revert to the version with commit 94a284b4309837417dd9951a5f72a13d513d826e.

chhylp123 commented 5 months ago

Hi @AndreaGuarracino, would it be possible to share the data with me? Then I could fix this issue as soon as possible. Sorry for the late reply.

AndreaGuarracino commented 5 months ago

@chhylp123, I will send you something soon!