Hi-C + HiFi sequencing run killed?

mabh5 commented 12 months ago

I am attempting to run a genome assembly, but the assembly keeps getting killed at the very end. I saw that other people had similar issues caused by insufficient memory; however, the suggested fix of adding --s-base -1 to the command just produces an error saying that --s-base -1 is an unknown option.

My first attempts used this command:

hifiasm -o combinedHiC.asm -t32 --n-hap 4 --h1 gt-HiC_R1.fastq.gz --h2 gt-spc-HiC_R2.fastq.gz combined.fasta

This gave the 'Killed' error. I did get some output files, though: specifically, the GFA, BIN and BED files that were written before the program was killed.

Advice is welcome!

chhylp123 commented 12 months ago

Which version are you using? Only newer versions support --s-base -1. It is best to always use the latest version of hifiasm.

mabh5 commented 12 months ago

I am currently running 0.16.1-r375, which I take it is not the latest version. Is there an easy way to update the program, or do I need to uninstall the old version and reinstall the newer one?

mabh5 commented 11 months ago

I updated hifiasm, got it to accept the --s-base -1 option, and re-ran my command. I still got the 'Killed' error.

These are the last few lines of the log leading up to the kill:


Writing reads to disk...
Reads has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
bin files have been written.
[M::purge_dups] homozygous read coverage threshold: 56
[M::purge_dups] purge duplication coverage threshold: 70
[M::ug_ext_gfa::] # tips::688
Writing raw unitig GFA to disk...
Writing processed unitig GFA to disk...
[M::purge_dups] homozygous read coverage threshold: 56
[M::purge_dups] purge duplication coverage threshold: 70
[M::adjust_utg_by_primary] primary contig coverage range: [47, infinity]
Writing combinedHiC.asm.hic.p_ctg.gfa to disk...
[M::dump_trans_ovlp] Dump trans overlaps
+[M::trans_sec_cut_filter_mmhap_adv]    hom_cov::56     het_cov::14     hom_cut::65
Killed

Any advice?

chhylp123 commented 11 months ago

Could you please show the exact command lines you were using? Are you running hifiasm on a polyploid genome?

mabh5 commented 11 months ago

Here is the exact command being run:

hifiasm -o combinedHiC.asm -t15 -l3 --s-base -1 --n-hap 4 --h1 gt-spc-1_1305565_S3HiC_R1.fastq.gz --h2 gt-spc-1_1305565_S3HiC_R2.fastq.gz combined.fasta

It is not a polyploid genome; however, this is a fish that underwent a whole-genome duplication a few million years ago. When I initially assembled this genome (before we had Hi-C data), the resulting genome size was far too small. Setting --n-hap to 4 solved that issue.

mabh5 commented 11 months ago

I ran hifiasm last night without the --n-hap 4 option and got the same error.

chhylp123 commented 11 months ago

Sorry for the late reply; I was quite busy over the last few days. If possible, I would recommend trying without --n-hap 4, since hifiasm works better in diploid mode. For the size issue with diploid Hi-C assembly, could you please try the suggestions in the FAQ here: https://hifiasm.readthedocs.io/en/latest/faq.html#how-can-i-tweak-parameters-to-improve-hi-c-integrated-assembly? To narrow things down: when you run the following command line:

hifiasm -o combinedHiC.asm -t15 -l3 --s-base -1 --h1 gt-spc-1_1305565_S3HiC_R1.fastq.gz --h2 gt-spc-1_1305565_S3HiC_R2.fastq.gz combined.fasta

Does hifiasm still get killed? And with the following command line:

hifiasm -o combinedHiC.asm -t15 -l3 --s-base -1 combined.fasta

Does hifiasm work?

mabh5 commented 11 months ago

Hello!

I ran the following command last week:

hifiasm -o combinedHiC.asm -t15 -l2 --s-base -1 --h1 gtrout-spc-1_1305565_S3HiC_R1.fastq.gz --h2 gtrout-spc-1_1305565_S3HiC_R2.fastq.gz combined.fasta

and got the same error. I will try the one you posted above to see whether running with -l3 rather than -l2 makes a difference.

As for the genome size issue: the FAQ only covers cases where the assembly is larger than expected. We had the exact opposite problem with this genome in the initial assembly (a PacBio-only assembly).

When I ran this command:

hifiasm -o combined.asm -t 15 combined.fasta

I got an assembly of 1.9 Gb.

However, when I ran this command:

hifiasm -o combined.asm -t 15 --n-hap 4 combined.fasta

I got an assembly of 2.3 Gb, which is the expected size for this species.

However, this was all run without the Hi-C data. Still, I am unsure whether the FAQ answers apply here, given that they address assemblies that are too large, not too small.

For now, though, I think the way to go is to get the Hi-C assembly working, check the genome size afterwards, and adjust from there.

hifiasm -o combinedHiC.asm -t15 -l3 --s-base -1 combined.fasta

I can run this command if you would like. However, it is very similar to my very first run (without --s-base -1), which worked fine even though the genome size was wrong. I can still rerun it to check, since that initial run was done with an older version of hifiasm.

mabh5 commented 11 months ago

I ran hifiasm -o combinedHiC.asm -t15 -l3 --s-base -1 combined.fasta overnight and got the expected output files.

This run produced the following files:


combined.asm.bp.hap1.p_ctg.gfa
combined.asm.bp.hap1.p_ctg.lowQ.bed
combined.asm.bp.hap1.p_ctg.noseq.gfa
combined.asm.bp.hap2.p_ctg.gfa
combined.asm.bp.hap2.p_ctg.lowQ.bed
combined.asm.bp.hap2.p_ctg.noseq.gfa
combined.asm.bp.p_ctg.gfa
combined.asm.bp.p_ctg.lowQ.bed
combined.asm.bp.p_ctg.noseq.gfa
combined.asm.bp.p_utg.gfa
combined.asm.bp.p_utg.lowQ.bed
combined.asm.bp.p_utg.noseq.gfa
combined.asm.bp.r_utg.gfa
combined.asm.bp.r_utg.lowQ.bed
combined.asm.bp.r_utg.noseq.gfa
combined.asm.ec.bin
combined.asm.ovlp.reverse.bin
combined.asm.ovlp.source.bin

The output files from the latest 'Killed' run were:


combinedHiC.asm.ec.bin
combinedHiC.asm.hic.p_ctg.gfa
combinedHiC.asm.hic.p_ctg.lowQ.bed
combinedHiC.asm.hic.p_ctg.noseq.gfa
combinedHiC.asm.hic.p_utg.gfa
combinedHiC.asm.hic.p_utg.lowQ.bed
combinedHiC.asm.hic.p_utg.noseq.gfa
combinedHiC.asm.hic.r_utg.gfa
combinedHiC.asm.hic.r_utg.lowQ.bed
combinedHiC.asm.hic.r_utg.noseq.gfa
combinedHiC.asm.ovlp.reverse.bin
combinedHiC.asm.ovlp.source.bin

I am wondering whether I should look at the combinedHiC.asm.hic.p_ctg.gfa file to see whether it is a complete assembly, even though I am not getting the hap1 and hap2 files. Its file size is about the same as that of the non-Hi-C run.

mabh5 commented 11 months ago

I decided to run QUAST on the combinedHiC.asm.hic.p_ctg.gfa file to check whether it is a complete assembly, and it looks like it is: it has the correct genome length and 2172 contigs.
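For anyone repeating this check: the hifiasm documentation converts a contig GFA to FASTA with an awk one-liner over the segment (S) lines, which is handy for downstream tools that expect FASTA. The sketch here wraps that one-liner in a shell function and adds a second helper that counts contigs and sums sequence length as a quick sanity check. Both function names (`gfa_to_fasta`, `gfa_stats`) are hypothetical and not part of hifiasm; it also assumes the full `.gfa` file is used, since the `.noseq.gfa` variants do not carry the sequences.

```shell
# Hypothetical helper: write each GFA segment (S) line as a FASTA record
# (column 2 = segment name, column 3 = sequence).
gfa_to_fasta() {
  awk '/^S/{print ">"$2; print $3}' "$1"
}

# Hypothetical helper: count S lines and sum the lengths of their sequences.
# Assumes the full .gfa (not .noseq.gfa), so column 3 holds the real sequence.
gfa_stats() {
  awk '/^S/{n++; len += length($3)} END{print "contigs:", n+0, "total_bp:", len+0}' "$1"
}
```

For example, `gfa_to_fasta combinedHiC.asm.hic.p_ctg.gfa > combinedHiC.asm.hic.p_ctg.fa` writes a FASTA copy of the Hi-C primary contigs, and `gfa_stats combinedHiC.asm.hic.p_ctg.gfa` prints a contig count and total length that can be compared against the QUAST report.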

Do you think this assembly is worth continuing with, or could something be wrong with it given the 'Killed' error?

chhylp123 commented 11 months ago

Sorry for the late reply; I was quite busy last month. I think it should be fine as long as the assembly itself looks reasonable.

chhylp123/hifiasm issue #570