chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License
533 stars, 87 forks

Assembly running out of memory; tried tuning down minimizer window size and kmer #514

Open: hans-vg opened this issue 1 year ago

hans-vg commented 1 year ago

I am working on an assembly for a genome with an expected size of 2.65 Gbp. We have Hi-C data and six SMRT Cells' worth of HiFi data (three from a male and three from a female).
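To see how much sequence six SMRT Cells put into a single run, a rough coverage check may help. The sketch below is a minimal illustration, not part of hifiasm: it assumes plain 4-line FASTQ records, gzip-compressed as in the paths in this issue, and uses the stated 2.65 Gbp genome size. The function names `total_bases` and `coverage` are hypothetical.

```shell
#!/usr/bin/env bash
# Rough HiFi coverage estimate = total read bases / expected genome size.
# Assumption: plain 4-line FASTQ records, gzip-compressed.
set -euo pipefail

GENOME_SIZE=${GENOME_SIZE:-2650000000}   # 2.65 Gbp, as stated in the issue

# Sum the sequence lines (line 2 of every 4-line record) across all files.
total_bases() {
  local f
  for f in "$@"; do
    gzip -dc -- "$f"
  done | awk 'NR % 4 == 2 { n += length($0) } END { printf "%.0f\n", n }'
}

# Print the estimated fold coverage with one decimal place.
coverage() {
  local bases
  bases=$(total_bases "$@")
  awk -v b="$bases" -v g="$GENOME_SIZE" 'BEGIN { printf "%.1f\n", b / g }'
}
```

For example, `coverage raw_seq/pacbio_hifi/ipac_male_*/call-export_fastq/execution/*.hifi_reads.fastq.gz` would estimate male-only coverage, and running it on all six files gives the combined coverage of the pooled run.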

Initially, I ran the assembly with the default k-mer size of 51, but my job scheduler killed the command because it exceeded the maximum allocated memory of 225 GB. I then started reducing the -k and -w values, but the job is still being killed for exceeding its memory limit.
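A back-of-the-envelope check on the -w change. It assumes that hifiasm samples minimizers roughly like a random minimizer scheme and that the default window was also 51. Both are assumptions: this thread does not describe hifiasm's index or what dominates its peak memory. Under a random minimizer scheme, the expected fraction of positions sampled with window size w is about 2/(w+1). A smaller window therefore samples more positions, not fewer:

```latex
d(w) \approx \frac{2}{w+1}, \qquad
\frac{d(41)}{d(51)} = \frac{2/42}{2/52} = \frac{52}{42} \approx 1.24
```

By this estimate, going from -w 51 to -w 41 would index roughly 24% more minimizers. Lowering -w is therefore unlikely to reduce memory on its own.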

Is there a way to reduce the memory required? Our campus cluster only has nodes with at most 256 GB of RAM.

Command:

Allocated threads: 64
Allocated memory: 225 GB
hifiasm -o ipac.asm -t64 -k 41 -w 41 --h1 raw_seq/hic/01.hic/CKDL230007432-1a_L3_1.fq.gz --h2 raw_seq/hic/01.hic/CKDL230007432-1a_L3_2.fq.gz raw_seq/pacbio_hifi/ipac_male_70pM_Cell1_CCS/call-export_fastq/execution/m64027_230307_235112.hifi_reads.fastq.gz raw_seq/pacbio_hifi/ipac_male_70pM_Cell2_CCS/call-export_fastq/execution/m64027_230309_104937.hifi_reads.fastq.gz raw_seq/pacbio_hifi/ipac_male_70pM_Cell3_CCS/call-export_fastq/execution/m64027_230310_202134.hifi_reads.fastq.gz raw_seq/pacbio_hifi/ipac_female_70pM_Cell4_CCS/call-export_fastq/execution/m64027_230312_072049.hifi_reads.fastq.gz raw_seq/pacbio_hifi/ipac_female_70pM_Cell5_CCS/call-export_fastq/execution/m64027_230313_181833.hifi_reads.fastq.gz raw_seq/pacbio_hifi/ipac_female_70pM_Cell6_CCS/call-export_fastq/execution/m64027_230315_051622.hifi_reads.fastq.gz

Error Log: hifiasm-4634103.err.txt

chhylp123 commented 1 year ago

It looks like there are some issues with your input HiFi reads. Please see the FAQ here: https://hifiasm.readthedocs.io/en/latest/faq.html#why-does-hifiasm-stuck-or-crash

hans-vg commented 1 year ago

What is the issue with the HiFi data? I noticed the k-mer plot doesn't show the expected bimodal distribution. Is it because I'm providing samples from both a male and a female of the species?
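Some background on the k-mer plot. In a single diploid individual, heterozygous k-mers usually form a peak at about half the coverage of the homozygous peak, which gives the two-peak shape. Pooling reads from two individuals means k-mers can be present in one to four of the pooled haplotypes, which can blur that pattern. The toy sketch below shows how such a spectrum is built. It assumes gzipped 4-line FASTQ, counts canonical k-mers in an awk array, and is only meant for a small read subsample. Real data would normally go through a dedicated counter such as Jellyfish or KMC. The function name `kmer_histogram` is hypothetical.

```shell
#!/usr/bin/env bash
# Toy k-mer spectrum: number of distinct canonical k-mers seen c times.
# Assumptions: 4-line gzipped FASTQ; non-ACGT bases become N in the
# reverse complement; all counts live in memory (small subsamples only).
set -euo pipefail

kmer_histogram() {
  local k=$1; shift
  local f
  for f in "$@"; do gzip -dc -- "$f"; done |
    awk -v k="$k" '
      # Reverse complement of a DNA string (i, c, r are locals).
      function rc(s,   i, c, r) {
        r = ""
        for (i = length(s); i >= 1; i--) {
          c = substr(s, i, 1)
          r = r (c == "A" ? "T" : c == "C" ? "G" : c == "G" ? "C" : c == "T" ? "A" : "N")
        }
        return r
      }
      NR % 4 == 2 {
        s = toupper($0)
        for (i = 1; i + k - 1 <= length(s); i++) {
          m = substr(s, i, k)
          r = rc(m)
          cnt[m < r ? m : r]++   # count the lexicographically smaller strand
        }
      }
      END {
        for (m in cnt) hist[cnt[m]]++
        for (c in hist) printf "%d\t%d\n", c, hist[c]
      }' | sort -n
}
```

Each output line is `multiplicity<TAB>number of distinct k-mers`. Plotting the second column against the first gives the k-mer spectrum discussed above.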

Thank you.

chhylp123 commented 1 year ago

It might be. Why would you want to assemble both the male and the female at once?

hans-vg commented 1 year ago

I was hoping to get the male- and female-specific chromosomes, but I could instead run two assemblies and add the specific chromosome to the other assembly. Either way, I just tried an assembly with only the male data and am still getting an out-of-memory error with 225 GB of RAM.
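For the option of splitting the data into two assemblies, here is a minimal dry-run sketch. It assumes the directory layout from the command in this issue and only prints one hifiasm command per sex, using the flags already shown there (-o, -t, --h1, --h2). The output prefixes (`ipac.male.asm`, `ipac.female.asm`), the `HIC_SEX` setting and the `print_cmd` function are hypothetical. The thread does not say which individual the Hi-C library came from, and Hi-C reads are normally only useful for phasing the individual they were sequenced from. Per the comment above, a male-only run still exceeded 225 GB, so splitting addresses the mixed-sample question rather than the memory limit.

```shell
#!/usr/bin/env bash
# Print (do not run) one hifiasm command per sex, based on this issue's paths.
set -euo pipefail
shopt -s nullglob   # unmatched globs expand to nothing

HIFI_DIR=raw_seq/pacbio_hifi
HIC_R1=raw_seq/hic/01.hic/CKDL230007432-1a_L3_1.fq.gz
HIC_R2=raw_seq/hic/01.hic/CKDL230007432-1a_L3_2.fq.gz
HIC_SEX=${HIC_SEX:-male}   # hypothetical: individual the Hi-C came from
THREADS=64

print_cmd() {
  local sex=$1
  local -a reads cmd
  reads=( "$HIFI_DIR"/ipac_"$sex"_*/call-export_fastq/execution/*.hifi_reads.fastq.gz )
  if (( ${#reads[@]} == 0 )); then
    echo "no HiFi files found for $sex" >&2
    return 1
  fi
  cmd=( hifiasm -o "ipac.$sex.asm" -t"$THREADS" )
  # Only attach Hi-C to the assembly of the individual it was sequenced from.
  if [[ $sex == "$HIC_SEX" ]]; then
    cmd+=( --h1 "$HIC_R1" --h2 "$HIC_R2" )
  fi
  cmd+=( "${reads[@]}" )
  printf '%s\n' "${cmd[*]}"
}
```

Calling `print_cmd male` and `print_cmd female` from the directory containing `raw_seq/` prints the two commands for review before submitting them to the scheduler.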

hifiasm-4642659.err.txt