YannBourgeois / Scripts_Genome_assembly_Tgraeca

A set of scripts for reference-guided assembly of the T. graeca genome

SOAPdenovo_array.sh error #1

Open AlkistisZach opened 1 week ago

AlkistisZach commented 1 week ago

We tried running the SOAPdenovo_array.sh script with 30 .fq files (557 GB in total) as input and a reference genome of approximately 17 GB. SOAPdenovo_array.sh requires a cluster with 22 compute nodes and 200 GB of RAM. Our cluster has 16 compute nodes, each with 48 CPUs and 120 GB of RAM, so we changed the SLURM parameters in SOAPdenovo_array.sh to match our system.
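As a rough check on whether a modified job can fit at all, the per-node arithmetic can be sketched as below. It assumes, hypothetically, that each array task runs one shared-memory process that must fit inside a single node; the per-task memory and CPU figures are placeholders you would have to measure, not values taken from SOAPdenovo_array.sh. In SLURM, `--mem` is a per-node request, so under that assumption adding nodes does not help a task that needs more RAM than one node has.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    nodes: int
    cpus_per_node: int
    mem_gb_per_node: float


def plan_tasks(cluster: Cluster, n_tasks: int, mem_gb_per_task: float, cpus_per_task: int) -> dict:
    """Estimate placement, assuming each task must fit entirely on one node (hypothetical model)."""
    # How many tasks one node can hold, limited by RAM and by CPUs.
    by_mem = int(cluster.mem_gb_per_node // mem_gb_per_task)
    by_cpu = cluster.cpus_per_node // cpus_per_task
    per_node = min(by_mem, by_cpu)
    if per_node == 0:
        # No scheduling change helps: a single task needs more than a whole node.
        return {"fits": False, "tasks_per_node": 0, "waves": None}
    concurrent = per_node * cluster.nodes
    # Number of rounds needed to run every task once.
    waves = math.ceil(n_tasks / concurrent)
    return {"fits": True, "tasks_per_node": per_node, "waves": waves}


if __name__ == "__main__":
    ours = Cluster(nodes=16, cpus_per_node=48, mem_gb_per_node=120)
    # Hypothetical: 30 tasks (one per .fq file), each needing 100 GB and 16 CPUs.
    print(plan_tasks(ours, n_tasks=30, mem_gb_per_task=100, cpus_per_task=16))
```

With 16 nodes of 120 GB each, a hypothetical task needing 200 GB returns `fits: False` regardless of how many array tasks there are, so the only remedies are reducing per-task memory or using larger nodes.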

However, when we ran the script with sbatch, it produced no output. The .err files end with the following messages:

slurmstepd: error: Detected 2 oom_kill events in StepId=3851.0. Some of the step tasks have been OOM Killed.
srun: error: comp16: task 9: Out Of Memory
slurmstepd: error: Detected 4 oom_kill events in StepId=3851.0. Some of the step tasks have been OOM Killed.
--- 100000000th reads.
--- 100000000th reads.
--- 200000000th reads.
--- 200000000th reads.
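To see quickly which steps and tasks were OOM-killed across many .err files, a small parser can pull out the relevant lines. This is a sketch: the regular expressions are written against the exact slurmstepd and srun messages quoted above and may need adjusting for other SLURM versions, and `summarize_oom` is a hypothetical helper name.

```python
import re
import sys
from collections import defaultdict

# Patterns written against the slurmstepd/srun messages quoted in this issue.
OOM_STEP = re.compile(r"Detected (\d+) oom_kill events in StepId=(\d+\.\w+|\d+)")
OOM_TASK = re.compile(r"srun: error: (\S+): task (\d+): Out Of Memory")


def summarize_oom(log_text: str) -> dict:
    """Collect reported oom_kill counts per step and the (node, task) pairs killed."""
    counts = defaultdict(list)
    for m in OOM_STEP.finditer(log_text):
        # Keep each count as printed; no assumption about whether counts are cumulative.
        counts[m.group(2)].append(int(m.group(1)))
    tasks = [(m.group(1), int(m.group(2))) for m in OOM_TASK.finditer(log_text)]
    return {"oom_counts_by_step": dict(counts), "oom_tasks": tasks}


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, summarize_oom(fh.read()))
```

Because it scans the whole text rather than line by line, it also works on logs whose line breaks were lost, such as a copy pasted into a single line.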

What modifications should we try in order to run SOAPdenovo_array.sh successfully?

YannBourgeois commented 6 days ago

Hi, I am afraid that you are reaching the limits of short-read approaches for very large genomes. They need a lot of memory to work, and I am not even sure that assembling a 17 Gb genome this way is feasible.

If you have a reference genome from a close relative, you could instead align your short reads to it with BWA or Bowtie (provided the reference is not too divergent) and generate a new consensus sequence, for example with bcftools. That may increase reference bias, but it would be faster and more straightforward.

Another option would be to replace SOAPdenovo2 with a more memory-efficient assembler, although short-read assemblers are now falling out of fashion with the advent of long reads. You might want to try IDBA (https://github.com/loneknightpy/idba) or JR-Assembler (https://jr-assembler.iis.sinica.edu.tw/download.htm). There may be others I am not aware of.
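The map-and-consensus route suggested above can be sketched as a short pipeline. This is a minimal sketch assuming paired-end reads and the standard bwa, samtools and bcftools command-line tools on PATH; the file names and the `consensus_commands` and `run` helpers are hypothetical, and by default it only prints the commands (dry run). Positions with no read coverage keep the reference base, which is one way the consensus inherits reference bias.

```python
import shlex
import subprocess


def consensus_commands(ref: str, reads1: str, reads2: str, prefix: str, threads: int = 8) -> list:
    """Build shell commands for map -> call -> consensus (hypothetical file layout)."""
    q = shlex.quote
    bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.calls.vcf.gz"
    out = f"{prefix}.consensus.fa"
    return [
        f"bwa index {q(ref)}",
        f"samtools faidx {q(ref)}",
        # Align paired reads and sort straight into a BAM file.
        f"bwa mem -t {threads} {q(ref)} {q(reads1)} {q(reads2)} | samtools sort -@ {threads} -o {q(bam)} -",
        # -c writes a CSI index; BAI cannot index positions beyond 2^29 - 1 (~512 Mbp).
        f"samtools index -c {q(bam)}",
        # Call variant sites only (-v) with the multiallelic caller (-m).
        f"bcftools mpileup -Ou -f {q(ref)} {q(bam)} | bcftools call -mv -Oz -o {q(vcf)}",
        f"bcftools index {q(vcf)}",
        # Apply the called variants to the relative's reference.
        f"bcftools consensus -f {q(ref)} {q(vcf)} > {q(out)}",
    ]


def run(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it with bash (failing on pipe errors) unless dry_run."""
    for cmd in commands:
        print(cmd)
        if not dry_run:
            subprocess.run(["bash", "-o", "pipefail", "-c", cmd], check=True)


if __name__ == "__main__":
    cmds = consensus_commands("relative_ref.fa", "reads_R1.fq", "reads_R2.fq", "sample", threads=48)
    run(cmds, dry_run=True)
```

Keeping the steps as plain command strings makes it easy to paste each one into its own SLURM job, so the memory-hungry alignment can be split per read file while the calling and consensus steps run once.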