Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

What is the memory requirement for nodes when running a huge input fastq? #34

zengpeng2012 closed this issue 4 years ago

zengpeng2012 commented 4 years ago

Hi, Hu

When I run the pipeline on a 2.8 Tb input fasta file, progress gets stuck at step 02.cns_align/01.get_cns.sh.work with an out-of-memory error. How can I carry on with the job? Or do I need nodes with more memory? My cluster runs SLURM, and each node has 192 GB of memory and 36 CPUs.
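For reference, per-node memory (in MB) and CPU counts on a SLURM cluster can be confirmed with a standard `sinfo` query; this is a generic sketch, nothing NextDenovo-specific:

```
# List each node with its memory (MB) and CPU count
sinfo -N -o "%N %m %c"
```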

Below are the config file details:

```
[General]
job_type = slurm
job_prefix = Pp
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = no
rerun = 3
parallel_jobs = 50
input_type = raw
input_fofn = ./input.fofn
workdir = ./01_rundir
usetempdir = /tmp/test
nodelist = avanode.list.fofn
cluster_options = -p q_cn -J nextDenovo -o nextDenovo.out -N 1 -n 1 -c 19

[correct_option]
read_cutoff = 1k
seed_cutoff = 15k
blocksize = 1g
pa_correction = 50
seed_cutfiles = 50
sort_options = -m 20g -t 20 -k 50
minimap2_options_raw = -x ava-ont -t 30
correction_options = -p 15

[assemble_option]
random_round = 10
minimap2_options_cns = -x ava-ont -t 30 -k17 -w17
nextgraph_options = -a 1
```
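With this saved as a config file (assumed here to be named `run.cfg`; the filename is arbitrary), the pipeline is launched in the standard NextDenovo way:

```
nextDenovo run.cfg
```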

moold commented 4 years ago

Could you provide the error log file?

zengpeng2012 commented 4 years ago

error log:

```
$ cat Pp.sh.e
hostname
```

moold commented 4 years ago

Try using the parameter: `correction_options = -p 15 -dbuf`
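That is, a minimal sketch of the change against the `[correct_option]` section of the config above:

```
[correct_option]
correction_options = -p 15 -dbuf   # per the reply below, -dbuf reduces memory use at the cost of output speed
```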

zengpeng2012 commented 4 years ago

It works; memory usage is now under 3 GB, but the output is written very slowly.

moold commented 4 years ago

Use the usetempdir option, or remove the -dbuf option; either will speed it up.
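In config terms, the two alternatives, sketched against the config above (the `/tmp/test` path is the one already used there):

```
# Alternative 1: keep -dbuf, but point temporary files at fast node-local storage
[General]
usetempdir = /tmp/test

# Alternative 2: drop -dbuf and accept the higher memory footprint
[correct_option]
correction_options = -p 15
```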