Nextomics / NextPolish

Fast and accurate polishing of genomes generated from long reads.
GNU General Public License v3.0

SLURM out of memory and time... #125

Closed: VendelboNM closed this issue 7 months ago

VendelboNM commented 7 months ago

NextPolish polishing of a barley assembly (generated from R10.4 ONT long reads) with ~40x Illumina 150 bp PE reads stops after approximately 1 h due to an out-of-memory and time-limit issue, and I cannot seem to fix it. I hope you can help!

SLURM resources:

```
#SBATCH --job-name=polishing_nextpolish
#SBATCH --output=./out/polishing_nextpolish.out
#SBATCH --error=./err/polishing_nextpolish.err
#SBATCH --ntasks=30
#SBATCH --cpus-per-task=1
#SBATCH --mem=600G
#SBATCH --time=48:00:00
```
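(The body of the batch script is not shown in the issue; the driver is typically launched as sketched below. Note that with `job_type = slurm` the NextPolish master submits each sub-task as its own SLURM job through paralleltask, so the sub-tasks are governed by the submit template in cluster.cfg rather than by this 600G allocation.)

```bash
# Inside the batch script: launch the NextPolish driver on the config below.
# With job_type = slurm, sub-tasks are submitted as separate sbatch jobs.
nextPolish run.cfg
```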

run.cfg:

```
[General]
job_type = slurm
job_prefix = nextPolish
task = best
rewrite = yes
rerun = 3
parallel_jobs = 6
multithread_jobs = 5
genome = /projects/mjolnir1/people/bgn602/steps/01_comp_study/flye/parameter_eval/t2/A3_em2_1/A3_em2_1.fasta
genome_size = auto
workdir = /projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1
polish_options = -p {multithread_jobs}

[sgs_option]
sgs_fofn = /projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/sgs.fofn
sgs_options = -max_depth 100 -bwa
```

sgs.fofn:

```
/projects/mjolnir1/people/bgn602/steps/01_comp_study/fastp/illu/hv_em2_1_illu_processed.fastq
/projects/mjolnir1/people/bgn602/steps/01_comp_study/fastp/illu/hv_em2_2_illu_processed.fastq
```
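(For reference, a file of file names like this is usually just a list of read paths written with a shell redirect, as in the NextPolish docs; the sketch below reuses the actual paths from this run.)

```bash
# Write one read file path per line into sgs.fofn
ls /projects/mjolnir1/people/bgn602/steps/01_comp_study/fastp/illu/hv_em2_1_illu_processed.fastq \
   /projects/mjolnir1/people/bgn602/steps/01_comp_study/fastp/illu/hv_em2_2_illu_processed.fastq > sgs.fofn
```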

info file:

```
[2302180 INFO] 2023-11-17 15:59:42 NextPolish start...
[2302180 INFO] 2023-11-17 15:59:42 version:v1.4.1 logfile:pid2302180.log.info
[2302180 WARNING] 2023-11-17 15:59:42 Re-write workdir
[2302180 WARNING] 2023-11-17 15:59:46 Delete task: 5 due to missing lgs_fofn.
[2302180 WARNING] 2023-11-17 15:59:46 Delete task: 5 due to missing lgs_fofn.
[2302180 WARNING] 2023-11-17 15:59:46 Delete task: 6 due to missing hifi_fofn.
[2302180 WARNING] 2023-11-17 15:59:46 Delete task: 6 due to missing hifi_fofn.
[2302180 INFO] 2023-11-17 15:59:46 scheduled tasks: [1, 2, 1, 2]
[2302180 INFO] 2023-11-17 15:59:46 options:
[2302180 INFO] 2023-11-17 15:59:46 rerun: 3 rewrite: 1 kill: None cleantmp: 0 use_drmaa: 0 submit: None deltmp: False job_type: slurm sgs_unpaired: 0 sgs_rm_nread: 1 lgs_read_type: parallel_jobs: 6 align_threads: 5 check_alive: None sgs_max_depth: 50 task: [1, 2, 1, 2] job_id_regex: None lgs_max_depth: 100 multithread_jobs: 5 lgs_max_read_len: 0 hifi_max_depth: 100 polish_options: -p 5 lgs_block_size: 500M lgs_min_read_len: 1k hifi_max_read_len: 0 hifi_block_size: 500M hifi_min_read_len: 1k job_prefix: nextPolish genome_size: 3900762858 sgs_block_size: 500000000 sgs_use_duplicate_reads: 0 lgs_minimap2_options: -x map-ont hifi_minimap2_options: -x map-pb sgs_align_options: bwa mem -p -t 5 workdir: /projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1 genome: /projects/mjolnir1/people/bgn602/steps/01_comp_study/flye/parameter_eval/t2/A3_em2_1/A3_em2_1.fasta sgs_fofn: /projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/sgs.fofn snp_phase: /projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/%02d.snp_phase snp_valid: /projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/%02d.snp_valid lgs_polish: /projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/%02d.lgs_polish kmer_count: /projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/%02d.kmer_count hifi_polish: /projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/%02d.hifi_polish score_chain: /projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/%02d.score_chain
[2302180 INFO] 2023-11-17 15:59:46 step 0 and task 1 start:
[2302180 INFO] 2023-11-17 15:59:51 Total jobs: 3
[2302180 INFO] 2023-11-17 15:59:51 Submitted jobID:[7449925] jobCmd:[/projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/00.score_chain/01.db_split.sh.work/db_split1/nextPolish.sh] in the slurm_cycle.
[2302180 INFO] 2023-11-17 15:59:51 Submitted jobID:[7449926] jobCmd:[/projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/00.score_chain/01.db_split.sh.work/db_split2/nextPolish.sh] in the slurm_cycle.
[2302180 INFO] 2023-11-17 15:59:51 Submitted jobID:[7449927] jobCmd:[/projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/00.score_chain/01.db_split.sh.work/db_split3/nextPolish.sh] in the slurm_cycle.
[2302180 ERROR] 2023-11-17 17:00:34 db_split failed: please check the following logs:
[2302180 ERROR] 2023-11-17 17:00:34 /projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/00.score_chain/01.db_split.sh.work/db_split1/nextPolish.sh.e
[2302180 ERROR] 2023-11-17 17:00:34 /projects/mjolnir1/people/bgn602/steps/01_comp_study/nextpolish/parameter_eval/t2/A3_em2_1/00.score_chain/01.db_split.sh.work/db_split2/nextPolish.sh.e
```

db_split1/nextPolish.sh.e:

```
hostname
```

db_split2/nextPolish.sh.e:

```
hostname
```

```
real    50m38.542s
user    49m35.333s
sys     0m42.718s
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=7750999.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
```
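(As a generic SLURM check, independent of NextPolish, the peak memory of the killed step can be inspected with sacct:)

```bash
# Compare the step's peak resident memory against what was requested
sacct -j 7750999 --format=JobID,JobName,State,Elapsed,MaxRSS,ReqMem
```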

paralleltask/cluster.cfg:

```
[SGE]
submit = qsub -pe smp {cpu} -l vf={mem} -o {out} -e {err} {script}
kill = qdel {job_id}
check-alive = qstat -j {job_id}
job-id-regex = (\d+)

[PBS/TORQUE]
submit = qsub -l nodes=1:ppn={cpu},mem={mem} -o {out} -e {err} {script}
kill = qdel {job_id}
check-alive = qstat {job_id}
job-id-regex = (\d+)

[LSF]
submit = bsub -n {cpu} -R rusage[mem={mem}] -o {out} -e {err} {script}
kill = bkill {job_id}
check-alive = bjobs {job_id}
job-id-regex = Job <(\d+)>

[SLURM]
submit = sbatch --cpus-per-task=1 --mem-per-cpu=20G --time 48:00:00 -o {out} -e {err} {script}
kill = scancel {job_id}
check-alive = squeue -j {job_id}
job-id-regex = Submitted batch job (\d+)
```
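(The SGE, PBS/TORQUE and LSF sections above parameterize resources with the {cpu} and {mem} placeholders, while the [SLURM] section hard-codes 1 CPU and 20G per CPU. A SLURM submit line in the same parameterized style, with a larger per-CPU memory cap, might look like the sketch below; the 50G figure is only an illustrative assumption for a ~3.9 Gb genome, not a documented recommendation.)

```
[SLURM]
submit = sbatch --cpus-per-task={cpu} --mem-per-cpu=50G --time 48:00:00 -o {out} -e {err} {script}
kill = scancel {job_id}
check-alive = squeue -j {job_id}
job-id-regex = Submitted batch job (\d+)
```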

moold commented 7 months ago

Hi, you can follow here, and use your own alignment pipeline.
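(The link behind "here" is lost in this transcript; it presumably points at the "run NextPolish step by step" section of the NextPolish docs. Below is a minimal sketch of one short-read polishing round in that style, assuming bwa, samtools, and the nextpolish1.py script shipped under NextPolish/lib are available; paths, thread counts, and the single pass shown are illustrative.)

```bash
#!/bin/bash
set -euo pipefail

threads=20
input=A3_em2_1.fasta                 # assembly to polish (illustrative path)
read1=hv_em2_1_illu_processed.fastq  # Illumina PE reads (illustrative paths)
read2=hv_em2_2_illu_processed.fastq

# Map short reads to the current assembly and produce a sorted, indexed BAM
bwa index "${input}"
bwa mem -t "${threads}" "${input}" "${read1}" "${read2}" \
  | samtools sort -@ "${threads}" -o sgs.sort.bam -
samtools index sgs.sort.bam
samtools faidx "${input}"

# First short-read polishing pass (-t 1); the step-by-step docs run a second
# pass (-t 2) on the resulting FASTA before starting the next round
python NextPolish/lib/nextpolish1.py -g "${input}" -t 1 -p "${threads}" -s sgs.sort.bam \
  > genome.polishtemp.fa
```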