Job exited with out of memory error during the final stage of the run (99% done)

Hi, my dekupl-run script exited with an out of memory error during the last step of the script. Here is the output of this section of the slurm log file:

[1] "2020-04-09 05:51:00 Start DESeq2_diff_methods"
[1] "2020-04-09 05:59:44 Shuffle and split done"
gzip: /staging/sn1/genutis/dekupl_workspace/tmp/dekupl_tmp/test_diff/tmp_chunks// is a directory -- ignored
[1] "2020-04-09 05:59:45 Split done"
[1] "2020-04-09 05:59:45 Foreach of the 0 files"
Error in { : task 1 failed - "cannot open the connection"
Calls: %dopar% -> <Anonymous>
Execution halted
[Thu Apr  9 05:59:46 2020]
Error in rule test_diff_counts:
    jobid: 3
    output: /staging/sn1/genutis/dekupl_workspace/Adenocarcinoma_lung_vs_Normal_lung_kmer_counts/diff-counts.tsv.gz, /staging/sn1/genutis/dekupl_workspace/Adenocarcinoma_lung_vs_Normal_lung_kmer_counts/raw_pvals.txt.gz
    log: /staging/sn1/genutis/dekupl_workspace/Logs/test_diff_counts.logs (check log file(s) for error message)
    shell:

        Rscript /auto/cmb-07/sn1/genutis/software/anaconda3/envs/dekupl/share/dekupl/bin/DESeq2_diff_method.R         /auto/cmb-07/sn1/genutis/software/anaconda3/envs/dekupl/share/dekupl/bin/TtestFilter         /staging/sn1/genutis/dekupl_workspace/kmer_counts/masked-counts.tsv.gz         /staging/sn1/genutis/dekupl_workspace/metadata/sample_conditions_full.tsv         0.05         2         Adenocarcinoma_lung         Normal_lung         12         1000000         /staging/sn1/genutis/dekupl_workspace/tmp/dekupl_tmp/test_diff         /staging/sn1/genutis/dekupl_workspace/Adenocarcinoma_lung_vs_Normal_lung_kmer_counts/diff-counts.tsv.gz         /staging/sn1/genutis/dekupl_workspace/Adenocarcinoma_lung_vs_Normal_lung_kmer_counts/raw_pvals.txt.gz         /staging/sn1/genutis/dekupl_workspace/Logs/test_diff_counts.logs

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /auto/rcf-13/genutis/.snakemake/log/2020-04-06T123535.105182.snakemake.log
Thu Apr  9 05:59:46 PDT 2020
finished dekupl-run
slurmstepd: error: Detected 1 oom-kill event(s) in step 6799143.batch cgroup.

It looks like perhaps the gzip line has an extra / at the end of the path?

According to the message "gzip: /staging/sn1/genutis/dekupl_workspace/tmp/dekupl_tmp/test_diff/tmp_chunks// is a directory -- ignored", the filename at the end of the path may be empty, so the problem may arise when the chunks are created.

Communicated by Haoliang Xue
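
The log line "Foreach of the 0 files" suggests that no chunk files existed when the loop started, which would be consistent with gzip being handed a bare directory path. A minimal sketch for checking how many chunk files a directory holds is given here. The helper name count_chunks and the throwaway directory are hypothetical; only the `*_subfile.txt.gz` naming pattern comes from the awk command in DESeq2_diff_method.R.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count chunk files directly inside a directory.
count_chunks() {
    local dir="$1"
    # find prints one line per matching regular file; wc -l counts the lines.
    find "$dir" -maxdepth 1 -type f -name '*_subfile.txt.gz' | wc -l | tr -d ' '
}

# Demo on a throwaway directory standing in for .../test_diff/tmp_chunks/.
demo_dir="$(mktemp -d)"
empty_count="$(count_chunks "$demo_dir")"
touch "$demo_dir/1_subfile.txt.gz" "$demo_dir/2_subfile.txt.gz"
filled_count="$(count_chunks "$demo_dir")"
echo "empty: $empty_count, after adding two files: $filled_count"
rm -rf "$demo_dir"
```

Running count_chunks on the real tmp_chunks directory right after the split step would show whether the split produced anything at all.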

I believe I have found more clues for this issue, coming from line 98 of DESeq2_diff_method.R

# SHUFFLE AND SPLIT THE MAIN FILE INTO CHUNKS WITH AUTOINCREMENTED NAMES
system(paste("zcat", kmer_counts, "| tail -n +2 | shuf | awk -v", paste("chunk_size=", chunk_size,sep=""), "-v", paste("output_tmp_chunks=",output_tmp_chunks,sep=""),
             "'NR%chunk_size==1{OFS=\"\\t\";x=++i\"_subfile.txt.gz\"}{OFS=\"\";print | \"gzip >\" output_tmp_chunks x}'"))

My input command for running this R script, from my slurm log file for this run was this:

Rscript /auto/cmb-07/sn1/genutis/software/anaconda3/envs/dekupl/share/dekupl/bin/DESeq2_diff_method.R         /auto/cmb-07/sn1/genutis/software/anaconda3/envs/dekupl/share/dekupl/bin/TtestFilter         /staging/sn1/genutis/dekupl_workspace/kmer_counts/masked-counts.tsv.gz         /staging/sn1/genutis/dekupl_workspace/metadata/sample_conditions_full.tsv         0.05         2         Adenocarcinoma_lung         Normal_lung         12         1000000         /staging/sn1/genutis/dekupl_workspace/tmp/dekupl_tmp/test_diff         /staging/sn1/genutis/dekupl_workspace/Adenocarcinoma_lung_vs_Normal_lung_kmer_counts/diff-counts.tsv.gz         /staging/sn1/genutis/dekupl_workspace/Adenocarcinoma_lung_vs_Normal_lung_kmer_counts/raw_pvals.txt.gz         /staging/sn1/genutis/dekupl_workspace/Logs/test_diff_counts.logs

I used the system() command from line 98 with my input arguments to try to reproduce the error in an interactive session, but the command seems to fail silently in the R console. So I ran the same call without system() to get the command line formatted for a bash shell:

interactive R shell to format system shell command:

> kmer_counts = '/staging/sn1/genutis/dekupl_workspace/kmer_counts/masked-counts.tsv.gz'
> chunk_size = 1000000
> output_tmp                = '/staging/sn1/genutis/dekupl_workspace/tmp/dekupl_tmp/test_diff'   
> output_tmp_chunks         = paste(output_tmp,"/tmp_chunks/",sep="")
> paste("zcat", kmer_counts, "| tail -n +2 | shuf | awk -v", paste("chunk_size=", chunk_size,sep=""), "-v", paste("output_tmp_chunks=",output_tmp_chunks,sep=""),
+              "'NR%chunk_size==1{OFS=\"\\t\";x=++i\"_subfile.txt.gz\"}{OFS=\"\";print | \"gzip >\" output_tmp_chunks x}'")
[1] "zcat /staging/sn1/genutis/dekupl_workspace/kmer_counts/masked-counts.tsv.gz | tail -n +2 | shuf | awk -v chunk_size=1e+06 -v output_tmp_chunks=/staging/sn1/genutis/dekupl_workspace/tmp/dekupl_tmp/test_diff/tmp_chunks/ 'NR%chunk_size==1{OFS=\"\\t\";x=++i\"_subfile.txt.gz\"}{OFS=\"\";print | \"gzip >\" output_tmp_chunks x}'"
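
Note that paste() renders 1000000 as 1e+06 in the generated command. A quick check of how awk interprets that value when it is passed with -v can rule it in or out as a cause. This is a sketch only; with gawk or mawk the value is expected to be read as a number.

```shell
# R's paste() turned 1000000 into "1e+06"; see how awk reads that value.
chunk_kind="$(awk -v chunk_size=1e+06 'BEGIN {
    kind = (chunk_size == 1000000) ? "numeric" : "string"
    print kind
}')"
echo "awk treats chunk_size=1e+06 as: $chunk_kind"
```

If the scientific notation is unwanted anyway, format(chunk_size, scientific=FALSE) on the R side produces the plain digits.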

interactive bash shell output of the R command:

$ zcat /staging/sn1/genutis/dekupl_workspace/kmer_counts/masked-counts.tsv.gz | tail -n +2 | shuf | awk -v chunk_size=1e+06 -v output_tmp_chunks=/staging/sn1/genutis/dekupl_workspace/tmp/dekupl_tmp/test_diff/tmp_chunks/ 'NR%chunk_size==1{OFS=\"\\t\";x=++i\"_subfile.txt.gz\"}{OFS=\"\";print | \"gzip >\" output_tmp_chunks x}'
awk: cmd. line:1: NR%chunk_size==1{OFS=\"\\t\";x=++i\"_subfile.txt.gz\"}{OFS=\"\";print | \"gzip >\" output_tmp_chunks x}
awk: cmd. line:1:                      ^ backslash not last character on line
awk: cmd. line:1: NR%chunk_size==1{OFS=\"\\t\";x=++i\"_subfile.txt.gz\"}{OFS=\"\";print | \"gzip >\" output_tmp_chunks x}
awk: cmd. line:1:                      ^ syntax error

In the interactive bash shell command line, I get output if I leave the awk command out of the pipeline, so the files appear to be correct up to this point. Perhaps there is an issue with generating the formatted awk command?

Communicated by Yunfeng Wang: Edit the Snakemake file and try to change the MAX_CPU from 1000 to 20, which should be around Line 132.

Communicated by Claire Toffano. Change your awk line to (removing backslashes): zcat /staging/sn1/genutis/dekupl_workspace/kmer_counts/masked-counts.tsv.gz | tail -n +2 | shuf | awk -v chunk_size=1e+06 -v output_tmp_chunks=/staging/sn1/genutis/dekupl_workspace/tmp/dekupl_tmp/test_diff/tmp_chunks/ 'NR%chunk_size==1{OFS="\t";x=++i"_subfile.txt.gz"}{OFS="";print | "gzip >" output_tmp_chunks x}'
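
The backslashes in the R console output are escapes added when R prints a string; the string that system() hands to the shell contains plain double quotes, so the unescaped form is the one to test in bash. The sketch below runs the same awk program on a small synthetic file instead of the real masked-counts.tsv.gz. The temporary directory, the file contents, and the chunk size of 2 are assumptions for illustration, and `gzip -dc` is used in place of zcat for portability.

```shell
#!/usr/bin/env bash
# Synthetic stand-in for masked-counts.tsv.gz: one header line plus 5 rows.
work_dir="$(mktemp -d)"
mkdir -p "$work_dir/tmp_chunks"
printf 'tag\ts1\ts2\nAAA\t1\t2\nCCC\t3\t4\nGGG\t5\t6\nTTT\t7\t8\nACG\t9\t0\n' \
    | gzip > "$work_dir/counts.tsv.gz"

# Same pipeline shape as in the thread, with plain quotes inside the awk
# program and a tiny chunk size so that several chunk files are produced.
gzip -dc "$work_dir/counts.tsv.gz" | tail -n +2 | shuf \
    | awk -v chunk_size=2 -v output_tmp_chunks="$work_dir/tmp_chunks/" \
        'NR%chunk_size==1{OFS="\t";x=++i"_subfile.txt.gz"}{OFS="";print | "gzip >" output_tmp_chunks x}'

# Count the chunk files and the rows they contain in total.
n_chunks="$(ls "$work_dir/tmp_chunks" | wc -l | tr -d ' ')"
n_rows="$(gzip -dc "$work_dir"/tmp_chunks/*_subfile.txt.gz | wc -l | tr -d ' ')"
echo "chunk files: $n_chunks, total rows: $n_rows"
rm -rf "$work_dir"
```

If this works on the small file but not on the real input, the awk syntax itself is probably not the problem.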

Thank you all for your help so far. The job is still running, so I will not know until tomorrow whether changing the MAX_CPU parameter made a difference. In the meantime, I've been working on that awk line, and I'm having trouble removing the backslashes from the line in the R script without breaking the paste command.

The previous awk command with all the backslashes is generated correctly in an interactive R shell:

> kmer_counts = '/staging/sn1/genutis/dekupl_workspace/kmer_counts/masked-counts.tsv.gz'
> chunk_size = 1000000
> output_tmp                = '/staging/sn1/genutis/dekupl_workspace/tmp/dekupl_tmp/test_diff'   
> output_tmp_chunks         = paste(output_tmp,"/tmp_chunks/",sep="")
> paste("zcat", kmer_counts, "| tail -n +2 | shuf | awk -v", paste("chunk_size=", chunk_size,sep=""), "-v", paste("output_tmp_chunks=",output_tmp_chunks,sep=""),
        "'NR%chunk_size==1{OFS=\"\\t\";x=++i\"_subfile.txt.gz\"}{OFS=\"\";print | \"gzip >\" output_tmp_chunks x}'")
[1] "zcat /staging/sn1/genutis/dekupl_workspace/kmer_counts/masked-counts.tsv.gz | tail -n +2 | shuf | awk -v chunk_size=1e+06 -v output_tmp_chunks=/staging/sn1/genutis/dekupl_workspace/tmp/dekupl_tmp/test_diff/tmp_chunks/ 'NR%chunk_size==1{OFS=\"\\t\";x=++i\"_subfile.txt.gz\"}{OFS=\"\";print | \"gzip >\" output_tmp_chunks x}'"

But as soon as I omit these backslashes, the paste command breaks with an unclear error:

> paste("zcat", kmer_counts, "| tail -n +2 | shuf | awk -v", paste("chunk_size=", chunk_size,sep=""), "-v", paste("output_tmp_chunks=",output_tmp_chunks,sep=""), "'NR%chunk_size==1{OFS="\t";x=++i"_subfile.txt.gz"}{OFS="\";print | "gzip >" output_tmp_chunks x}'")
Error: unexpected input in "paste("zcat", kmer_counts, "| tail -n +2 | shuf | awk -v", paste("chunk_size=", chunk_size,sep=""), "-v", paste("output_tmp_chunks=",output_tmp_chunks,sep=""), "'NR%chunk_size==1{OFS="\"

However, the final formatted command without the backslashes, run in a bash shell, doesn't seem to work either: nothing is written to the tmp_chunks directory.

zcat /staging/sn1/genutis/dekupl_workspace/kmer_counts/masked-counts.tsv.gz | tail -n +2 | shuf | awk -v chunk_size=1e+06 -v output_tmp_chunks=/staging/sn1/genutis/dekupl_workspace/tmp/dekupl_tmp/test_diff/tmp_chunks/ 'NR%chunk_size==1{OFS="\t";x=++i"_subfile.txt.gz"}{OFS="";print | "gzip >" output_tmp_chunks x}'

ls /staging/sn1/genutis/dekupl_workspace/tmp/dekupl_tmp/test_diff/tmp_chunks/
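
Two properties of this pipeline may explain an empty directory while it is still running. First, shuf has to read its whole input before it can emit the first line, so no chunk can appear until decompression has finished, and holding the whole table in memory may also be related to the oom-kill reported by Slurm. Second, the awk program never closes its gzip pipes, so every gzip process stays open until awk exits. The variant below is a sketch, not the dekupl-run code: it closes each pipe before opening the next one. The variable names cmd and out_dir and the synthetic seq input are assumptions.

```shell
#!/usr/bin/env bash
work_dir="$(mktemp -d)"
mkdir -p "$work_dir/chunks"

# Seven synthetic rows split into chunks of 3. close() ends each gzip
# process before the next chunk starts, so finished chunks reach the disk
# early and only one gzip process is alive at a time.
seq 1 7 | awk -v chunk_size=3 -v out_dir="$work_dir/chunks/" '
    NR % chunk_size == 1 {
        if (cmd != "") close(cmd)      # finish the previous gzip pipe
        i++
        cmd = "gzip > " out_dir i "_subfile.txt.gz"
    }
    { print | cmd }
    END { if (cmd != "") close(cmd) }'

closed_chunks="$(ls "$work_dir/chunks" | wc -l | tr -d ' ')"
last_chunk_rows="$(gzip -dc "$work_dir/chunks/3_subfile.txt.gz" | wc -l | tr -d ' ')"
echo "chunk files: $closed_chunks, rows in last chunk: $last_chunk_rows"
rm -rf "$work_dir"
```

With a chunk size of 1000000 the same pattern would apply unchanged; only the number of rows per file differs.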

Transipedia / dekupl-run

Job exited with out of memory error during the final stage of the run (99% done) #70