Closed: jsemple19 closed this issue 1 year ago
I found "Killed" in the error log:
/bin/bash: line 1: 3548 Killed Rscript --max-ppsize=500000 $(which run_spp.R) -c=/data/projects/p025/jenny/shweta/chip-seq-pipeline2/results/LET418_YA/chip/b8c9f35e-4f0b-4277-a384-6eef58480425/call-call_peak_pr2/shard-1/attempt-2/inputs/383861301/LET-418_YA_rep2_IP.srt.nodup.pr2.tagAlign.gz -i=/data/projects/p025/jenny/shweta/chip-seq-pipeline2/results/LET418_YA/chip/b8c9f35e-4f0b-4277-a384-6eef58480425/call-call_peak_pr2/shard-1/attempt-2/inputs/-1098189590/LET-418_YA_rep2_input.srt.nodup.tagAlign.gz -npeak=300000 -odir=/data/projects/p025/jenny/shweta/chip-seq-pipeline2/results/LET418_YA/chip/b8c9f35e-4f0b-4277-a384-6eef58480425/call-call_peak_pr2/shard-1/attempt-2/execution -speak=155 -savr=LET-418_YA_rep2_IP.srt.nodup.pr2_x_LET-418_YA_rep2_input.srt.nodup.300K.regionPeak.gz.tmp -fdr=0.01 -rf -p=6
Please check if your system has enough memory or run with caper run ... --max-concurrent-tasks 1
to disable parallelization.
I am working on a SLURM cluster and can get plenty of resources, but I am not sure how to control resource allocation in this pipeline: Cromwell launches jobs with varying numbers of CPUs etc. that are completely independent of the resources I request in the launching sbatch script containing the caper command.
I have tried the `caper run ... --max-concurrent-tasks 1` option, but the job still gets killed.
I have also tried increasing the amount of memory for spp in the chip.wdl file:
Float call_peak_spp_mem_factor = 16.0
I could increase it even further, but 16 GB is already much larger than the tagAlign files (a few hundred MB at most) that seem to be the input to spp.
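If I understand the naming correctly (an assumption on my part, not verified against the WDL source), the `*_mem_factor` values scale a task's memory request with the size of its inputs, so the resulting allocation can be sanity-checked with quick shell arithmetic:

```shell
# Assumption: spp task memory ~= call_peak_spp_mem_factor * total input size.
# Two ~300 MB tagAlign inputs with a factor of 16:
awk 'BEGIN { printf "%.1f GB\n", 16 * (300 + 300) / 1024 }'   # prints 9.4 GB
```

Under that reading, doubling the factor roughly doubles the memory request, regardless of how small the tagAlign files are.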
Here is the Cromwell output of my last failed attempt:
cromwell.out.11.txt
And these are the resources I requested in the submitting batch script:
#SBATCH --time=2-00:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=16G
and I called caper with
caper run chip.wdl -i "${jsonFile}" --singularity --slurm-partition pall --slurm-account $USER --local-out-dir results/${grp} --str-label ${grp} --max-concurrent-tasks 1
A new Caper is out. Upgrade Caper with `pip install caper --upgrade` first. Then make a backup of your `~/.caper/default.conf` and run `caper init slurm` (this will overwrite your existing conf file). Follow the instructions in `~/.caper/default.conf`.
Please don't run `caper run` on HPCs; use `caper hpc submit` instead. This submits a Caper leader job, which then actually submits (`sbatch`) each WDL task to the cluster engine, so you don't need a bash script with SLURM parameter comments. Run `caper hpc list` to check the status of workflows.
Upgrading Caper got `caper hpc submit` to work, but I still had to increase the spp resources even further (a factor of 32) to get the pipeline to complete. Thanks so much for all your help!
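For reference, the edit that eventually worked was raising the factor in chip.wdl (quoted earlier in this thread with a value of 16.0; the default value and exact surrounding lines may differ between pipeline releases):

```wdl
Float call_peak_spp_mem_factor = 32.0
```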
Describe the bug
I am running Caper on a SLURM cluster with the --singularity option. The histone pipeline works fine, but the TF pipeline produces only signal tracks and no peak calls.
Looking at the task-graph files, I see there is no peak calling in the TF one: ![croo task_graph b8c9f35e-4f0b-4277-a384-6eef58480425](https://user-images.githubusercontent.com/8479067/200637882-5dd70792-50fb-45dc-af4c-e287ac798240.svg)
whereas the histone one does: ![croo task_graph da2bf31e-e3e0-47ce-ad4a-1e2afc7b79b4](https://user-images.githubusercontent.com/8479067/200631581-91bf55f9-2b4e-4743-914c-b96e65f4a8e1.svg)
Does the absence of peak calling in the task graph mean that it is not part of the TF pipeline? The Google Doc description of the pipeline would suggest otherwise. Or should I be looking for a bug causing it to fail?
OS/Platform
Caper command:
Caper configuration file
Paste contents of `~/.caper/default.conf`.
Input JSON file
Paste contents of your input JSON file.
Troubleshooting result
If you ran `caper run` without a Caper server, then Caper automatically runs a troubleshooter for failed workflows. Find the troubleshooting result at the bottom of Caper's screen log.
If you ran `caper submit` with a running Caper server, then first find your workflow ID (1st column) with `caper list` and run `caper debug [WORKFLOW_ID]`.
Paste the troubleshooting result. cromwell.out.txt