ENCODE-DCC / chip-seq-pipeline2

ENCODE ChIP-seq pipeline
MIT License

No peak call output from "TF" pipeline #289

Closed: jsemple19 closed this issue 1 year ago

jsemple19 commented 1 year ago

Describe the bug

I am running caper on a SLURM cluster with the --singularity option. The histone pipeline works fine, but the TF pipeline produces only signal tracks and no peak calls.

Looking at the task-graph files, I see there is no peak calling in the TF one: croo task_graph b8c9f35e-4f0b-4277-a384-6eef58480425

whereas the histone one does: croo task_graph da2bf31e-e3e0-47ce-ad4a-1e2afc7b79b4

Does the absence of peak calling in the task graph mean that it is not part of the TF pipeline? The Google Doc description of the pipeline would suggest otherwise. Or should I be looking for a bug causing it to fail?

OS/Platform

Caper command:

caper run chip.wdl -i ${jsonFile} --singularity --slurm-partition pall --slurm-account $USER --local-out-dir results/${grp} --str-label ${grp}

Caper configuration file

Paste contents of ~/.caper/default.conf.

# Use them within ${} notation.
# - cpu: number of cores for a job (default = 1)
# - memory_mb, memory_gb: total memory for a job in MB, GB
#   * these are converted from 'memory' string attribute (including size unit)
#     defined in WDL task's runtime
# - time: time limit for a job in hour
# - gpu: specified gpu name or number of gpus (it's declared as String)

slurm-resource-param=-n 1 --ntasks-per-node=1 --cpus-per-task=${cpu} ${if defined(memory_mb) then "--mem=" else ""}${memory_mb}${if defined(memory_mb) then "M" else ""} ${if defined(time) then "--time=" else ""}${time*60} ${if defined(gpu) then "--gres=gpu:" else ""}${gpu}

# If needed uncomment and define any extra SLURM sbatch parameters here
# YOU CANNOT USE WDL SYNTAX AND CROMWELL BUILT-IN VARIABLES HERE
#slurm-extra-param=

# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs/lsf) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# "file" method has been default for all old versions of Caper<1.0.
# "path+modtime" is a new default for Caper>=1.0,
#   file: use md5sum hash (slow).
#   path: use path.
#   path+modtime: use path and modification time.
local-hash-strat=path+modtime

# Metadata DB for call-caching (reusing previous outputs):
# Cromwell supports restarting workflows based on a metadata DB
# DB is in-memory by default
db=in-memory

# If you use 'caper run' and want to use call-caching:
# Make sure to define different 'caper run ... --db file --file-db DB_PATH'
# for each pipeline run.
# But if you want to restart then define the same '--db file --file-db DB_PATH'
# then Caper will collect/re-use previous outputs without running the same task again
# Previous outputs will be simply hard/soft-linked.

# If you use 'caper server' then you can use one unified '--file-db'
# for all submitted workflows. In such case, uncomment the following two lines
# and defined file-db as an absolute path to store metadata of all workflows
#db=file
#file-db=

# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper store all localized data files
# on this directory (e.g. input FASTQs defined as URLs in input JSON).
local-loc-dir=

cromwell=/home/jsemple/.caper/cromwell_jar/cromwell-65.jar
womtool=/home/jsemple/.caper/womtool_jar/womtool-65.jar
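
For reference (not part of the pasted config): with hypothetical task values of cpu=6, memory_mb=16384 and time=24 hours, and no GPU, the slurm-resource-param template above would expand to roughly the sbatch call below; cromwell_task_script.sh is a placeholder for the per-task script that Cromwell generates.

# Hypothetical expansion: the if/else branches collapse because memory_mb and time are defined
# and gpu is not; ${time*60} converts hours to SLURM's minutes.
sbatch -n 1 --ntasks-per-node=1 --cpus-per-task=6 --mem=16384M --time=1440 cromwell_task_script.sh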

Input JSON file

Paste contents of your input JSON file.

{"chip.title":"LET418_YA",
"chip.description":"Ahringer ChIP: LET-418_YA",
"chip.pipeline_type":"tf",
"chip.aligner":"bowtie2",
"chip.align_only":false,
"chip.true_rep_only":false,
"chip.genome_tsv":"/data/projects/p025/jenny/genome/ce11/ce11.tsv",
"chip.paired_end":false,
"chip.ctl_paired_end":false,
"chip.always_use_pooled_ctl":false,
"chip.fastqs_rep1_R1":["/data/projects/p025/jenny/shweta/tmpRun/SRR_download/LET-418_YA_rep1_IP.fq.gz"],
"chip.fastqs_rep2_R1":["/data/projects/p025/jenny/shweta/tmpRun/SRR_download/LET-418_YA_rep2_IP.fq.gz"],
"chip.ctl_fastqs_rep1_R1":["/data/projects/p025/jenny/shweta/tmpRun/SRR_download/LET-418_YA_rep1_input.fq.gz"],
"chip.ctl_fastqs_rep2_R1":["/data/projects/p025/jenny/shweta/tmpRun/SRR_download/LET-418_YA_rep2_input.fq.gz"]
}

Troubleshooting result

If you ran caper run without a Caper server, then Caper automatically runs a troubleshooter for failed workflows. Find the troubleshooting result at the bottom of Caper's screen log.

If you ran caper submit with a running Caper server, then first find your workflow ID (1st column) with caper list and run caper debug [WORKFLOW_ID].
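
For example, with the TF workflow from this report (a sketch; assumes a Caper server is running and tracking this workflow):

caper list                                        # workflow IDs are in the first column
caper debug b8c9f35e-4f0b-4277-a384-6eef58480425  # troubleshooting report for the TF workflow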

Troubleshooting result: cromwell.out.txt (attached)

leepc12 commented 1 year ago

Found "Killed" in the error log:

/bin/bash: line 1:  3548 Killed                  Rscript --max-ppsize=500000 $(which run_spp.R) -c=/data/projects/p025/jenny/shweta/chip-seq-pipeline2/results/LET418_YA/chip/b8c9f35e-4f0b-4277-a384-6eef58480425/call-call_peak_pr2/shard-1/attempt-2/inputs/383861301/LET-418_YA_rep2_IP.srt.nodup.pr2.tagAlign.gz -i=/data/projects/p025/jenny/shweta/chip-seq-pipeline2/results/LET418_YA/chip/b8c9f35e-4f0b-4277-a384-6eef58480425/call-call_peak_pr2/shard-1/attempt-2/inputs/-1098189590/LET-418_YA_rep2_input.srt.nodup.tagAlign.gz -npeak=300000 -odir=/data/projects/p025/jenny/shweta/chip-seq-pipeline2/results/LET418_YA/chip/b8c9f35e-4f0b-4277-a384-6eef58480425/call-call_peak_pr2/shard-1/attempt-2/execution -speak=155 -savr=LET-418_YA_rep2_IP.srt.nodup.pr2_x_LET-418_YA_rep2_input.srt.nodup.300K.regionPeak.gz.tmp -fdr=0.01 -rf -p=6

Please check if your system has enough memory or run with caper run ... --max-concurrent-tasks 1 to disable parallelization.
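
If SLURM accounting is enabled on the cluster, one way to check for an out-of-memory kill is to compare the requested and peak memory of the task's job with sacct (the job ID below is a placeholder):

sacct -j 1234567 --format=JobID,State,ReqMem,MaxRSS,Elapsed   # 1234567 stands in for the killed task's SLURM job ID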

jsemple19 commented 1 year ago

I am working on a SLURM cluster and can get plenty of resources, but I am not quite sure how to control the resources in this pipeline: Cromwell launches jobs with different numbers of CPUs etc. that are completely independent of the resources I request in the launching sbatch script that contains the caper command.

I have tried the caper run ... --max-concurrent-tasks 1 option, but the job still gets killed.

I have tried increasing the amount of memory for spp in the chip.wdl file (Float call_peak_spp_mem_factor = 16.0). I could increase it even more, but 16 GB is already much bigger than the tagAlign files (a few hundred MB at most), which seem to be the input to spp. Here is the Cromwell output of my last failed attempt: cromwell.out.11.txt

And these are the resources I requested in the submitting batch script:

#SBATCH --time=2-00:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=16G

and I called caper with:

caper run chip.wdl -i "${jsonFile}" --singularity --slurm-partition pall --slurm-account $USER --local-out-dir results/${grp} --str-label ${grp} --max-concurrent-tasks 1

leepc12 commented 1 year ago

New caper is out. Upgrade caper with pip install caper --upgrade first. Then make a backup of your ~/.caper/default.conf and run caper init slurm (this will overwrite your existing conf file).

Follow instructions in ~/.caper/default.conf.

Please don't use caper run on HPCs; use caper hpc submit instead. This submits a Caper leader job, which in turn submits (sbatch) each WDL task to the cluster engine.

So you don't need a bash script with SLURM (#SBATCH) parameter comments. Run caper hpc list to check the status of workflows.
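
A minimal sketch of that sequence, assuming caper hpc submit accepts the same -i/--singularity options used with caper run above (paths and labels are the ones from this thread):

pip install caper --upgrade                          # upgrade Caper
cp ~/.caper/default.conf ~/.caper/default.conf.bak   # back up the existing config
caper init slurm                                     # regenerate ~/.caper/default.conf for SLURM
# edit ~/.caper/default.conf following its inline instructions, then submit a leader job:
caper hpc submit chip.wdl -i "${jsonFile}" --singularity
caper hpc list                                       # check workflow status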

jsemple19 commented 1 year ago

Upgrading caper got caper hpc submit to work, but I still had to increase the spp resources even more (a factor of 32) to get it to eventually work. Thanks so much for all your help!