Closed simingzhang closed 4 years ago
Please upload a file instead of pasting the whole contents. Please upload Caper's log or find troubleshooting result at the bottom of the log.
By Caper log, do you mean the result of caper troubleshoot? I get an error with caper troubleshoot metadata.json: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /api/workflows/v1/query?additionalQueryResultFields=labels (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b19e058a7f0>: Failed to establish a new connection: [Errno 111] Connection refused')) Help: cannot connect to server. Check if server is dead or still spinning up.
The metadata file is attached.
No, caper troubleshoot
only works with caper server
. I meant the shell log (STDOUT) of your caper run
command.
Slurm output file attached:
java.nio.file.NoSuchFileException: /oak/stanford/scg/lab_dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/cromwell-workflow-logs/workflow.8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06.log
Found this in the log. Please don't run anything on /oak
. Even though your output directory defined in ~/.caper/default.conf
is not on /oak
.
Run it on /scratch
, $HOME
or $PI_HOME
instead.
Please upload this file.
/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-idr_pr/shard-0/attempt-2/execution/stdout
Sorry, what is $PI_HOME? is that /labs/dflev? is that different from /oak/stanford/scg/lab_dflev? And how do I avoid running on oak? Currently, I'm moving to the output directory and running the sbatch from there. The output directory I used was: /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5
Here's the stdout file: stdout.txt
I thought that you are working on Sherlock. Even on SCG, running anything on OAK is not good. Check OAK's disk.
$ df -h /oak
Check ANY_DIRECTORY is on the same disk as /oak
's.
$ df -h ANY_DIRECTORY
Use recommended scratch directory on SCG.
Please run this.
$ ls -l /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-idr_pr/shard-0/attempt-2/inputs/-723689832/hg38.blacklist.bed.gz
$ ls -l cd /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-idr_pr/shard-0/attempt-2/execution/
Result from first line: /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-idr_pr/shard-0/attempt-2/inputs/-723689832/hg38.blacklist.bed.gz -> /reference/ENCODE/pipeline_genome_data/hg38/hg38.blacklist.bed.gz
Result from second line (removed cd because it wasn't running): drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-156b027b56a0fc9230201b265d9f4bf4 -rw-rw-r-- 1 simingz scg_lab_dflev 0 Apr 7 14:31 glob-156b027b56a0fc9230201b265d9f4bf4.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-37a6259cc0c1dae299a7866489dff0bd -rw-rw-r-- 1 simingz scg_lab_dflev 5 Apr 7 14:31 glob-37a6259cc0c1dae299a7866489dff0bd.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-3c17d82667de5da0b56a7c17d88175d2 -rw-rw-r-- 1 simingz scg_lab_dflev 43 Apr 7 14:31 glob-3c17d82667de5da0b56a7c17d88175d2.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-6d5ecccaf6ecff88ff547bad03a90af6 -rw-rw-r-- 1 simingz scg_lab_dflev 0 Apr 7 14:31 glob-6d5ecccaf6ecff88ff547bad03a90af6.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-748df5c4b6fa48c407f98a93d42e64c6 -rw-rw-r-- 1 simingz scg_lab_dflev 33 Apr 7 14:31 glob-748df5c4b6fa48c407f98a93d42e64c6.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-8da83e7748d9e54f3e082eb4aa171757 -rw-rw-r-- 1 simingz scg_lab_dflev 57 Apr 7 14:31 glob-8da83e7748d9e54f3e082eb4aa171757.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-b34256f7f8497ceb77fad19f39663af0 -rw-rw-r-- 1 simingz scg_lab_dflev 56 Apr 7 14:31 glob-b34256f7f8497ceb77fad19f39663af0.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-c71d4cca8627aab27dc3503b7db7a39d -rw-rw-r-- 1 simingz scg_lab_dflev 0 Apr 7 14:31 glob-c71d4cca8627aab27dc3503b7db7a39d.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-dab68893df43168f2b301266b4498136 -rw-rw-r-- 1 simingz scg_lab_dflev 49 Apr 7 14:31 glob-dab68893df43168f2b301266b4498136.list -rw-rw-r-- 1 simingz scg_lab_dflev 901 Apr 7 14:30 hg38.blacklist.bed.tmp2 -rw-rw-r-- 2 simingz scg_lab_dflev 0 Apr 7 14:30 null -rw-rw-r-- 1 simingz scg_lab_dflev 2 Apr 7 14:30 rc -rw-rw-r-- 2 simingz scg_lab_dflev 20 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz -rw-rw-r-- 2 simingz scg_lab_dflev 197 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.log -rw-rw-r-- 2 simingz scg_lab_dflev 350 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.gz -rw-rw-r-- 1 simingz scg_lab_dflev 1101 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.tmp1 -rw-rw-r-- 2 simingz scg_lab_dflev 1381 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.unthresholded-peaks.txt.gz -rw-rw-r-- 2 simingz scg_lab_dflev 64019 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.unthresholded-peaks.txt.png -rw-rw-r-- 1 simingz scg_lab_dflev 26350 Apr 7 14:30 script -rw-rw-r-- 1 simingz scg_lab_dflev 1593 Apr 7 14:30 script.submit -rw-rw-r-- 1 simingz scg_lab_dflev 1859 Apr 7 14:31 stderr -rw-rw-r-- 1 simingz scg_lab_dflev 0 Apr 7 14:30 stderr.submit -rw-rw-r-- 1 simingz scg_lab_dflev 6028 Apr 7 14:30 stdout -rw-rw-r-- 1 simingz scg_lab_dflev 29 Apr 7 14:30 stdout.submit
-rw-rw-r-- 2 simingz scg_lab_dflev 20 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz
-rw-rw-r-- 2 simingz scg_lab_dflev 197 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.log
-rw-rw-r-- 2 simingz scg_lab_dflev 350 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.gz
IDR peaks are not empty but after blacklist filtering it's empty. Something went wrong in the blacklist-filtering command line.
Try this.
$ cd /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-idr_pr/shard-0/attempt-2/execution/
$ bedtools --version
$ source activate encode-chip-seq-pipeline
$ bedtools --version
$ bedtools intersect -nonamecheck -v -a rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.tmp1 -b hg38.blacklist.bed.tmp2 | awk 'BEGIN{OFS="\t"} {if ($5>1000) $5=1000; print $0}' | grep -P 'chr[\dXY]+\b' | gzip -nc > rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz
See what kind of error happens. Also check bedtools version.
I have no bedtools module loaded outside conda.
I ran:
conda activate encode-chip-seq-pipeline
bedtools --version
Result: bedtools v2.29.0
Running:
source activate encode-chip-seq-pipeline
activate: No such file or directory
(either with conda activate or without conda activate leads to same error):
Running:
bedtools intersect -nonamecheck -v -a rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.tmp1 -b hg38.blacklist.bed.tmp2 | awk 'BEGIN{OFS="\t"} {if ($5>1000) $5=1000; print $0}' | grep -P 'chr[\dXY]+\b' | gzip -nc > rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz
No error but the output file seems empty (attached)
Can I see first few lines of rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.gz
? You can run this on the execution directory.
$ zcat -f rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz | head
Do your sample's chromosome names start with chr
(e.g. chr1
)?
I don't see any output after the command zcat -f rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz | head
. It looks empty.
The upstream file rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.tmp1 also looks really short, only has some chrM lines.
Oh... sorry. Try this one (without .bfilt.
).
$ zcat -f rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.gz | head -n30
chrM 10030 10272 . 1000 . 164.85611 -1.00000 2.38917 142 chrM 16098 16338 . 1000 . 135.03608 -1.00000 2.38917 109 chrM 14010 14406 . 1000 . 127.83353 -1.00000 2.38917 207 chrM 5293 5469 . 1000 . 124.09686 -1.00000 2.38917 135 chrM 15452 15716 . 1000 . 121.85084 -1.00000 2.38917 107 chrM 12300 12477 . 1000 . 120.89155 -1.00000 2.38917 113 chrM 4943 5097 . 1000 . 116.57318 -1.00000 2.38917 49 chrM 10814 11017 . 1000 . 89.55153 -1.00000 2.38917 83 chrM 11951 12053 . 1000 . 79.15492 -1.00000 2.38917 55 chrM 8524 8780 . 1000 . 75.25766 -1.00000 2.38917 128 chrM 2084 2340 . 1000 . 70.60579 -1.00000 2.38917 128 chrM 329 585 . 1000 . 61.20675 -1.00000 2.38917 128 chrM 6416 6672 . 1000 . 51.26232 -1.00000 2.38917 128 chrM 4247 4333 . 1000 . 47.64323 -1.00000 2.38917 60 chrM 14512 14695 . 1000 . 46.91356 -1.00000 2.38917 110 chrM 13724 13857 . 1000 . 44.64217 -1.00000 2.38917 58 chrM 13039 13295 . 1000 . 37.65616 -1.00000 2.38917 128 chrM 3247 3503 . 1000 . 32.19942 -1.00000 2.38917 128 chrM 10957 11131 . 1000 . 28.63342 -1.00000 2.38917 150 chrM 3451 3707 . 1000 . 8.37660 -1.00000 2.38917 128
You don't have any reads but mitochondrial ones. Pipeline filters out all mitochondrial reads in .bfilt.
peaks after blacklist-filtering, which resulted in an empty .bfilt.
peak file.
If you want to keep all reads including mito ones, define this in your input JSON.
{
"chip.regex_bfilt_peak_chr_name": "chr[\\dMXY]+"
}
Closing due to long inactivity.
Describe the bug I'm using the newest pipeline version and updated caper within conda to 0.8.2. My script: conda activate encode-chip-seq-pipeline cd $OUTPUT_DIR sbatch -A aeurban -J chipseq --export=ALL --mem 3G -t 7-0 --wrap "caper run /labs/dflev/siming/software/chip-seq-pipeline2/chip.wdl -i $INPUT --out-dir $OUTPUT_DIR"
There is an error at chip.overlap_pr step. "chip.overlap_pr": [ { "retryableFailure": true, "executionStatus": "RetryableFailure",
OS/Platform
Caper configuration file Paste contents of
~/.caper/default.conf
.backend=slurm slurm-account=aeurban
tmp-dir=
cromwell=/home/simingz/.caper/cromwell_jar/cromwell-47.jar womtool=/home/simingz/.caper/womtool_jar/womtool-47.jar
Input JSON file Paste contents of your input JSON file.
{ "chip.title" : "chipseq miseq nano", "chip.description" : "chip-seq test miseq nano ipsc", "chip.pipeline_type" : "tf", "chip.aligner" : "bowtie2", "chip.align_only" : false, "chip.true_rep_only" : false, "chip.genome_tsv" : "/reference/ENCODE/pipeline_genome_data/genome_tsv/v1/hg38_scg.tsv", "chip.paired_end" : true, "chip.ctl_paired_end" : true, "chip.fastqs_rep1_R1" : [ "/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-flag_subsample1_R1.fastq.gz" ], "chip.fastqs_rep1_R2" : [ "/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-flag_subsample1_R2.fastq.gz" ], "chip.ctl_fastqs_rep1_R1" : [ "/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-input_subsample1_R1.fastq.gz" ], "chip.ctl_fastqs_rep1_R2" : [ "/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-input_subsample1_R2.fastq.gz" ], "chip.crop_length" : 0, "chip.mapq_thresh" : 30, "chip.dup_marker" : "picard", "chip.no_dup_removal" : false, "chip.subsample_reads" : 0, "chip.ctl_subsample_reads" : 0, "chip.xcor_subsample_reads" : 15000000, "chip.xcor_trim_bp" : 50, "chip.use_filt_pe_ta_for_xcor" : false, "chip.always_use_pooled_ctl" : false, "chip.ctl_depth_ratio" : 1.2, "chip.peak_caller" : null, "chip.cap_num_peak_macs2" : 500000, "chip.pval_thresh" : 0.01, "chip.fdr_thresh" : 0.01, "chip.idr_thresh" : 0.05, "chip.cap_num_peak_spp" : 300000, "chip.enable_jsd" : true, "chip.enable_gc_bias" : true, "chip.enable_count_signal_track" : false, "chip.filter_chrs" : [], "chip.align_cpu" : 4, "chip.align_mem_mb" : 20000, "chip.align_time_hr" : 144, "chip.align_disks" : "local-disk 400 HDD", "chip.filter_cpu" : 2, "chip.filter_mem_mb" : 20000, "chip.filter_time_hr" : 144, "chip.filter_disks" : "local-disk 400 HDD", "chip.bam2ta_cpu" : 2, "chip.bam2ta_mem_mb" : 10000, "chip.bam2ta_time_hr" : 144, "chip.bam2ta_disks" : "local-disk 100 HDD", "chip.spr_mem_mb" : 16000, "chip.jsd_cpu" : 2, "chip.jsd_mem_mb" : 12000, "chip.jsd_time_hr" : 144, "chip.jsd_disks" : "local-disk 200 HDD", "chip.xcor_cpu" : 2, "chip.xcor_mem_mb" : 16000, "chip.xcor_time_hr" : 144, "chip.xcor_disks" : "local-disk 100 HDD", "chip.call_peak_cpu" : 2, "chip.call_peak_mem_mb" : 16000, "chip.call_peak_time_hr" : 144, "chip.call_peak_disks" : "local-disk 200 HDD", "chip.macs2_signal_track_mem_mb" : 16000, "chip.macs2_signal_track_time_hr" : 144, "chip.macs2_signal_track_disks" : "local-disk 400 HDD", "chip.gc_bias_picard_java_heap" : "10G" }
Error log Caper automatically runs a troubleshooter for failed workflows. If it doesn't then get a
WORKFLOW_ID
of your failed workflow withcaper list
or directly use ametadata.json
file on Caper's output directory.I can't get the caper debug or troubleshoot to work (get message. Help: cannot connect to server. Check if server is dead or still spinning up.). Instead, I attached the metadata.json file below:
metadata.txt
The failed parts are here: { "workflowName": "chip", "workflowProcessingEvents": [ { "cromwellId": "cromid-242b9e3", "description": "PickedUp", "timestamp": "2020-04-07T20:46:51.509Z", "cromwellVersion": "47" }, { "cromwellId": "cromid-242b9e3", "description": "Finished", "timestamp": "2020-04-07T21:31:11.658Z", "cromwellVersion": "47" } ], "actualWorkflowLanguageVersion": "draft-2", "submittedFiles": { "workflow": "# ENCODE TF/Histone ChIP-Seq pipeline\n# Author: Jin Lee (leepc12@gmail.com)\n\n#CAPER docker quay.io/encode-dcc/chip-seq-pipeline:v1.3.6\n#CAPER singularity docker://quay.io/encode-dcc/chip-seq-pipeline:v1.3.6\n#CROO out_def https://storage.googleapis.com/encode-pipeline-output-definition/chip.croo.v3.json\n\nworkflow chip {\n\tString pipeline_ver = 'v1.3.6'\n\t### sample name, description\n\tString title = 'Untitled'\n\tString description = 'No description'\n\n\t# endedness for input data\n\tBoolean? paired_end\t\t\t\t# to define endedness for all replciates\n\t\t\t\t\t\t\t\t\t#\tif defined, this will override individual endedness below\n\tArray[Boolean] paired_ends = []\t# to define endedness for individual replicate\n\tBoolean? ctl_paired_end\n\tArray[Boolean] ctl_paired_ends = []\n\n\t### mandatory genome param\n\tFile? genome_tsv \t\t\t\t# reference genome data TSV file including\n\t\t\t\t\t\t\t\t\t# all genome-specific file paths and parameters\n\t# individual genome parameters\n\tString? genome_name\t\t\t\t# genome name\n\tFile? ref_fa\t\t\t\t\t# reference fasta (*.fa.gz)\n\tFile? bwa_idx_tar \t\t\t\t# bwa index tar (uncompressed .tar)\n\tFile? bowtie2_idx_tar \t\t\t# bowtie2 index tar (uncompressed .tar)\n\tFile? custom_aligner_idx_tar \t# custom aligner's index tar (uncompressed .tar)\n\tFile? chrsz \t\t\t\t\t# 2-col chromosome sizes file\n\tFile? blacklist \t\t\t\t# blacklist BED (peaks overlapping will be filtered out)\n\tFile? blacklist2 \t\t\t\t# 2nd blacklist (will be merged with 1st one)\n\tString? mito_chr_name\n\tString? regex_bfilt_peak_chr_name\n\tString? gensz \t\t\t\t\t# genome sizes (hs for human, mm for mouse or sum of 2nd col in chrsz)\n\tFile? tss \t\t\t\t\t\t# TSS BED file\n\tFile? dnase \t\t\t\t\t# open chromatin region BED file\n\tFile? prom \t\t\t\t\t\t# promoter region BED file\n\tFile? enh \t\t\t\t\t\t# enhancer region BED file\n\tFile? reg2map \t\t\t\t\t# file with cell type signals\n\tFile? reg2map_bed \t\t\t\t# file of regions used to generate reg2map signals\n\tFile? roadmap_meta \t\t\t\t# roadmap metedata\n\n\t### pipeline type\n\tString pipeline_type \t\t\t# tf or histone chip-seq\n\n\tString aligner = 'bowtie2'\n\tFile? custom_align_py \t\t\t# custom align python script\n\n\tString? peak_caller \t\t\t# default: (spp for tf) and (macs2 for histone)\n\t\t\t\t\t\t\t\t\t# this will override the above defaults\n\tString? peak_type \t\t\t\t# default: narrowPeak for macs2, regionPeak for spp\n\t\t\t\t\t\t\t\t\t# this will override the above defaults\n\tFile? custom_call_peak_py \t\t# custom call_peak python script\n\n\t## parameters for alignment\n\tBoolean align_only = false \t\t# disable all post-align analysis (peak-calling, overlap, idr, ...)\n\tBoolean true_rep_only = false \t# disable all analyses involving pseudo replicates (including overlap/idr)\n\tBoolean enable_count_signal_track = false \t\t# generate count signal track\n\tBoolean enable_jsd = true \t\t# enable JSD plot generation (deeptools fingerprint)\n\tBoolean enable_gc_bias = true\n\n\t# parameters for aligner and filter\n\tBoolean use_bwa_mem_for_pe = false # THIS IS EXPERIMENTAL and BWA ONLY (use bwa mem instead of bwa aln/sam)\n\t\t\t\t\t\t\t\t\t# available only for PE dataset with READ_LEN>=70bp\n\tInt crop_length = 0 \t\t\t# crop reads in FASTQs with Trimmomatic (0 by default, i.e. disabled)\n\tInt xcor_trim_bp = 50 \t\t\t# for cross-correlation analysis only (R1 of paired-end fastqs)\n\tBoolean use_filt_pe_ta_for_xcor = false # PE only. use filtered PE BAM for cross-corr.\n\tString dup_marker = 'picard'\t# picard, sambamba\n\tBoolean no_dup_removal = false\t# keep all dups in final BAM\n\tInt? mapq_thresh \t\t\t\t# threshold for low MAPQ reads removal\n\tInt mapq_thresh_bwa = 30\n\tInt mapq_thresh_bowtie2 = 30\n\tArray[String] filter_chrs = [] \t# array of chromosomes to be removed from nodup/filt BAM\n\t\t\t\t\t\t\t\t\t# chromosomes will be removed from both BAM header/contents\n\t\t\t\t\t\t\t\t\t# e.g. ['chrM', 'MT']\n\tInt subsample_reads = 0\t\t\t# number of reads to subsample TAGALIGN\n\t\t\t\t\t\t\t\t\t# 0 for no subsampling. this affects all downstream analysis\n\tInt ctl_subsample_reads = 0\t\t# number of reads to subsample control TAGALIGN\n\tInt xcor_subsample_reads = 15000000 # subsample TAG-ALIGN for xcor only (not used for other downsteam analyses)\n\tInt xcor_exclusion_range_min = -500\n\tInt? xcor_exclusion_range_max\n\n\t# parameters for peak calling\n\tBoolean always_use_pooled_ctl = false # always use pooled control for all exp rep.\n\tFloat ctl_depth_ratio = 1.2 \t# if ratio between controls is higher than this\n\t\t\t\t\t\t\t\t\t# then always use pooled control for all exp rep.\n\tInt? cap_num_peak\n\tInt cap_num_peak_spp = 300000\t# cap number of raw peaks called from SPP\n\tInt cap_num_peak_macs2 = 500000\t# cap number of raw peaks called from MACS2\n\tFloat pval_thresh = 0.01\t\t# p.value threshold (for MACS2 peak caller only)\n\tFloat fdr_thresh = 0.01\t\t\t# FDR threshold (for SPP peak caller only: Rscript run_spp.R -fdr)\n\tFloat idr_thresh = 0.05\t\t\t# IDR threshold\n\n\t### resources\n\tInt align_cpu = 4\n\tInt align_mem_mb = 20000\n\tInt align_time_hr = 48\n\tString align_disks = 'local-disk 400 HDD'\n\n\tInt filter_cpu = 2\n\tInt filter_mem_mb = 20000\n\tInt filter_time_hr = 24\n\tString filter_disks = 'local-disk 400 HDD'\n\n\tInt bam2ta_cpu = 2\n\tInt bam2ta_mem_mb = 10000\n\tInt bam2ta_time_hr = 6\n\tString bam2ta_disks = 'local-disk 100 HDD'\n\n\tInt spr_mem_mb = 16000\n\n\tInt jsd_cpu = 2\n\tInt jsd_mem_mb = 12000\n\tInt jsd_time_hr = 6\n\tString jsd_disks = 'local-disk 200 HDD'\n\n\tInt xcor_cpu = 2\n\tInt xcor_mem_mb = 16000\t\n\tInt xcor_time_hr = 24\n\tString xcor_disks = 'local-disk 100 HDD'\n\n\tInt macs2_signal_track_mem_mb = 16000\n\tInt macs2_signal_track_time_hr = 24\n\tString macs2_signal_track_disks = 'local-disk 400 HDD'\n\n\tInt call_peak_cpu = 2\n\tInt call_peak_mem_mb = 16000\n\tInt call_peak_time_hr = 72\n\tString call_peak_disks = 'local-disk 200 HDD'\n\n\tString? align_trimmomatic_java_heap\n\tString? filter_picard_java_heap\n\tString? gc_bias_picard_java_heap\n\n\t#### input file definition\n\t# pipeline can start from any type of inputs and then leave all other types undefined\n\t# supported types: fastq, bam, nodup_bam (filtered bam), ta (tagAlign), peak\n\t# define up to 4 replicates\n\t# [rep_id] is for each replicate\n\n \t### fastqs\n\tArray[File] fastqs_rep1_R1 = []\t\t# FASTQs to be merged for rep1 R1\n\tArray[File] fastqs_rep1_R2 = [] \t# do not define if single-ended\n\tArray[File] fastqs_rep2_R1 = [] \t# do not define if unreplicated\n\tArray[File] fastqs_rep2_R2 = []\t\t# ...\n\tArray[File] fastqs_rep3_R1 = []\n\tArray[File] fastqs_rep3_R2 = []\n\tArray[File] fastqs_rep4_R1 = []\n\tArray[File] fastqs_rep4_R2 = []\n\tArray[File] fastqs_rep5_R1 = []\n\tArray[File] fastqs_rep5_R2 = []\n\tArray[File] fastqs_rep6_R1 = []\n\tArray[File] fastqs_rep6_R2 = []\n\tArray[File] fastqs_rep7_R1 = []\n\tArray[File] fastqs_rep7_R2 = []\n\tArray[File] fastqs_rep8_R1 = []\n\tArray[File] fastqs_rep8_R2 = []\n\tArray[File] fastqs_rep9_R1 = []\n\tArray[File] fastqs_rep9_R2 = []\n\tArray[File] fastqs_rep10_R1 = []\n\tArray[File] fastqs_rep10_R2 = []\n\n\tArray[File] ctl_fastqs_rep1_R1 = []\t\t# Control FASTQs to be merged for rep1 R1\n\tArray[File] ctl_fastqs_rep1_R2 = [] \t# do not define if single-ended\n\tArray[File] ctl_fastqs_rep2_R1 = [] \t# do not define if unreplicated\n\tArray[File] ctl_fastqs_rep2_R2 = []\t\t# ...\n\tArray[File] ctl_fastqs_rep3_R1 = []\n\tArray[File] ctl_fastqs_rep3_R2 = []\n\tArray[File] ctl_fastqs_rep4_R1 = []\n\tArray[File] ctl_fastqs_rep4_R2 = []\n\tArray[File] ctl_fastqs_rep5_R1 = []\n\tArray[File] ctl_fastqs_rep5_R2 = []\n\tArray[File] ctl_fastqs_rep6_R1 = []\n\tArray[File] ctl_fastqs_rep6_R2 = []\n\tArray[File] ctl_fastqs_rep7_R1 = []\n\tArray[File] ctl_fastqs_rep7_R2 = []\n\tArray[File] ctl_fastqs_rep8_R1 = []\n\tArray[File] ctl_fastqs_rep8_R2 = []\n\tArray[File] ctl_fastqs_rep9_R1 = []\n\tArray[File] ctl_fastqs_rep9_R2 = []\n\tArray[File] ctl_fastqs_rep10_R1 = []\n\tArray[File] ctl_fastqs_rep10_R2 = []\n\n\t### other input types (bam, nodup_bam, ta)\n\tArray[File?] bams = [] \t\t\t# [rep_id]\n\tArray[File?] ctl_bams = [] \t\t# [rep_id]\n\tArray[File?] nodup_bams = [] \t# [rep_id]\n\tArray[File?] ctl_nodup_bams = [] # [rep_id]\n\tArray[File?] tas = []\t\t\t# [rep_id]\n\tArray[File?] ctl_tas = []\t\t# [rep_id]\n\n\t### other input types (peak)\n\tArray[Int?] fraglen = [] \t# [rep_id]. fragment length if inputs are peaks\n\tArray[File?] peaks = []\t\t# [PAIR(rep_id1,rep_id2)]. example for 3 reps: [rep1_rep2, rep1_rep3, rep2_rep3]\n\tArray[File?] peaks_pr1 = []\t# [rep_id]. do not define if true_rep=true\n\tArray[File?] peaks_pr2 = []\t# [rep_id]. do not define if true_rep=true\n\tFile? peak_ppr1\t\t\t\t# do not define if you have a single replicate or true_rep=true\n\tFile? peak_ppr2\t\t\t\t# do not define if you have a single replicate or true_rep=true\n\tFile? peak_pooled\t\t\t# do not define if you have a single replicate or true_rep=true\n\n\t####################### pipeline starts here #######################\n\t# DO NOT DEFINE ANY VARIABLES DECLARED BELOW IN AN INPUT JSON FILE #\n\t# THEY ARE TEMPORARY/INTERMEDIATE SYSTEM VARIABLES #\n\t####################### pipeline starts here #######################\n\n\t# read genome data and paths\n\tif ( defined(genome_tsv) ) {\n\t\tcall read_genome_tsv { input: genome_tsv = genome_tsv }\n\t}\n\tFile? reffa = if defined(ref_fa) then ref_fa\n\t\telse read_genome_tsv.ref_fa\n\tFile? bwa_idxtar = if defined(bwa_idx_tar) then bwa_idx_tar\n\t\telse read_genome_tsv.bwa_idx_tar\n\tFile? bowtie2_idxtar = if defined(bowtie2_idx_tar) then bowtie2_idx_tar\n\t\telse read_genome_tsv.bowtie2_idx_tar\n\tFile? custom_aligner_idxtar = if defined(custom_aligner_idx_tar) then custom_aligner_idx_tar\n\t\telse read_genome_tsv.custom_aligner_idxtar\n\tFile? chrsz = if defined(chrsz) then chrsz\n\t\telse read_genometsv.chrsz\n\tString? gensz = if defined(gensz) then gensz\n\t\telse read_genometsv.gensz\n\tFile? blacklist1 = if defined(blacklist) then blacklist\n\t\telse read_genometsv.blacklist\n\tFile? blacklist2 = if defined(blacklist2) then blacklist2\n\t\telse read_genome_tsv.blacklist2\n\t# merge multiple blacklists\n\t# two blacklists can have different number of columns (3 vs 6)\n\t# so we limit merged blacklist's columns to 3\n\tArray[File] blacklists = selectall([blacklist1, blacklist2_])\n\tif ( length(blacklists) > 1 ) {\n\t\tcall pool_ta as poolblacklist { input:\n\t\t\ttas = blacklists,\n\t\t\tcol = 3,\n\t\t}\n\t}\n\tFile? blacklist = if length(blacklists) > 1 then pool_blacklist.tapooled\n\t\telse if length(blacklists) > 0 then blacklists[0]\n\t\telse blacklist2\n\tString? mito_chrname = if defined(mito_chr_name) then mito_chr_name\n\t\telse read_genome_tsv.mito_chr_name\t\t\n\tString? regex_bfilt_peak_chrname = if defined(regex_bfilt_peak_chr_name) then regex_bfilt_peak_chr_name\n\t\telse read_genome_tsv.regex_bfilt_peak_chr_name\n\tString? genomename = if defined(genome_name) then genome_name\n\t\telse if defined(read_genome_tsv.genome_name) then read_genome_tsv.genome_name\n\t\telse basename(select_first([genome_tsv, reffa, chrsz, 'None']))\n\n\t# read additional annotation data\n\tFile? tss = if defined(tss) then tss\n\t\telse read_genometsv.tss\n\tFile? dnase = if defined(dnase) then dnase\n\t\telse read_genometsv.dnase\n\tFile? prom = if defined(prom) then prom\n\t\telse read_genometsv.prom\n\tFile? enh = if defined(enh) then enh\n\t\telse read_genometsv.enh\n\tFile? reg2map = if defined(reg2map) then reg2map\n\t\telse read_genome_tsv.reg2map\n\tFile? reg2mapbed = if defined(reg2map_bed) then reg2map_bed\n\t\telse read_genome_tsv.reg2map_bed\n\tFile? roadmapmeta = if defined(roadmap_meta) then roadmap_meta\n\t\telse read_genome_tsv.roadmapmeta\n\n\t### temp vars (do not define these)\n\tString aligner = if defined(custom_align_py) then 'custom' else aligner\n\tString peakcaller = if defined(custom_call_peak_py) then 'custom'\n\t\t\t\t\t\telse if pipeline_type=='tf' then select_first([peak_caller, 'spp'])\n\t\t\t\t\t\telse select_first([peak_caller, 'macs2'])\n\tString peaktype = if peakcaller=='spp' then select_first([peak_type, 'regionPeak'])\n\t\t\t\t\t\telse if peakcaller=='macs2' then select_first([peak_type, 'narrowPeak'])\n\t\t\t\t\t\telse select_first([peak_type, 'narrowPeak'])\n\tBoolean enable_idr = pipeline_type=='tf' # enable_idr for TF chipseq only\n\tString idrrank = if peakcaller=='spp' then 'signal.value'\n\t\t\t\t\t\telse if peakcaller=='macs2' then 'p.value'\n\t\t\t\t\t\telse 'p.value'\n\tInt cap_numpeak = if peakcaller == 'spp' then select_first([cap_num_peak, cap_num_peak_spp])\n\t\telse select_first([cap_num_peak, cap_num_peak_macs2])\n\tInt mapqthresh = if aligner=='bowtie2' then select_first([mapq_thresh, mapq_thresh_bowtie2])\n\t\t\t\t\t\telse select_first([mapq_thresh, mapq_thresh_bwa])\n\n\t# temporary 2-dim fastqs array [rep_id][merge_id]\n\tArray[Array[File]] fastqs_R1 = \n\t\tif length(fastqs_rep10_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1, fastqs_rep5_R1,\n\t\t\tfastqs_rep6_R1, fastqs_rep7_R1, fastqs_rep8_R1, fastqs_rep9_R1, fastqs_rep10_R1]\n\t\telse if length(fastqs_rep9_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1, fastqs_rep5_R1,\n\t\t\tfastqs_rep6_R1, fastqs_rep7_R1, fastqs_rep8_R1, fastqs_rep9_R1]\n\t\telse if length(fastqs_rep8_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1, fastqs_rep5_R1,\n\t\t\tfastqs_rep6_R1, fastqs_rep7_R1, fastqs_rep8_R1]\n\t\telse if length(fastqs_rep7_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1, fastqs_rep5_R1,\n\t\t\tfastqs_rep6_R1, fastqs_rep7_R1]\n\t\telse if length(fastqs_rep6_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1, fastqs_rep5_R1,\n\t\t\tfastqs_rep6_R1]\n\t\telse if length(fastqs_rep5_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1, fastqs_rep5_R1]\n\t\telse if length(fastqs_rep4_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1]\n\t\telse if length(fastqs_rep3_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1]\n\t\telse if length(fastqs_rep2_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1]\n\t\telse if length(fastqs_rep1_R1)>0 then\n\t\t\t[fastqs_rep1_R1]\n\t\telse []\n\t# no need to do that for R2 (R1 array will be used to determine presense of fastq for each rep)\n\tArray[Array[File]] fastqs_R2 = \n\t\t[fastqs_rep1_R2, fastqs_rep2_R2, fastqs_rep3_R2, fastqs_rep4_R2, fastqs_rep5_R2,\n\t\tfastqs_rep6_R2, fastqs_rep7_R2, fastqs_rep8_R2, fastqs_rep9_R2, fastqs_rep10_R2]\n\n\t# temporary 2-dim ctl fastqs array [rep_id][merge_id]\n\tArray[Array[File]] ctl_fastqs_R1 = \n\t\tif length(ctl_fastqs_rep10_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1, ctl_fastqs_rep5_R1,\n\t\t\tctl_fastqs_rep6_R1, ctl_fastqs_rep7_R1, ctl_fastqs_rep8_R1, ctl_fastqs_rep9_R1, ctl_fastqs_rep10_R1]\n\t\telse if length(ctl_fastqs_rep9_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1, ctl_fastqs_rep5_R1,\n\t\t\tctl_fastqs_rep6_R1, ctl_fastqs_rep7_R1, ctl_fastqs_rep8_R1, ctl_fastqs_rep9_R1]\n\t\telse if length(ctl_fastqs_rep8_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1, ctl_fastqs_rep5_R1,\n\t\t\tctl_fastqs_rep6_R1, ctl_fastqs_rep7_R1, ctl_fastqs_rep8_R1]\n\t\telse if length(ctl_fastqs_rep7_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1, ctl_fastqs_rep5_R1,\n\t\t\tctl_fastqs_rep6_R1, ctl_fastqs_rep7_R1]\n\t\telse if length(ctl_fastqs_rep6_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1, ctl_fastqs_rep5_R1,\n\t\t\tctl_fastqs_rep6_R1]\n\t\telse if length(ctl_fastqs_rep5_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1, ctl_fastqs_rep5_R1]\n\t\telse if length(ctl_fastqs_rep4_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1]\n\t\telse if length(ctl_fastqs_rep3_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1]\n\t\telse if length(ctl_fastqs_rep2_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1]\n\t\telse if length(ctl_fastqs_rep1_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1]\n\t\telse []\n\t# no need to do that for R2 (R1 array will be used to determine presense of fastq for each rep)\n\tArray[Array[File]] ctl_fastqs_R2 = \n\t\t[ctl_fastqs_rep1_R2, ctl_fastqs_rep2_R2, ctl_fastqs_rep3_R2, ctl_fastqs_rep4_R2, ctl_fastqs_rep5_R2,\n\t\tctl_fastqs_rep6_R2, ctl_fastqs_rep7_R2, ctl_fastqs_rep8_R2, ctl_fastqs_rep9_R2, ctl_fastqs_rep10_R2]\n\n\t# temporary variables to get number of replicates\n\t# \tWDLic implementation of max(A,B,C,...)\n\tInt num_rep_fastq = length(fastqs_R1)\n\tInt num_rep_bam = if length(bams)<num_rep_fastq then num_rep_fastq\n\t\telse length(bams)\n\tInt num_rep_nodup_bam = if length(nodup_bams)<num_rep_bam then num_rep_bam\n\t\telse length(nodup_bams)\n\tInt num_rep_ta = if length(tas)<num_rep_nodup_bam then num_rep_nodup_bam\n\t\telse length(tas)\n\tInt num_rep_peak = if length(peaks)<num_rep_ta then num_rep_ta\n\t\telse length(peaks)\n\tInt num_rep = num_rep_peak\n\n\t# temporary variables to get number of controls\n\tInt num_ctl_fastq = length(ctl_fastqs_R1)\n\tInt num_ctl_bam = if length(ctl_bams)<num_ctl_fastq then num_ctl_fastq\n\t\telse length(ctl_bams)\n\tInt num_ctl_nodup_bam = if length(ctl_nodup_bams)<num_ctl_bam then num_ctl_bam\n\t\telse length(ctl_nodup_bams)\n\tInt num_ctl_ta = if length(ctl_tas)<num_ctl_nodup_bam then num_ctl_nodup_bam\n\t\telse length(ctl_tas)\n\tInt num_ctl = num_ctl_ta\n\n\t# sanity check for inputs\n\tif ( num_rep == 0 && num_ctl == 0 ) {\n\t\tcall raise_exception as error_input_data { input:\n\t\t\tmsg = 'No FASTQ/BAM/TAG-ALIGN/PEAK defined in your input JSON. Check if your FASTQs are defined as \"chip.fastqs_repX_RY\". DO NOT MISS suffix R1 even for single ended FASTQ.'\n\t\t}\n\t}\n\tif ( !defined(chrsz) ) {\n\t\tcall raise_exception as error_genome_database { input:\n\t\t\tmsg = 'No genome database found in your input JSON. Did you define \"chip.genome_tsv\" correctly?'\n\t\t}\n\t}\n\tif ( !align_only && peakcaller == 'spp' && num_ctl == 0 ) {\n\t\tcall raise_exception as error_controlrequired { input:\n\t\t\tmsg = 'SPP requires control inputs. Define control input files (\"chip.ctl*\") in an input JSON file.'\n\t\t}\n\t}\n\n\t# align each replicate\n\tscatter(i in range(num_rep)) {\n\t\t# to override endedness definition for individual replicate\n\t\t# \tpaired_end will override paired_ends[i]\n\t\tBoolean? pairedend = if !defined(paired_end) && i<length(paired_ends) then paired_ends[i]\n\t\t\telse paired_end\n\n\t\tBoolean has_input_of_align = i<length(fastqs_R1) && length(fastqs_R1[i])>0\n\t\tBoolean has_output_of_align = i<length(bams) && defined(bams[i])\n\t\tif ( has_input_of_align && !has_output_of_align ) {\n\t\t\tcall align { input :\n\t\t\t\tfastqs_R1 = fastqs_R1[i],\n\t\t\t\tfastqs_R2 = fastqs_R2[i],\n\t\t\t\tcrop_length = croplength,\n\n\t\t\t\taligner = aligner,\n\t\t\t\tmito_chr_name = mito_chrname,\n\t\t\t\tcustom_align_py = custom_align_py,\n\t\t\t\tidx_tar = if aligner=='bwa' then bwa_idxtar\n\t\t\t\t\telse if aligner=='bowtie2' then bowtie2_idxtar\n\t\t\t\t\telse custom_aligner_idxtar,\n\t\t\t\tpaired_end = pairedend,\n\t\t\t\tuse_bwa_mem_for_pe = use_bwa_mem_for_pe,\n\n\t\t\t\ttrimmomatic_java_heap = align_trimmomatic_java_heap,\n\t\t\t\tcpu = align_cpu,\n\t\t\t\tmem_mb = align_mem_mb,\n\t\t\t\ttime_hr = align_time_hr,\n\t\t\t\tdisks = aligndisks,\n\t\t\t}\n\t\t}\n\t\tFile? bam = if has_output_of_align then bams[i] else align.bam\n\n\t\tBoolean has_input_of_filter = has_output_of_align || defined(align.bam)\n\t\tBoolean has_output_of_filter = i<length(nodup_bams) && defined(nodup_bams[i])\n\t\t# skip if we already have output of this step\n\t\tif ( has_input_of_filter && !has_output_offilter ) {\n\t\t\tcall filter { input :\n\t\t\t\tbam = bam,\n\t\t\t\tpaired_end = pairedend,\n\t\t\t\tdup_marker = dup_marker,\n\t\t\t\tmapq_thresh = mapqthresh,\n\t\t\t\tfilter_chrs = filterchrs,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tno_dup_removal = no_dup_removal,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = filter_cpu,\n\t\t\t\tmem_mb = filter_mem_mb,\n\t\t\t\tpicard_java_heap = filter_picard_java_heap,\n\t\t\t\ttime_hr = filter_time_hr,\n\t\t\t\tdisks = filter_disks,\n\t\t\t}\n\t\t}\n\t\tFile? nodupbam = if has_output_of_filter then nodup_bams[i] else filter.nodup_bam\n\n\t\tBoolean has_input_of_bam2ta = has_output_of_filter || defined(filter.nodup_bam)\n\t\tBoolean has_output_of_bam2ta = i<length(tas) && defined(tas[i])\n\t\tif ( has_input_of_bam2ta && !has_output_of_bam2ta ) {\n\t\t\tcall bam2ta { input :\n\t\t\t\tbam = nodupbam,\n\t\t\t\tsubsample = subsample_reads,\n\t\t\t\tpaired_end = pairedend,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = bam2ta_cpu,\n\t\t\t\tmem_mb = bam2ta_mem_mb,\n\t\t\t\ttime_hr = bam2ta_time_hr,\n\t\t\t\tdisks = bam2tadisks,\n\t\t\t}\n\t\t}\n\t\tFile? ta = if has_output_of_bam2ta then tas[i] else bam2ta.ta\n\n\t\tBoolean has_input_of_spr = has_output_of_bam2ta || defined(bam2ta.ta)\n\t\tif ( has_input_of_spr && !align_only && !true_reponly ) {\n\t\t\tcall spr { input :\n\t\t\t\tta = ta,\n\t\t\t\tpaired_end = pairedend,\n\t\t\t\tmem_mb = spr_mem_mb,\n\t\t\t}\n\t\t}\n\n\t\tBoolean has_input_of_count_signal_track = has_output_of_bam2ta || defined(bam2ta.ta)\n\t\tif ( has_input_of_count_signal_track && enable_count_signal_track ) {\n\t\t\t# generate count signal track\n\t\t\tcall count_signaltrack { input :\n\t\t\t\tta = ta,\n\t\t\t\tchrsz = chrsz_,\n\t\t\t}\n\t\t}\n\n\t\tif ( enable_gc_bias && defined(nodupbam) && defined(reffa) ) {\n\t\t\tcall gc_bias { input :\n\t\t\t\tnodup_bam = nodupbam,\n\t\t\t\tref_fa = reffa,\n\t\t\t\tpicard_java_heap = gc_bias_picard_java_heap,\n\t\t\t}\n\t\t}\n\n\t\t# special trimming/mapping for xcor (when starting from FASTQs)\n\t\tif ( has_input_of_align ) {\n\t\t\tcall align as align_R1 { input :\n\t\t\t\tfastqs_R1 = fastqs_R1[i],\n\t\t\t\tfastqs_R2 = [],\n\t\t\t\ttrim_bp = xcor_trim_bp,\n\t\t\t\tcroplength = 0,\n\n\t\t\t\taligner = aligner,\n\t\t\t\tmito_chr_name = mito_chrname,\n\t\t\t\tcustom_align_py = custom_align_py,\n\t\t\t\tidx_tar = if aligner=='bwa' then bwa_idxtar\n\t\t\t\t\telse if aligner=='bowtie2' then bowtie2_idxtar\n\t\t\t\t\telse custom_aligner_idxtar,\n\t\t\t\tpaired_end = false,\n\t\t\t\tuse_bwa_mem_for_pe = use_bwa_mem_for_pe,\n\n\t\t\t\tcpu = align_cpu,\n\t\t\t\tmem_mb = align_mem_mb,\n\t\t\t\ttime_hr = align_time_hr,\n\t\t\t\tdisks = align_disks,\n\t\t\t}\n\t\t\t# no bam deduping for xcor\n\t\t\tcall filter as filter_R1 { input :\n\t\t\t\tbam = align_R1.bam,\n\t\t\t\tpaired_end = false,\n\t\t\t\tdup_marker = dup_marker,\n\t\t\t\tmapq_thresh = mapqthresh,\n\t\t\t\tfilter_chrs = filterchrs,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tno_dup_removal = true,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = filter_cpu,\n\t\t\t\tmem_mb = filter_mem_mb,\n\t\t\t\tpicard_java_heap = filter_picard_java_heap,\n\t\t\t\ttime_hr = filter_time_hr,\n\t\t\t\tdisks = filter_disks,\n\t\t\t}\n\t\t\tcall bam2ta as bam2ta_no_dedup_R1 { input :\n\t\t\t\tbam = filter_R1.nodup_bam, # it's named as nodup bam but it's not deduped but just filtered\n\t\t\t\tpaired_end = false,\n\t\t\t\tsubsample = 0,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = bam2ta_cpu,\n\t\t\t\tmem_mb = bam2ta_mem_mb,\n\t\t\t\ttime_hr = bam2ta_time_hr,\n\t\t\t\tdisks = bam2ta_disks,\n\t\t\t}\n\t\t}\n\n\t\t# special trimming/mapping for xcor (when starting from BAMs)\n\t\tBoolean has_input_of_bam2ta_no_dedup = (has_output_of_align || defined(align.bam))\n\t\t\t&& !defined(bam2ta_no_dedup_R1.ta)\n\t\tif ( has_input_of_bam2ta_no_dedup ) {\n\t\t\tcall filter as filter_nodedup { input :\n\t\t\t\tbam = bam,\n\t\t\t\tpaired_end = pairedend,\n\t\t\t\tdup_marker = dup_marker,\n\t\t\t\tmapq_thresh = mapqthresh,\n\t\t\t\tfilter_chrs = filterchrs,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tno_dup_removal = true,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = filter_cpu,\n\t\t\t\tmem_mb = filter_mem_mb,\n\t\t\t\tpicard_java_heap = filter_picard_java_heap,\n\t\t\t\ttime_hr = filter_time_hr,\n\t\t\t\tdisks = filter_disks,\n\t\t\t}\n\t\t\tcall bam2ta as bam2ta_no_dedup { input :\n\t\t\t\tbam = filter_no_dedup.nodup_bam, # output name is nodup but it's not deduped\n\t\t\t\tpaired_end = pairedend,\n\t\t\t\tsubsample = 0,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = bam2ta_cpu,\n\t\t\t\tmem_mb = bam2ta_mem_mb,\n\t\t\t\ttime_hr = bam2ta_time_hr,\n\t\t\t\tdisks = bam2ta_disks,\n\t\t\t}\n\t\t}\n\n\t\t# use trimmed/unfilitered R1 tagAlign for paired end dataset \t\t\n\t\t# if not starting from fastqs, keep using old method\n\t\t# (mapping with both ends for tag-aligns to be used for xcor)\n\t\t# subsample tagalign (non-mito) and cross-correlation analysis\n\t\tFile? ta_xcor = if defined(bam2ta_no_dedup_R1.ta) then bam2ta_no_dedup_R1.ta\n\t\t\telse if defined(bam2ta_no_dedup.ta) then bam2ta_nodedup.ta\n\t\t\telse ta\n\t\tBoolean? paired_end_xcor = if defined(bam2ta_no_dedup_R1.ta) then false\n\t\t\telse pairedend\n\n\t\tBoolean has_input_of_xcor = defined(ta_xcor)\n\t\tif ( has_input_of_xcor ) {\n\t\t\tcall xcor { input :\n\t\t\t\tta = ta_xcor,\n\t\t\t\tpaired_end = paired_end_xcor,\n\t\t\t\tsubsample = xcor_subsample_reads,\n\t\t\t\tmito_chr_name = mito_chrname,\n\t\t\t\tchip_seq_type = pipeline_type,\n\t\t\t\texclusion_range_min = xcor_exclusion_range_min,\n\t\t\t\texclusion_range_max = xcor_exclusion_range_max,\n\t\t\t\tcpu = xcor_cpu,\n\t\t\t\tmem_mb = xcor_mem_mb,\n\t\t\t\ttime_hr = xcor_time_hr,\n\t\t\t\tdisks = xcordisks,\n\t\t\t}\n\t\t}\n\n\t\t# before peak calling, get fragment length from xcor analysis or given input\n\t\t# if fraglen [] is defined in the input JSON, fraglen from xcor will be ignored\n\t\tInt? fraglen = if i<length(fraglen) && defined(fraglen[i]) then fraglen[i]\n\t\t\telse xcor.fraglen\n\t}\n\n\t# align each control\n\tscatter(i in range(num_ctl)) {\n\t\t# to override endedness definition for individual control\n\t\t# \tctl_paired_end will override ctl_paired_ends[i]\n\t\tBoolean? ctl_pairedend = if !defined(ctl_paired_end) && i<length(ctl_paired_ends) then ctl_paired_ends[i]\n\t\t\telse if defined(ctl_paired_end) then ctl_paired_end\n\t\t\telse paired_end\n\n\t\tBoolean has_input_of_align_ctl = i<length(ctl_fastqs_R1) && length(ctl_fastqs_R1[i])>0\n\t\tBoolean has_output_of_align_ctl = i<length(ctl_bams) && defined(ctl_bams[i])\n\t\tif ( has_input_of_align_ctl && !has_output_of_align_ctl ) {\n\t\t\tcall align as align_ctl { input :\n\t\t\t\tfastqs_R1 = ctl_fastqs_R1[i],\n\t\t\t\tfastqs_R2 = ctl_fastqs_R2[i],\n\t\t\t\tcrop_length = croplength,\n\n\t\t\t\taligner = aligner,\n\t\t\t\tmito_chr_name = mito_chrname,\n\t\t\t\tcustom_align_py = custom_align_py,\n\t\t\t\tidx_tar = if aligner=='bwa' then bwa_idxtar\n\t\t\t\t\telse if aligner=='bowtie2' then bowtie2_idxtar\n\t\t\t\t\telse custom_aligner_idxtar,\n\t\t\t\tpaired_end = ctl_pairedend,\n\t\t\t\tuse_bwa_mem_for_pe = use_bwa_mem_for_pe,\n\n\t\t\t\ttrimmomatic_java_heap = align_trimmomatic_java_heap,\n\t\t\t\tcpu = align_cpu,\n\t\t\t\tmem_mb = align_mem_mb,\n\t\t\t\ttime_hr = align_time_hr,\n\t\t\t\tdisks = align_disks,\n\t\t\t}\n\t\t}\n\t\tFile? ctlbam = if has_output_of_align_ctl then ctl_bams[i] else align_ctl.bam\n\n\t\tBoolean has_input_of_filter_ctl = has_output_of_align_ctl || defined(align_ctl.bam)\n\t\tBoolean has_output_of_filter_ctl = i<length(ctl_nodup_bams) && defined(ctl_nodup_bams[i])\n\t\t# skip if we already have output of this step\n\t\tif ( has_input_of_filter_ctl && !has_output_of_filter_ctl ) {\n\t\t\tcall filter as filter_ctl { input :\n\t\t\t\tbam = ctlbam,\n\t\t\t\tpaired_end = ctl_pairedend,\n\t\t\t\tdup_marker = dup_marker,\n\t\t\t\tmapq_thresh = mapqthresh,\n\t\t\t\tfilter_chrs = filterchrs,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tno_dup_removal = no_dup_removal,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = filter_cpu,\n\t\t\t\tmem_mb = filter_mem_mb,\n\t\t\t\tpicard_java_heap = filter_picard_java_heap,\n\t\t\t\ttime_hr = filter_time_hr,\n\t\t\t\tdisks = filter_disks,\n\t\t\t}\n\t\t}\n\t\tFile? ctl_nodupbam = if has_output_of_filter_ctl then ctl_nodup_bams[i] else filter_ctl.nodup_bam\n\n\t\tBoolean has_input_of_bam2ta_ctl = has_output_of_filter_ctl || defined(filter_ctl.nodup_bam)\n\t\tBoolean has_output_of_bam2ta_ctl = i<length(ctl_tas) && defined(ctl_tas[i])\n\t\tif ( has_input_of_bam2ta_ctl && !has_output_of_bam2ta_ctl ) {\n\t\t\tcall bam2ta as bam2ta_ctl { input :\n\t\t\t\tbam = ctl_nodupbam,\n\t\t\t\tsubsample = subsample_reads,\n\t\t\t\tpaired_end = ctl_pairedend,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = bam2ta_cpu,\n\t\t\t\tmem_mb = bam2ta_mem_mb,\n\t\t\t\ttime_hr = bam2ta_time_hr,\n\t\t\t\tdisks = bam2ta_disks,\n\t\t\t}\n\t\t}\n\t\tFile? ctlta = if has_output_of_bam2ta_ctl then ctl_tas[i] else bam2ta_ctl.ta\n\t}\n\n\t# if there are TAs for ALL replicates then pool them\n\tBoolean has_all_inputs_of_pool_ta = length(selectall(ta))==num_rep\n\tif ( has_all_inputs_of_pool_ta && num_rep>1 ) {\n\t\t# pool tagaligns from true replicates\n\t\tcall poolta { input :\n\t\t\ttas = ta,\n\t\t\tprefix = 'rep',\n\t\t}\n\t}\n\n\t# if there are pr1 TAs for ALL replicates then pool them\n\tBoolean has_all_inputs_of_pool_ta_pr1 = length(select_all(spr.ta_pr1))==num_rep\n\tif ( has_all_inputs_of_pool_ta_pr1 && num_rep>1 && !align_only && !true_rep_only ) {\n\t\t# pool tagaligns from pseudo replicate 1\n\t\tcall pool_ta as pool_ta_pr1 { input :\n\t\t\ttas = spr.ta_pr1,\n\t\t\tprefix = 'rep-pr1',\n\t\t}\n\t}\n\n\t# if there are pr2 TAs for ALL replicates then pool them\n\tBoolean has_all_inputs_of_pool_ta_pr2 = length(select_all(spr.ta_pr2))==num_rep\n\tif ( has_all_inputs_of_pool_ta_pr1 && num_rep>1 && !align_only && !true_rep_only ) {\n\t\t# pool tagaligns from pseudo replicate 2\n\t\tcall pool_ta as pool_ta_pr2 { input :\n\t\t\ttas = spr.ta_pr2,\n\t\t\tprefix = 'rep-pr2',\n\t\t}\n\t}\n\n\t# if there are CTL TAs for ALL replicates then pool them\n\tBoolean has_all_inputs_of_pool_ta_ctl = length(select_all(ctlta))==num_ctl\n\tif ( has_all_inputs_of_pool_ta_ctl && num_ctl>1 ) {\n\t\t# pool tagaligns from true replicates\n\t\tcall pool_ta as pool_ta_ctl { input :\n\t\t\ttas = ctlta,\n\t\t\tprefix = 'ctl',\n\t\t}\n\t}\n\n\tBoolean has_input_of_count_signal_track_pooled = defined(pool_ta.ta_pooled)\n\tif ( has_input_of_count_signal_track_pooled && enable_count_signal_track && num_rep>1 ) {\n\t\tcall count_signal_track as count_signal_track_pooled { input :\n\t\t\tta = pool_ta.tapooled,\n\t\t\tchrsz = chrsz,\n\t\t}\n\t}\n\n\tBoolean has_input_ofjsd = defined(blacklist) && length(select_all(nodupbam))==num_rep\n\tif ( has_input_of_jsd && num_rep > 0 && enable_jsd ) {\n\t\t# fingerprint and JS-distance plot\n\t\tcall jsd { input :\n\t\t\tnodup_bams = nodupbam,\n\t\t\tctl_bams = ctl_nodupbam, # use first control only\n\t\t\tblacklist = blacklist_,\n\t\t\tmapq_thresh = mapqthresh,\n\n\t\t\tcpu = jsd_cpu,\n\t\t\tmem_mb = jsd_mem_mb,\n\t\t\ttime_hr = jsd_time_hr,\n\t\t\tdisks = jsd_disks,\n\t\t}\n\t}\n\n\tBoolean has_all_input_of_choose_ctl = length(selectall(ta))==num_rep\n\t\t&& length(select_all(ctlta))==num_ctl && num_ctl > 0\n\tif ( has_all_input_of_choose_ctl && !align_only ) {\n\t\t# choose appropriate control for each exp IP replicate\n\t\t# outputs:\n\t\t# \tchoose_ctl.idx : control replicate index for each exp replicate \n\t\t#\t\t\t\t\t-1 means pooled ctl replicate\n\t\tcall choosectl { input:\n\t\t\ttas = ta,\n\t\t\tctl_tas = ctlta,\n\t\t\tta_pooled = pool_ta.ta_pooled,\n\t\t\tctl_ta_pooled = pool_ta_ctl.ta_pooled,\n\t\t\talways_use_pooled_ctl = always_use_pooled_ctl,\n\t\t\tctl_depth_ratio = ctl_depth_ratio,\n\t\t}\n\t}\n\n\t# make control ta array [[1,2,3,4]] -> [[1],[2],[3],[4]], will be zipped with exp ta array latter\n\tArray[Array[File]] chosen_ctl_tas =\n\t\tif has_all_input_of_choose_ctl then transpose(select_all([choose_ctl.chosen_ctl_tas]))\n\t\telse [[],[],[],[],[],[],[],[],[],[]]\n\n\t# workaround for dx error (Unsupported combination: womType: Int womValue: ([225], Array[Int]))\n\tArray[Int] fraglen_tmp = selectall(fraglen)\n\n\t# we have all tas and ctl_tas (optional for histone chipseq) ready, let's call peaks\n\tscatter(i in range(num_rep)) {\n\t\tBoolean has_input_of_callpeak = defined(ta[i])\n\t\tBoolean has_output_of_call_peak = i<length(peaks) && defined(peaks[i])\n\t\tif ( has_input_of_call_peak && !has_output_of_call_peak && !align_only ) {\n\t\t\tcall call_peak { input :\n\t\t\t\tpeak_caller = peakcaller,\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\tcustom_call_peak_py = custom_call_peakpy,\n\t\t\t\ttas = flatten([[ta[i]], chosen_ctltas[i]]),\n\t\t\t\tgensz = gensz,\n\t\t\t\tchrsz = chrsz_,\n\t\t\t\tcap_num_peak = cap_numpeak,\n\t\t\t\tpval_thresh = pval_thresh,\n\t\t\t\tfdr_thresh = fdr_thresh,\n\t\t\t\tfraglen = fraglentmp[i],\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\n\t\t\t\tcpu = call_peak_cpu,\n\t\t\t\tmem_mb = call_peak_mem_mb,\n\t\t\t\tdisks = call_peak_disks,\n\t\t\t\ttime_hr = call_peak_timehr,\n\t\t\t}\n\t\t}\n\t\tFile? peak = if has_output_of_call_peak then peaks[i]\n\t\t\telse call_peak.peak\n\n\t\t# signal track\n\t\tif ( has_input_of_call_peak && !align_only ) {\n\t\t\tcall macs2_signaltrack { input :\n\t\t\t\ttas = flatten([[ta[i]], chosen_ctltas[i]]),\n\t\t\t\tgensz = gensz,\n\t\t\t\tchrsz = chrsz_,\n\t\t\t\tpval_thresh = pval_thresh,\n\t\t\t\tfraglen = fraglen_tmp[i],\n\n\t\t\t\tmem_mb = macs2_signal_track_mem_mb,\n\t\t\t\tdisks = macs2_signal_track_disks,\n\t\t\t\ttime_hr = macs2_signal_track_time_hr,\n\t\t\t}\n\t\t}\n\n\t\t# call peaks on 1st pseudo replicated tagalign\n\t\tBoolean has_input_of_call_peak_pr1 = defined(spr.ta_pr1[i])\n\t\tBoolean has_output_of_call_peak_pr1 = i<length(peaks_pr1) && defined(peaks_pr1[i])\n\t\tif ( has_input_of_call_peak_pr1 && !has_output_of_call_peak_pr1 && !true_rep_only ) {\n\t\t\tcall call_peak as call_peak_pr1 { input :\n\t\t\t\tpeak_caller = peakcaller,\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\tcustom_call_peak_py = custom_call_peak_py,\n\t\t\t\ttas = flatten([[spr.ta_pr1[i]], chosen_ctltas[i]]),\n\t\t\t\tgensz = gensz,\n\t\t\t\tchrsz = chrsz_,\n\t\t\t\tcap_num_peak = cap_numpeak,\n\t\t\t\tpval_thresh = pval_thresh,\n\t\t\t\tfdr_thresh = fdr_thresh,\n\t\t\t\tfraglen = fraglentmp[i],\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\n\t\t\t\tcpu = call_peak_cpu,\n\t\t\t\tmem_mb = call_peak_mem_mb,\n\t\t\t\tdisks = call_peak_disks,\n\t\t\t\ttime_hr = call_peak_time_hr,\n\t\t\t}\n\t\t}\n\t\tFile? peakpr1 = if has_output_of_call_peak_pr1 then peaks_pr1[i]\n\t\t\telse call_peak_pr1.peak\n\n\t\t# call peaks on 2nd pseudo replicated tagalign\n\t\tBoolean has_input_of_call_peak_pr2 = defined(spr.ta_pr2[i])\n\t\tBoolean has_output_of_call_peak_pr2 = i<length(peaks_pr2) && defined(peaks_pr2[i])\n\t\tif ( has_input_of_call_peak_pr2 && !has_output_of_call_peak_pr2 && !true_rep_only ) {\n\t\t\tcall call_peak as call_peak_pr2 { input :\n\t\t\t\tpeak_caller = peakcaller,\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\tcustom_call_peak_py = custom_call_peak_py,\n\t\t\t\ttas = flatten([[spr.ta_pr2[i]], chosen_ctltas[i]]),\n\t\t\t\tgensz = gensz,\n\t\t\t\tchrsz = chrsz_,\n\t\t\t\tcap_num_peak = cap_numpeak,\n\t\t\t\tpval_thresh = pval_thresh,\n\t\t\t\tfdr_thresh = fdr_thresh,\n\t\t\t\tfraglen = fraglentmp[i],\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\n\t\t\t\tcpu = call_peak_cpu,\n\t\t\t\tmem_mb = call_peak_mem_mb,\n\t\t\t\tdisks = call_peak_disks,\n\t\t\t\ttime_hr = call_peak_time_hr,\n\t\t\t}\n\t\t}\n\t\tFile? peakpr2 = if has_output_of_call_peak_pr2 then peaks_pr2[i]\n\t\t\telse call_peak_pr2.peak\n\t}\n\n\t# if ( !align_only && num_rep > 1 ) {\n\t# rounded mean of fragment length, which will be used for \n\t# 1) calling peaks for pooled true/pseudo replicates\n\t# 2) calculating FRiP\n\tcall rounded_mean as fraglen_mean { input :\n\t\tints = fraglen_tmp,\n\t}\n\t# }\n\n\t# actually not an array\n\tArray[File?] chosen_ctl_ta_pooled = if !has_all_input_of_choose_ctl then []\n\t\telse if num_ctl < 2 then [ctlta[0]] # choose first (only) control\n\t\telse select_all([pool_ta_ctl.ta_pooled]) # choose pooled control\n\n\tBoolean has_input_of_call_peak_pooled = defined(pool_ta.ta_pooled)\n\tBoolean has_output_of_call_peak_pooled = defined(peak_pooled)\n\tif ( has_input_of_call_peak_pooled && !has_output_of_call_peak_pooled && !align_only && num_rep>1 ) {\n\t\t# call peaks on pooled replicate\n\t\t# always call peaks for pooled replicate to get signal tracks\n\t\tcall call_peak as call_peak_pooled { input :\n\t\t\tpeak_caller = peakcaller,\n\t\t\tpeak_type = peaktype,\n\t\t\tcustom_call_peak_py = custom_call_peak_py,\n\t\t\ttas = flatten([select_all([pool_ta.ta_pooled]), chosen_ctl_tapooled]),\n\t\t\tgensz = gensz,\n\t\t\tchrsz = chrsz_,\n\t\t\tcap_num_peak = cap_numpeak,\n\t\t\tpval_thresh = pval_thresh,\n\t\t\tfdr_thresh = fdr_thresh,\n\t\t\tfraglen = fraglen_mean.roundedmean,\n\t\t\tblacklist = blacklist,\n\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\n\t\t\tcpu = call_peak_cpu,\n\t\t\tmem_mb = call_peak_mem_mb,\n\t\t\tdisks = call_peak_disks,\n\t\t\ttime_hr = call_peak_time_hr,\n\t\t}\n\t}\n\tFile? peakpooled = if has_output_of_call_peak_pooled then peak_pooled\n\t\telse call_peak_pooled.peak\t\n\n\t# macs2 signal track for pooled rep\n\tif ( has_input_of_call_peak_pooled && !align_only && num_rep>1 ) {\n\t\tcall macs2_signal_track as macs2_signal_track_pooled { input :\n\t\t\ttas = flatten([select_all([pool_ta.ta_pooled]), chosen_ctl_tapooled]),\n\t\t\tgensz = gensz,\n\t\t\tchrsz = chrsz_,\n\t\t\tpval_thresh = pval_thresh,\n\t\t\tfraglen = fraglen_mean.rounded_mean,\n\n\t\t\tmem_mb = macs2_signal_track_mem_mb,\n\t\t\tdisks = macs2_signal_track_disks,\n\t\t\ttime_hr = macs2_signal_track_time_hr,\n\t\t}\n\t}\n\n\tBoolean has_input_of_call_peak_ppr1 = defined(pool_ta_pr1.ta_pooled)\n\tBoolean has_output_of_call_peak_ppr1 = defined(peak_ppr1)\n\tif ( has_input_of_call_peak_ppr1 && !has_output_of_call_peak_ppr1 && !align_only && !true_rep_only && num_rep>1 ) {\n\t\t# call peaks on 1st pooled pseudo replicates\n\t\tcall call_peak as call_peak_ppr1 { input :\n\t\t\tpeak_caller = peakcaller,\n\t\t\tpeak_type = peaktype,\n\t\t\tcustom_call_peak_py = custom_call_peak_py,\n\t\t\ttas = flatten([select_all([pool_ta_pr1.ta_pooled]), chosen_ctl_tapooled]),\n\t\t\tgensz = gensz,\n\t\t\tchrsz = chrsz_,\n\t\t\tcap_num_peak = cap_numpeak,\n\t\t\tpval_thresh = pval_thresh,\n\t\t\tfdr_thresh = fdr_thresh,\n\t\t\tfraglen = fraglen_mean.roundedmean,\n\t\t\tblacklist = blacklist,\n\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\n\t\t\tcpu = call_peak_cpu,\n\t\t\tmem_mb = call_peak_mem_mb,\n\t\t\tdisks = call_peak_disks,\n\t\t\ttime_hr = call_peak_time_hr,\n\t\t}\n\t}\n\tFile? peakppr1 = if has_output_of_call_peak_ppr1 then peak_ppr1\n\t\telse call_peak_ppr1.peak\n\n\tBoolean has_input_of_call_peak_ppr2 = defined(pool_ta_pr2.ta_pooled)\n\tBoolean has_output_of_call_peak_ppr2 = defined(peak_ppr2)\n\tif ( has_input_of_call_peak_ppr2 && !has_output_of_call_peak_ppr2 && !align_only && !true_rep_only && num_rep>1 ) {\n\t\t# call peaks on 2nd pooled pseudo replicates\n\t\tcall call_peak as call_peak_ppr2 { input :\n\t\t\tpeak_caller = peakcaller,\n\t\t\tpeak_type = peaktype,\n\t\t\tcustom_call_peak_py = custom_call_peak_py,\n\t\t\ttas = flatten([select_all([pool_ta_pr2.ta_pooled]), chosen_ctl_tapooled]),\n\t\t\tgensz = gensz,\n\t\t\tchrsz = chrsz_,\n\t\t\tcap_num_peak = cap_numpeak,\n\t\t\tpval_thresh = pval_thresh,\n\t\t\tfdr_thresh = fdr_thresh,\n\t\t\tfraglen = fraglen_mean.roundedmean,\n\t\t\tblacklist = blacklist,\n\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\n\t\t\tcpu = call_peak_cpu,\n\t\t\tmem_mb = call_peak_mem_mb,\n\t\t\tdisks = call_peak_disks,\n\t\t\ttime_hr = call_peak_time_hr,\n\t\t}\n\t}\n\tFile? peakppr2 = if has_output_of_call_peak_ppr2 then peak_ppr2\n\t\telse call_peak_ppr2.peak\n\n\t# do IDR/overlap on all pairs of two replicates (i,j)\n\t# \twhere i and j are zero-based indices and 0 <= i < j < numrep\n\tArray[Pair[Int, Int]] pairs = cross(range(num_rep),range(numrep))\n\tscatter( pair in pairs ) {\n\t\tPair[Int, Int]? null_pair\n\t\tPair[Int, Int]? pairs = if pair.left<pair.right then pair else null_pair\n\t}\n\tArray[Pair[Int, Int]] pairs = select_all(pairs)\n\n\tif ( !align_only ) {\n\t\tscatter( pair in pairs ) {\n\t\t\t# pair.left = 0-based index of 1st replicate\n\t\t\t# pair.right = 0-based index of 2nd replicate\n\t\t\t# Naive overlap on every pair of true replicates\n\t\t\tcall overlap { input :\n\t\t\t\tprefix = 'rep'+(pair.left+1)+'_vsrep'+(pair.right+1),\n\t\t\t\tpeak1 = peak[pair.left],\n\t\t\t\tpeak2 = peak_[pair.right],\n\t\t\t\tpeak_pooled = peakpooled,\n\t\t\t\tfraglen = fraglen_mean.rounded_mean,\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\t\t\tta = pool_ta.ta_pooled,\n\t\t\t}\n\t\t}\n\t}\n\n\tif ( enable_idr && !align_only ) {\n\t\tscatter( pair in pairs ) {\n\t\t\t# pair.left = 0-based index of 1st replicate\n\t\t\t# pair.right = 0-based index of 2nd replicate\n\t\t\t# IDR on every pair of true replicates\n\t\t\tcall idr { input :\n\t\t\t\tprefix = 'rep'+(pair.left+1)+'_vsrep'+(pair.right+1),\n\t\t\t\tpeak1 = peak[pair.left],\n\t\t\t\tpeak2 = peak_[pair.right],\n\t\t\t\tpeak_pooled = peakpooled,\n\t\t\t\tfraglen = fraglen_mean.rounded_mean,\n\t\t\t\tidr_thresh = idr_thresh,\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\trank = idrrank,\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\t\t\tta = pool_ta.ta_pooled,\n\t\t\t}\n\t\t}\n\t}\n\n\t# overlap on pseudo-replicates (pr1, pr2) for each true replicate\n\tif ( !align_only && !true_rep_only ) {\n\t\tscatter( i in range(num_rep) ) {\n\t\t\tcall overlap as overlap_pr { input :\n\t\t\t\tprefix = 'rep'+(i+1)+'-pr1_vs_rep'+(i+1)+'-pr2',\n\t\t\t\tpeak1 = peakpr1[i],\n\t\t\t\tpeak2 = peakpr2[i],\n\t\t\t\tpeakpooled = peak[i],\n\t\t\t\tfraglen = fraglen_[i],\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\t\t\tta = ta_[i],\n\t\t\t}\n\t\t}\n\t}\n\n\tif ( !align_only && !true_rep_only && enable_idr ) {\n\t\tscatter( i in range(num_rep) ) {\n\t\t\t# IDR on pseduo replicates\n\t\t\tcall idr as idr_pr { input :\n\t\t\t\tprefix = 'rep'+(i+1)+'-pr1_vs_rep'+(i+1)+'-pr2',\n\t\t\t\tpeak1 = peakpr1[i],\n\t\t\t\tpeak2 = peakpr2[i],\n\t\t\t\tpeakpooled = peak[i],\n\t\t\t\tfraglen = fraglen_[i],\n\t\t\t\tidr_thresh = idr_thresh,\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\trank = idrrank,\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\t\t\tta = ta_[i],\n\t\t\t}\n\t\t}\n\t}\n\n\tif ( !align_only && !true_rep_only && num_rep > 1 ) {\n\t\t# Naive overlap on pooled pseudo replicates\n\t\tcall overlap as overlap_ppr { input :\n\t\t\tprefix = 'pooled-pr1_vs_pooled-pr2',\n\t\t\tpeak1 = peakppr1,\n\t\t\tpeak2 = peakppr2,\n\t\t\tpeak_pooled = peakpooled,\n\t\t\tpeak_type = peaktype,\n\t\t\tfraglen = fraglen_mean.roundedmean,\n\t\t\tblacklist = blacklist,\n\t\t\tchrsz = chrsz_,\n\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\t\tta = pool_ta.ta_pooled,\n\t\t}\n\t}\n\n\tif ( !align_only && !true_rep_only && num_rep > 1 ) {\n\t\t# IDR on pooled pseduo replicates\n\t\tcall idr as idr_ppr { input :\n\t\t\tprefix = 'pooled-pr1_vs_pooled-pr2',\n\t\t\tpeak1 = peakppr1,\n\t\t\tpeak2 = peakppr2,\n\t\t\tpeak_pooled = peakpooled,\n\t\t\tidr_thresh = idr_thresh,\n\t\t\tpeak_type = peaktype,\n\t\t\tfraglen = fraglen_mean.rounded_mean,\n\t\t\trank = idrrank,\n\t\t\tblacklist = blacklist,\n\t\t\tchrsz = chrsz,\n\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\t\tta = pool_ta.ta_pooled,\n\t\t}\n\t}\n\n\t# reproducibility QC for overlap/IDR peaks\n\tif ( !align_only && !true_rep_only && num_rep > 0 ) {\n\t\t# reproducibility QC for overlapping peaks\n\t\tcall reproducibility as reproducibility_overlap { input :\n\t\t\tprefix = 'overlap',\n\t\t\tpeaks = overlap.bfilt_overlap_peak,\n\t\t\tpeaks_pr = overlap_pr.bfilt_overlap_peak,\n\t\t\tpeak_ppr = overlap_ppr.bfilt_overlap_peak,\n\t\t\tpeak_type = peaktype,\n\t\t\tchrsz = chrsz_,\n\t\t}\n\t}\n\n\tif ( !align_only && !true_rep_only && num_rep > 0 && enable_idr ) {\n\t\t# reproducibility QC for IDR peaks\n\t\tcall reproducibility as reproducibility_idr { input :\n\t\t\tprefix = 'idr',\n\t\t\tpeaks = idr.bfilt_idr_peak,\n\t\t\tpeaks_pr = idr_pr.bfilt_idr_peak,\n\t\t\tpeak_ppr = idr_ppr.bfilt_idr_peak,\n\t\t\tpeak_type = peaktype,\n\t\t\tchrsz = chrsz_,\n\t\t}\n\t}\n\n\t# Generate final QC report and JSON\n\tcall qc_report { input :\n\t\tpipeline_ver = pipeline_ver,\n\t\ttitle = title,\n\t\tdescription = description,\n\t\tgenome = genomename,\n\t\tpaired_ends = pairedend,\n\t\tctl_paired_ends = ctl_pairedend,\n\t\tpipeline_type = pipelinetype,\n\t\taligner = aligner,\n\t\tpeak_caller = peakcaller,\n\t\tcap_num_peak = cap_numpeak,\n\t\tidr_thresh = idr_thresh,\n\t\tpval_thresh = pval_thresh,\n\t\txcor_trim_bp = xcor_trim_bp,\n\t\txcor_subsample_reads = xcor_subsample_reads,\n\n\t\tsamstat_qcs = align.samstat_qc,\n\t\tnodup_samstat_qcs = filter.samstat_qc,\n\t\tdup_qcs = filter.dup_qc,\n\t\tlib_complexity_qcs = filter.lib_complexity_qc,\n\t\txcor_plots = xcor.plot_png,\n\t\txcor_scores = xcor.score,\n\n\t\tctl_samstat_qcs = align_ctl.samstat_qc,\n\t\tctl_nodup_samstat_qcs = filter_ctl.samstat_qc,\n\t\tctl_dup_qcs = filter_ctl.dup_qc,\n\t\tctl_lib_complexity_qcs = filter_ctl.lib_complexity_qc,\n\n\t\tjsd_plot = jsd.plot,\n\t\tjsd_qcs = jsd.jsd_qcs,\n\n\t\tfrip_qcs = call_peak.frip_qc,\n\t\tfrip_qcs_pr1 = call_peak_pr1.frip_qc,\n\t\tfrip_qcs_pr2 = call_peak_pr2.frip_qc,\n\t\tfrip_qc_pooled = call_peak_pooled.frip_qc,\n\t\tfrip_qc_ppr1 = call_peak_ppr1.frip_qc,\n\t\tfrip_qc_ppr2 = call_peak_ppr2.frip_qc,\n\n\t\tidr_plots = idr.idr_plot,\n\t\tidr_plots_pr = idr_pr.idr_plot,\n\t\tidr_plot_ppr = idr_ppr.idr_plot,\n\t\tfrip_idr_qcs = idr.frip_qc,\n\t\tfrip_idr_qcs_pr = idr_pr.frip_qc,\n\t\tfrip_idr_qc_ppr = idr_ppr.frip_qc,\n\t\tfrip_overlap_qcs = overlap.frip_qc,\n\t\tfrip_overlap_qcs_pr = overlap_pr.frip_qc,\n\t\tfrip_overlap_qc_ppr = overlap_ppr.frip_qc,\n\t\tidr_reproducibility_qc = reproducibility_idr.reproducibility_qc,\n\t\toverlap_reproducibility_qc = reproducibility_overlap.reproducibility_qc,\n\n\t\tgc_plots = gc_bias.gc_plot,\n\n\t\tpeak_region_size_qcs = call_peak.peak_region_size_qc,\n\t\tpeak_region_size_plots = call_peak.peak_region_size_plot,\n\t\tnum_peak_qcs = call_peak.num_peak_qc,\n\n\t\tidr_opt_peak_region_size_qc = reproducibility_idr.peak_region_size_qc,\n\t\tidr_opt_peak_region_size_plot = reproducibility_overlap.peak_region_size_plot,\n\t\tidr_opt_num_peak_qc = reproducibility_idr.num_peak_qc,\n\n\t\toverlap_opt_peak_region_size_qc = reproducibility_overlap.peak_region_size_qc,\n\t\toverlap_opt_peak_region_size_plot = reproducibility_overlap.peak_region_size_plot,\n\t\toverlap_opt_num_peak_qc = reproducibility_overlap.num_peak_qc,\n\t}\n\n\toutput {\n\t\tFile report = qc_report.report\n\t\tFile qc_json = qc_report.qc_json\n\t\tBoolean qc_json_ref_match = qc_report.qc_json_ref_match\n\t}\n}\n\ntask align {\n\tArray[File] fastqs_R1 \t\t# [merge_id]\n\tArray[File] fastqs_R2\n\tInt? trim_bp\t\t\t# this is for R1 only\n\tInt crop_length\n\tString aligner\n\tString mito_chr_name\n\tInt? multimapping\n\tFile? custom_align_py\t\n\tFile? idx_tar\t\t\t# reference index tar\n\tBoolean paired_end\n\tBoolean use_bwa_mem_for_pe\n\n\tString? trimmomatic_java_heap\n\tInt cpu\n\tInt mem_mb\n\tInt time_hr\n\tString disks\n\n\tArray[Array[File]] tmp_fastqs = if paired_end then transpose([fastqs_R1, fastqs_R2])\n\t\t\t\telse transpose([fastqs_R1])\n\tcommand {\n\t\tset -e\n\n\t\t# check if pipeline dependencies can be found\n\t\tif [[ -z \"$(which encode_task_merge_fastq.py 2> /dev/null || true)\" ]]\n\t\tthen\n\t\t echo -e \"\n Error: pipeline dependencies not found.\" 1>&2\n\t\t echo 'Conda users: Did you activate Conda environment (conda activate encode-chip-seq-pipeline)?' 1>&2\n\t\t echo ' Or did you install Conda and environment correctly (bash scripts/install_conda_env.sh)?' 1>&2\n\t\t echo 'GCP/AWS/Docker users: Did you add --docker flag to Caper command line arg?' 1>&2\n\t\t echo 'Singularity users: Did you add --singularity flag to Caper command line arg?' 1>&2\n\t\t echo -e \"\n\" 1>&2\n\t\t exit 3\n\t\tfi\n\t\tpython3 $(which encode_task_merge_fastq.py) \\n\t\t\t${write_tsv(tmp_fastqs)} \\n\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t${'--nth ' + cpu}\n\n\t\tif [ -z '${trim_bp}' ]; then\n\t\t\tSUFFIX=\n\t\telse\n\t\t\tSUFFIX=_trimmed\n\t\t\tpython3 $(which encode_task_trim_fastq.py) \\n\t\t\t\tR1/.fastq.gz \\n\t\t\t\t--trim-bp ${trim_bp} \\n\t\t\t\t--out-dir R1$SUFFIX\n\t\t\tif [ '${paired_end}' == 'true' ]; then\n\t\t\t\tpython3 $(which encode_task_trim_fastq.py) \\n\t\t\t\t\tR2/.fastq.gz \\n\t\t\t\t\t--trim-bp ${trim_bp} \\n\t\t\t\t\t--out-dir R2$SUFFIX\n\t\t\tfi\n\t\tfi\n\t\tif [ '${crop_length}' == '0' ]; then\n\t\t\tSUFFIX=$SUFFIX\n\t\telse\n\t\t\tNEW_SUFFIX=\"$SUFFIX\"_cropped\n\t\t\tpython3 $(which encode_task_trimmomatic.py) \\n\t\t\t\t--fastq1 R1$SUFFIX/.fastq.gz \\n\t\t\t\t${if paired_end then '--fastq2 R2$SUFFIX/.fastq.gz' else ''} \\n\t\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t\t--crop-length ${crop_length} \\n\t\t\t\t--out-dir-R1 R1$NEW_SUFFIX \\n\t\t\t\t${if paired_end then '--out-dir-R2 R2$NEW_SUFFIX' else ''} \\n\t\t\t\t${'--trimmomatic-java-heap ' + if defined(trimmomatic_java_heap) then trimmomatic_java_heap else (mem_mb + 'M')} \\n\t\t\t\t${'--nth ' + cpu}\n\t\t\tSUFFIX=$NEW_SUFFIX\n\t\tfi\n\n\t\tif [ '${aligner}' == 'bwa' ]; then\n\t\t \tpython3 $(which encode_task_bwa.py) \\n\t\t\t\t${idx_tar} \\n\t\t\t\tR1$SUFFIX/.fastq.gz \\n\t\t\t\t${if paired_end then 'R2$SUFFIX/.fastq.gz' else ''} \\n\t\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t\t${if use_bwa_mem_for_pe then '--use-bwa-mem-for-pe' else ''} \\n\t\t\t\t${'--nth ' + cpu}\n\n\t\telif [ '${aligner}' == 'bowtie2' ]; then\n\t\t \tpython3 $(which encode_task_bowtie2.py) \\n\t\t\t\t${idx_tar} \\n\t\t\t\tR1$SUFFIX/.fastq.gz \\n\t\t\t\t${if paired_end then 'R2$SUFFIX/.fastq.gz' else ''} \\n\t\t\t\t${'--multimapping ' + multimapping} \\n\t\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t\t${'--nth ' + cpu}\n\t\telse\n\t\t\tpython3 ${custom_align_py} \\n\t\t\t\t${idx_tar} \\n\t\t\t\tR1$SUFFIX/.fastq.gz \\n\t\t\t\t${if paired_end then 'R2$SUFFIX/.fastq.gz' else ''} \\n\t\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t\t${'--nth ' + cpu}\n\t\tfi \n\n\t\tpython3 $(which encode_task_post_align.py) \\n\t\t\tR1$SUFFIX/.fastq.gz $(ls .bam) \\n\t\t\t${'--mito-chr-name ' + mito_chr_name} \\n\t\t\t${'--nth ' + cpu}\n\t\trm -rf R1 R2 R1$SUFFIX R2$SUFFIX\n\t}\n\toutput {\n\t\tFile bam = glob('.bam')[0]\n\t\tFile bai = glob('.bai')[0]\n\t\tFile samstat_qc = glob('.samstats.qc')[0]\n\t\tFile read_len_log = glob('.read_length.txt')[0]\n\t}\n\truntime {\n\t\tcpu : cpu\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t\tpreemptible: 0\n\t}\n}\n\ntask filter {\n\tFile bam\n\tBoolean paired_end\n\tString dup_marker \t\t\t# picard.jar MarkDuplicates (picard) or \n\t\t\t\t\t\t\t\t# sambamba markdup (sambamba)\n\tInt mapq_thresh\t\t\t\t# threshold for low MAPQ reads removal\n\tArray[String] filter_chrs \t# chrs to be removed from final (nodup/filt) BAM\n\tFile chrsz\t\t\t\t\t# 2-col chromosome sizes file\n\tBoolean no_dup_removal \t\t# no dupe reads removal when filtering BAM\n\tString mito_chr_name\n\n\tInt cpu\n\tInt mem_mb\n\tString? picard_java_heap\n\tInt time_hr\n\tString disks\n\n\tcommand {\n\t\tpython3 $(which encode_task_filter.py) \\n\t\t\t${bam} \\n\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t--multimapping 0 \\n\t\t\t${'--dup-marker ' + dup_marker} \\n\t\t\t${'--mapq-thresh ' + mapq_thresh} \\n\t\t\t--filter-chrs ${sep=' ' filter_chrs} \\n\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t${if no_dup_removal then '--no-dup-removal' else ''} \\n\t\t\t${'--mito-chr-name ' + mito_chr_name} \\n\t\t\t${'--nth ' + cpu} \\n\t\t\t${'--picard-java-heap ' + if defined(picard_java_heap) then picard_java_heap else (mem_mb + 'M')}\n\t}\n\toutput {\n\t\tFile nodup_bam = glob('.bam')[0]\n\t\tFile nodup_bai = glob('.bai')[0]\n\t\tFile samstat_qc = glob('.samstats.qc')[0]\n\t\tFile dup_qc = glob('.dup.qc')[0]\n\t\tFile lib_complexity_qc = glob('.lib_complexity.qc')[0]\n\t}\n\truntime {\n\t\tcpu : cpu\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t}\n}\n\ntask bam2ta {\n\tFile bam\n\tBoolean paired_end\n\tString mito_chr_name \t\t# mito chromosome name\n\tInt subsample \t\t\t\t# number of reads to subsample TAGALIGN\n\t\t\t\t\t\t\t\t# this affects all downstream analysis\n\tInt cpu\n\tInt mem_mb\n\tInt time_hr\n\tString disks\n\n\tcommand {\n\t\tpython3 $(which encode_task_bam2ta.py) \\n\t\t\t${bam} \\n\t\t\t--disable-tn5-shift \\n\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t${'--mito-chr-name ' + mito_chr_name} \\n\t\t\t${'--subsample ' + subsample} \\n\t\t\t${'--nth ' + cpu}\n\t}\n\toutput {\n\t\tFile ta = glob('.tagAlign.gz')[0]\n\t}\n\truntime {\n\t\tcpu : cpu\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t}\n}\n\ntask spr { # make two self pseudo replicates\n\tFile ta\n\tBoolean paired_end\n\n\tInt mem_mb\n\n\tcommand {\n\t\tpython3 $(which encode_task_spr.py) \\n\t\t\t${ta} \\n\t\t\t${if paired_end then '--paired-end' else ''}\n\t}\n\toutput {\n\t\tFile ta_pr1 = glob('.pr1.tagAlign.gz')[0]\n\t\tFile ta_pr2 = glob('.pr2.tagAlign.gz')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\n}\n\ntask pool_ta {\n\tArray[File?] tas\n\tInt? col \t\t\t# number of columns in pooled TA\n\tString? prefix \t\t# basename prefix\n\n\tcommand {\n\t\tpython3 $(which encode_task_pool_ta.py) \\n\t\t\t${sep=' ' tas} \\n\t\t\t${'--prefix ' + prefix} \\n\t\t\t${'--col ' + col}\n\t}\n\toutput {\n\t\tFile ta_pooled = glob('.tagAlign.gz')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\n}\n\ntask xcor {\n\tFile ta\n\tBoolean paired_end\n\tString mito_chr_name\n\tInt subsample # number of reads to subsample TAGALIGN\n\t\t\t\t\t# this will be used for xcor only\n\t\t\t\t\t# will not affect any downstream analysis\n\tString? chip_seq_type\n\tInt? exclusion_range_min\n\tInt? exclusion_range_max\n\n\tInt cpu\n\tInt mem_mb\t\n\tInt time_hr\n\tString disks\n\n\tcommand {\n\t\tpython3 $(which encode_task_xcor.py) \\n\t\t\t${ta} \\n\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t${'--mito-chr-name ' + mito_chr_name} \\n\t\t\t${'--subsample ' + subsample} \\n\t\t\t${'--chip-seq-type ' + chip_seq_type} \\n\t\t\t${'--exclusion-range-min ' + exclusion_range_min} \\n\t\t\t${'--exclusion-range-max ' + exclusion_range_max} \\n\t\t\t${'--subsample ' + subsample} \\n\t\t\t${'--nth ' + cpu}\n\t}\n\toutput {\n\t\tFile plot_pdf = glob('.cc.plot.pdf')[0]\n\t\tFile plot_png = glob('.cc.plot.png')[0]\n\t\tFile score = glob('.cc.qc')[0]\n\t\tInt fraglen = read_int(glob('.cc.fraglen.txt')[0])\n\t}\n\truntime {\n\t\tcpu : cpu\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t}\n}\n\ntask jsd {\n\tArray[File?] nodup_bams\n\tArray[File?] ctl_bams\n\tFile blacklist\n\tInt mapq_thresh\n\n\tInt cpu\n\tInt mem_mb\n\tInt time_hr\n\tString disks\n\n\tcommand {\n\t\tpython3 $(which encode_task_jsd.py) \\n\t\t\t${sep=' ' nodup_bams} \\n\t\t\t${if length(ctl_bams)>0 then '--ctl-bam '+ select_first(ctl_bams) else ''} \\n\t\t\t${'--mapq-thresh '+ mapq_thresh} \\n\t\t\t${'--blacklist '+ blacklist} \\n\t\t\t${'--nth ' + cpu}\n\t}\n\toutput {\n\t\tFile plot = glob('.png')[0]\n\t\tArray[File] jsd_qcs = glob('.jsd.qc')\n\t}\n\truntime {\n\t\tcpu : cpu\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t}\n}\n\ntask choose_ctl {\n\tArray[File?] tas\n\tArray[File?] ctl_tas\n\tFile? ta_pooled\n\tFile? ctl_ta_pooled\n\tBoolean always_use_pooled_ctl # always use pooled control for all exp rep.\n\tFloat ctl_depth_ratio \t\t# if ratio between controls is higher than this\n\t\t\t\t\t\t\t\t# then always use pooled control for all exp rep.\n\tcommand {\n\t\tpython3 $(which encode_task_choose_ctl.py) \\n\t\t\t--tas ${sep=' ' tas} \\n\t\t\t--ctl-tas ${sep=' ' ctl_tas} \\n\t\t\t${'--ta-pooled ' + ta_pooled} \\n\t\t\t${'--ctl-ta-pooled ' + ctl_ta_pooled} \\n\t\t\t${if always_use_pooled_ctl then '--always-use-pooled-ctl' else ''} \\n\t\t\t${'--ctl-depth-ratio ' + ctl_depth_ratio}\n\t}\n\toutput {\n\t\tArray[File] chosen_ctl_tas = glob('ctl_for_rep.tagAlign.gz')\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '8000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\t\n}\n\ntask count_signal_track {\n\tFile ta \t\t\t# tag-align\n\tFile chrsz\t\t\t# 2-col chromosome sizes file\n\n\tcommand {\n\t\tpython3 $(which encode_task_count_signal_track.py) \\n\t\t\t${ta} \\n\t\t\t${'--chrsz ' + chrsz}\n\t}\n\toutput {\n\t\tFile pos_bw = glob('.positive.bigwig')[0]\n\t\tFile neg_bw = glob('.negative.bigwig')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '8000 MB'\n\t\ttime : 4\n\t\tdisks : 'local-disk 50 HDD'\n\t}\n}\n\ntask call_peak {\n\tString peak_caller\n\tString peak_type\n\tFile? custom_call_peak_py\n\n\tArray[File?] tas\t# [ta, control_ta]. control_ta is optional\n\tInt fraglen \t\t# fragment length from xcor\n\tString gensz\t\t# Genome size (sum of entries in 2nd column of \n # chr. sizes file, or hs for human, ms for mouse)\n\tFile chrsz\t\t\t# 2-col chromosome sizes file\n\tInt cap_num_peak\t# cap number of raw peaks called from MACS2\n\tFloat pval_thresh \t# p.value threshold for MACS2\n\tFloat? fdr_thresh \t# FDR threshold for SPP\n\n\tFile? blacklist \t# blacklist BED to filter raw peaks\n\tString? regex_bfilt_peak_chr_name\n\n\tInt cpu\t\n\tInt mem_mb\n\tInt time_hr\n\tString disks\n\n\tcommand {\n\t\tset -e\n\n\t\tif [ '${peak_caller}' == 'macs2' ]; then\n\t\t\tpython3 $(which encode_task_macs2_chip.py) \\n\t\t\t\t${sep=' ' tas} \\n\t\t\t\t${'--gensz '+ gensz} \\n\t\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t\t${'--cap-num-peak ' + cap_num_peak} \\n\t\t\t\t${'--pval-thresh '+ pval_thresh}\n\n\t\telif [ '${peak_caller}' == 'spp' ]; then\n\t\t\tpython3 $(which encode_task_spp.py) \\n\t\t\t\t${sep=' ' tas} \\n\t\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t\t${'--cap-num-peak ' + cap_num_peak} \\n\t\t\t\t${'--fdr-thresh '+ fdr_thresh} \\n\t\t\t\t${'--nth ' + cpu}\n\n\t\telse\n\t\t\tpython3 ${custom_call_peak_py} \\n\t\t\t\t${sep=' ' tas} \\n\t\t\t\t${'--gensz '+ gensz} \\n\t\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t\t${'--cap-num-peak ' + cap_num_peak} \\n\t\t\t\t${'--pval-thresh '+ pval_thresh}\n\t\t\t\t${'--fdr-thresh '+ fdr_thresh}\n\t\t\t\t${'--nth ' + cpu}\n\t\tfi\n\n\t\tpython3 $(which encode_task_post_call_peak_chip.py) \\n\t\t\t$(ls Peak.gz) \\n\t\t\t${'--ta ' + tas[0]} \\n\t\t\t${'--regex-bfilt-peak-chr-name \'' + regex_bfilt_peak_chr_name + '\''} \\n\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t${'--peak-type ' + peak_type} \\n\t\t\t${'--blacklist ' + blacklist}\t\t\n\t}\n\toutput {\n\t\t# generated by custom_call_peak_py\n\t\tFile peak = glob('[!.][!b][!f][!i][!l][!t].'+peak_type+'.gz')[0]\n\t\t# generated by post_call_peak py\n\t\tFile bfilt_peak = glob('.bfilt.'+peak_type+'.gz')[0]\n\t\tFile bfilt_peak_bb = glob('.bfilt.'+peak_type+'.bb')[0]\n\t\tFile bfilt_peak_hammock = glob('.bfilt.'+peak_type+'.hammock.gz')[0]\n\t\tFile bfilt_peak_hammock_tbi = glob('.bfilt.'+peak_type+'.hammock.gz')[1]\n\t\tFile frip_qc = glob('.frip.qc')[0]\n\t\tFile peak_region_size_qc = glob('.peak_region_size.qc')[0]\n\t\tFile peak_region_size_plot = glob('.peak_region_size.png')[0]\n\t\tFile num_peak_qc = glob('.num_peak.qc')[0]\n\t}\n\truntime {\n\t\tcpu : if peak_caller == 'macs2' then 1 else cpu\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t\tpreemptible: 0\t\t\n\t}\n}\n\ntask macs2_signal_track {\n\tArray[File?] tas\t# [ta, control_ta]. control_ta is optional\n\tInt fraglen \t\t# fragment length from xcor\n\tString gensz\t\t# Genome size (sum of entries in 2nd column of \n # chr. sizes file, or hs for human, ms for mouse)\n\tFile chrsz\t\t\t# 2-col chromosome sizes file\n\tFloat pval_thresh \t# p.value threshold\n\n\tInt mem_mb\n\tInt time_hr\n\tString disks\n\n\tcommand {\n\t\tpython3 $(which encode_task_macs2_signal_track_chip.py) \\n\t\t\t${sep=' ' tas} \\n\t\t\t${'--gensz '+ gensz} \\n\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t${'--pval-thresh '+ pval_thresh}\n\t}\n\toutput {\n\t\tFile pval_bw = glob('.pval.signal.bigwig')[0]\n\t\tFile fc_bw = glob('.fc.signal.bigwig')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t\tpreemptible: 0\n\t}\n}\n\ntask idr {\n\tString prefix \t\t# prefix for IDR output file\n\tFile peak1 \t\t\t\n\tFile peak2\n\tFile peak_pooled\n\tFloat idr_thresh\n\tFile? blacklist \t# blacklist BED to filter raw peaks\n\tString regex_bfilt_peak_chr_name\n\t# parameters to compute FRiP\n\tFile? ta\t\t\t# to calculate FRiP\n\tInt fraglen \t\t# fragment length from xcor\n\tFile chrsz\t\t\t# 2-col chromosome sizes file\n\tString peak_type\n\tString rank\n\n\tcommand {\n\t\t${if defined(ta) then '' else 'touch null.frip.qc'}\n\t\ttouch null \n\t\tpython3 $(which encode_task_idr.py) \\n\t\t\t${peak1} ${peak2} ${peak_pooled} \\n\t\t\t${'--prefix ' + prefix} \\n\t\t\t${'--idr-thresh ' + idr_thresh} \\n\t\t\t${'--peak-type ' + peak_type} \\n\t\t\t--idr-rank ${rank} \\n\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t${'--blacklist '+ blacklist} \\n\t\t\t${'--regex-bfilt-peak-chr-name \'' + regex_bfilt_peak_chr_name + '\''} \\n\t\t\t${'--ta ' + ta}\n\t}\n\toutput {\n\t\tFile idr_peak = glob('[!.][!b][!f][!i][!l][!t].'+peak_type+'.gz')[0]\n\t\tFile bfilt_idr_peak = glob('.bfilt.'+peak_type+'.gz')[0]\n\t\tFile bfilt_idr_peak_bb = glob('.bfilt.'+peak_type+'.bb')[0]\n\t\tFile bfilt_idr_peak_hammock = glob('.bfilt.'+peak_type+'.hammock.gz')[0]\n\t\tFile bfilt_idr_peak_hammock_tbi = glob('.bfilt.'+peak_type+'.hammock.gz')[1]\n\t\tFile idr_plot = glob('.txt.png')[0]\n\t\tFile idr_unthresholded_peak = glob('.txt.gz')[0]\n\t\tFile idr_log = glob('.idr.log')[0]\n\t\tFile frip_qc = if defined(ta) then glob('.frip.qc')[0] else glob('null')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\t\n}\n\ntask overlap {\n\tString prefix \t# prefix for IDR output file\n\tFile peak1\n\tFile peak2\n\tFile peak_pooled\n\tFile? blacklist # blacklist BED to filter raw peaks\n\tString regex_bfilt_peak_chr_name\n\t# parameters to compute FRiP\n\tFile? ta\t\t# to calculate FRiP\n\tInt fraglen \t# fragment length from xcor (for FRIP)\n\tFile chrsz\t\t# 2-col chromosome sizes file\n\tString peak_type\n\n\tcommand {\n\t\t${if defined(ta) then '' else 'touch null.frip.qc'}\n\t\ttouch null \n\t\tpython3 $(which encode_task_overlap.py) \\n\t\t\t${peak1} ${peak2} ${peak_pooled} \\n\t\t\t${'--prefix ' + prefix} \\n\t\t\t${'--peak-type ' + peak_type} \\n\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t${'--blacklist '+ blacklist} \\n\t\t\t--nonamecheck \\n\t\t\t${'--regex-bfilt-peak-chr-name \'' + regex_bfilt_peak_chr_name + '\''} \\n\t\t\t${'--ta ' + ta}\n\t}\n\toutput {\n\t\tFile overlap_peak = glob('[!.][!b][!f][!i][!l][!t].'+peak_type+'.gz')[0]\n\t\tFile bfilt_overlap_peak = glob('.bfilt.'+peak_type+'.gz')[0]\n\t\tFile bfilt_overlap_peak_bb = glob('.bfilt.'+peak_type+'.bb')[0]\n\t\tFile bfilt_overlap_peak_hammock = glob('.bfilt.'+peak_type+'.hammock.gz')[0]\n\t\tFile bfilt_overlap_peak_hammock_tbi = glob('.bfilt.'+peak_type+'.hammock.gz')[1]\n\t\tFile frip_qc = if defined(ta) then glob('.frip.qc')[0] else glob('null')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\t\n}\n\ntask reproducibility {\n\tString prefix\n\tArray[File]? peaks # peak files from pair of true replicates\n\t\t\t\t\t\t# in a sorted order. for example of 4 replicates,\n\t\t\t\t\t\t# 1,2 1,3 1,4 2,3 2,4 3,4.\n # x,y means peak file from rep-x vs rep-y\n\tArray[File?] peaks_pr\t# peak files from pseudo replicates\n\tFile? peak_ppr\t\t\t# Peak file from pooled pseudo replicate.\n\tString peak_type\n\tFile chrsz\t\t\t# 2-col chromosome sizes file\n\n\tcommand {\n\t\tpython3 $(which encode_task_reproducibility.py) \\n\t\t\t${sep=' ' peaks} \\n\t\t\t--peaks-pr ${sep=' ' peaks_pr} \\n\t\t\t${'--peak-ppr '+ peak_ppr} \\n\t\t\t--prefix ${prefix} \\n\t\t\t${'--peak-type ' + peak_type} \\n\t\t\t${'--chrsz ' + chrsz}\n\t}\n\toutput {\n\t\tFile optimal_peak = glob('optimal_peak..gz')[0]\n\t\tFile optimal_peak_bb = glob('optimal_peak..bb')[0]\n\t\tFile optimal_peak_hammock = glob('optimal_peak..hammock.gz')[0]\n\t\tFile optimal_peak_hammock_tbi = glob('optimal_peak..hammock.gz')[1]\n\t\tFile conservative_peak = glob('conservative_peak..gz')[0]\n\t\tFile conservative_peak_bb = glob('conservative_peak..bb')[0]\n\t\tFile conservative_peak_hammock = glob('conservative_peak..hammock.gz')[0]\n\t\tFile conservative_peak_hammock_tbi = glob('conservative_peak..hammock.gz')[1]\n\t\tFile reproducibility_qc = glob('reproducibility.qc')[0]\n\t\t# QC metrics for optimal peak\n\t\tFile peak_region_size_qc = glob('.peak_region_size.qc')[0]\n\t\tFile peak_region_size_plot = glob('.peak_region_size.png')[0]\n\t\tFile num_peak_qc = glob('.num_peak.qc')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\t\n}\n\ntask gc_bias {\n\tFile nodup_bam\n\tFile ref_fa\n\n\tString? picard_java_heap\n\n\tcommand {\n\t\tpython3 $(which encode_task_gc_bias.py) \\n\t\t\t${'--nodup-bam ' + nodup_bam} \\n\t\t\t${'--ref-fa ' + ref_fa} \\n\t\t\t${'--picard-java-heap ' + if defined(picard_java_heap) then picard_java_heap else '10G'}\n\t}\n\toutput {\n\t\tFile gc_plot = glob('.gc_plot.png')[0]\n\t\tFile gc_log = glob('.gc.txt')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '10000 MB'\n\t\ttime : 6\n\t\tdisks : 'local-disk 100 HDD'\n\t}\n}\n\n# gather all outputs and generate \n# - qc.html\t\t: organized final HTML report\n# - qc.json\t\t: all QCs\ntask qc_report {\n\t# optional metadata\n\tString pipeline_ver\n \tString title # name of sample\n\tString description # description for sample\n\tString? genome\n\t#String? encode_accession_id\t# ENCODE accession ID of sample\n\t# workflow params\n\tArray[Boolean?] paired_ends\n\tArray[Boolean?] ctl_paired_ends\n\tString pipeline_type\n\tString aligner\n\tString peak_caller\n\tInt cap_num_peak\n\tFloat idr_thresh\n\tFloat pval_thresh\n\tInt xcor_trim_bp\n\tInt xcor_subsample_reads\n\t# QCs\n\tArray[File?] samstat_qcs\n\tArray[File?] nodup_samstat_qcs\n\tArray[File?] dup_qcs\n\tArray[File?] lib_complexity_qcs\n\tArray[File?] ctl_samstat_qcs\n\tArray[File?] ctl_nodup_samstat_qcs\n\tArray[File?] ctl_dup_qcs\n\tArray[File?] ctl_lib_complexity_qcs\n\tArray[File?] xcor_plots\n\tArray[File?] xcor_scores\n\tFile? jsd_plot\n\tArray[File]? jsd_qcs\n\tArray[File]? idr_plots\n\tArray[File]? idr_plots_pr\n\tFile? idr_plot_ppr\n\tArray[File?] frip_qcs\n\tArray[File?] frip_qcs_pr1\n\tArray[File?] frip_qcs_pr2\n\tFile? frip_qc_pooled\n\tFile? frip_qc_ppr1 \n\tFile? frip_qc_ppr2 \n\tArray[File]? frip_idr_qcs\n\tArray[File]? frip_idr_qcs_pr\n\tFile? frip_idr_qc_ppr \n\tArray[File]? frip_overlap_qcs\n\tArray[File]? frip_overlap_qcs_pr\n\tFile? frip_overlap_qc_ppr\n\tFile? idr_reproducibility_qc\n\tFile? overlap_reproducibility_qc\n\n\tArray[File?] gc_plots\n\n\tArray[File?] peak_region_size_qcs\n\tArray[File?] peak_region_size_plots\n\tArray[File?] num_peak_qcs\n\n\tFile? idr_opt_peak_region_size_qc\n\tFile? idr_opt_peak_region_size_plot\n\tFile? idr_opt_num_peak_qc\n\n\tFile? overlap_opt_peak_region_size_qc\n\tFile? overlap_opt_peak_region_size_plot\n\tFile? overlap_opt_num_peak_qc\n\n\tFile? qc_json_ref\n\n\tcommand {\n\t\tpython3 $(which encode_task_qc_report.py) \\n\t\t\t${'--pipeline-ver ' + pipelinever} \\n\t\t\t${\"--title '\" + sub(title,\"'\",\"\") + \"'\"} \\n\t\t\t${\"--desc '\" + sub(description,\"'\",\"_\") + \"'\"} \\n\t\t\t${'--genome ' + genome} \\n\t\t\t${'--multimapping ' + 0} \\n\t\t\t--paired-ends ${sep=' ' paired_ends} \\n\t\t\t--ctl-paired-ends ${sep=' ' ctl_paired_ends} \\n\t\t\t--pipeline-type ${pipeline_type} \\n\t\t\t--aligner ${aligner} \\n\t\t\t--peak-caller ${peak_caller} \\n\t\t\t${'--cap-num-peak ' + cap_num_peak} \\n\t\t\t--idr-thresh ${idr_thresh} \\n\t\t\t--pval-thresh ${pval_thresh} \\n\t\t\t--xcor-trim-bp ${xcor_trim_bp} \\n\t\t\t--xcor-subsample-reads ${xcor_subsamplereads} \\n\t\t\t--samstat-qcs ${sep=':_' samstatqcs} \\n\t\t\t--nodup-samstat-qcs ${sep=':_' nodup_samstatqcs} \\n\t\t\t--dup-qcs ${sep=':_' dupqcs} \\n\t\t\t--lib-complexity-qcs ${sep=':_' lib_complexityqcs} \\n\t\t\t--xcor-plots ${sep=':_' xcorplots} \\n\t\t\t--xcor-scores ${sep=':_' xcorscores} \\n\t\t\t--idr-plots ${sep=':_' idrplots} \\n\t\t\t--idr-plots-pr ${sep=':_' idr_plotspr} \\n\t\t\t--ctl-samstat-qcs ${sep=':_' ctl_samstatqcs} \\n\t\t\t--ctl-nodup-samstat-qcs ${sep=':_' ctl_nodup_samstatqcs} \\n\t\t\t--ctl-dup-qcs ${sep=':_' ctl_dupqcs} \\n\t\t\t--ctl-lib-complexity-qcs ${sep=':_' ctl_lib_complexity_qcs} \\n\t\t\t${'--jsd-plot ' + jsdplot} \\n\t\t\t--jsd-qcs ${sep=':_' jsd_qcs} \\n\t\t\t${'--idr-plot-ppr ' + idr_plotppr} \\n\t\t\t--frip-qcs ${sep=':_' fripqcs} \\n\t\t\t--frip-qcs-pr1 ${sep=':_' frip_qcspr1} \\n\t\t\t--frip-qcs-pr2 ${sep=':_' frip_qcs_pr2} \\n\t\t\t${'--frip-qc-pooled ' + frip_qc_pooled} \\n\t\t\t${'--frip-qc-ppr1 ' + frip_qc_ppr1} \\n\t\t\t${'--frip-qc-ppr2 ' + frip_qcppr2} \\n\t\t\t--frip-idr-qcs ${sep=':_' frip_idrqcs} \\n\t\t\t--frip-idr-qcs-pr ${sep=':_' frip_idr_qcs_pr} \\n\t\t\t${'--frip-idr-qc-ppr ' + frip_idr_qcppr} \\n\t\t\t--frip-overlap-qcs ${sep=':_' frip_overlapqcs} \\n\t\t\t--frip-overlap-qcs-pr ${sep=':_' frip_overlap_qcs_pr} \\n\t\t\t${'--frip-overlap-qc-ppr ' + frip_overlap_qc_ppr} \\n\t\t\t${'--idr-reproducibility-qc ' + idr_reproducibility_qc} \\n\t\t\t${'--overlap-reproducibility-qc ' + overlap_reproducibilityqc} \\n\t\t\t--gc-plots ${sep=':_' gcplots} \\n\t\t\t--peak-region-size-qcs ${sep=':_' peak_region_sizeqcs} \\n\t\t\t--peak-region-size-plots ${sep=':_' peak_region_sizeplots} \\n\t\t\t--num-peak-qcs ${sep=':_' num_peak_qcs} \\n\t\t\t${'--idr-opt-peak-region-size-qc ' + idr_opt_peak_region_size_qc} \\n\t\t\t${'--idr-opt-peak-region-size-plot ' + idr_opt_peak_region_size_plot} \\n\t\t\t${'--idr-opt-num-peak-qc ' + idr_opt_num_peak_qc} \\n\t\t\t${'--overlap-opt-peak-region-size-qc ' + overlap_opt_peak_region_size_qc} \\n\t\t\t${'--overlap-opt-peak-region-size-plot ' + overlap_opt_peak_region_size_plot} \\n\t\t\t${'--overlap-opt-num-peak-qc ' + overlap_opt_num_peak_qc} \\n\t\t\t--out-qc-html qc.html \\n\t\t\t--out-qc-json qc.json \\n\t\t\t${'--qc-json-ref ' + qc_json_ref}\n\t}\n\toutput {\n\t\tFile report = glob('qc.html')[0]\n\t\tFile qc_json = glob('qc.json')[0]\n\t\tBoolean qc_json_ref_match = read_string('qc_json_ref_match.txt')=='True'\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\t\n}\n\n### workflow system tasks\ntask read_genome_tsv {\n\tFile genome_tsv\n\n\tString? null_s\n\tcommand <<<\n\t\t# create empty files for all entries\n\t\ttouch genome_name\n\t\ttouch ref_fa bowtie2_idx_tar bwa_idx_tar chrsz gensz blacklist blacklist2\n\t\ttouch custom_aligner_idx_tar\n\t\ttouch tss tss_enrich # for backward compatibility\n\t\ttouch dnase prom enh reg2map reg2map_bed roadmap_meta\n\t\ttouch mito_chr_name\n\t\ttouch regex_bfilt_peak_chr_name\n\n\t\tpython <<CODE\n\t\timport os\n\t\twith open('${genome_tsv}','r') as fp:\n\t\t\tfor line in fp:\n\t\t\t\tarr = line.strip('\n').split('\t')\n\t\t\t\tif arr:\n\t\t\t\t\tkey, val = arr\n\t\t\t\t\twith open(key,'w') as fp2:\n\t\t\t\t\t\tfp2.write(val)\n\t\tCODE\n\t>>>\n\toutput {\n\t\tString? genome_name = if size('genome_name')==0 then basename(genome_tsv) else read_string('genome_name')\n\t\tString? ref_fa = if size('ref_fa')==0 then null_s else read_string('ref_fa')\n\t\tString? bwa_idx_tar = if size('bwa_idx_tar')==0 then null_s else read_string('bwa_idx_tar')\n\t\tString? bowtie2_idx_tar = if size('bowtie2_idx_tar')==0 then null_s else read_string('bowtie2_idx_tar')\n\t\tString? custom_aligner_idx_tar = if size('custom_aligner_idx_tar')==0 then null_s else read_string('custom_aligner_idx_tar')\n\t\tString? chrsz = if size('chrsz')==0 then null_s else read_string('chrsz')\n\t\tString? gensz = if size('gensz')==0 then null_s else read_string('gensz')\n\t\tString? blacklist = if size('blacklist')==0 then null_s else read_string('blacklist')\n\t\tString? blacklist2 = if size('blacklist2')==0 then null_s else read_string('blacklist2')\n\t\tString? mito_chr_name = if size('mito_chr_name')==0 then null_s else read_string('mito_chr_name')\n\t\tString? regex_bfilt_peak_chr_name = if size('regex_bfilt_peak_chr_name')==0 then 'chr[\\dXY]+'\n\t\t\telse read_string('regex_bfilt_peak_chr_name')\n\t\t# optional data\n\t\tString? tss = if size('tss')!=0 then read_string('tss')\n\t\t\telse if size('tss_enrich')!=0 then read_string('tss_enrich') else null_s\n\t\tString? dnase = if size('dnase')==0 then null_s else read_string('dnase')\n\t\tString? prom = if size('prom')==0 then null_s else read_string('prom')\n\t\tString? enh = if size('enh')==0 then null_s else read_string('enh')\n\t\tString? reg2map = if size('reg2map')==0 then null_s else read_string('reg2map')\n\t\tString? reg2map_bed = if size('reg2map_bed')==0 then null_s else read_string('reg2map_bed')\n\t\tString? roadmap_meta = if size('roadmap_meta')==0 then null_s else read_string('roadmap_meta')\n\t}\n\truntime {\n\t\tmaxRetries : 0\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\t\t\n\t}\n}\n\ntask roundedmean {\n\tArray[Int] ints\n\tcommand <<<\n\t\tpython <<CODE\n\t\tarr = [${sep=',' ints}]\n\t\twith open('tmp.txt','w') as fp:\n\t\t\tif len(arr):\n\t\t\t sum = sum(arr)\n\t\t\t mean = sum(arr)/float(len(arr))\n\t\t\t fp.write('{}'.format(int(round(mean))))\n\t\t\telse:\n\t\t\t fp.write('0')\n\t\tCODE\n\t>>>\n\toutput {\n\t\tInt rounded_mean = read_int('tmp.txt')\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\t\n}\n\ntask raise_exception {\n\tString msg\n\tcommand {\n\t\techo -e \"\n Error: ${msg}\n\" >&2\n\t\texit 2\n\t}\n\toutput {\n\t\tString error_msg = '${msg}'\n\t}\n\truntime {\n\t\tmaxRetries : 0\n\t}\n}\n", "root": "", "options": "{\n \"backend\": \"slurm\",\n \"default_runtime_attributes\": {\n \"maxRetries\": 1,\n \"slurm_account\": \"aeurban\"\n }\n}", "inputs": "{\n \"chip.title\" : \"chipseq miseq nano\",\n \"chip.description\" : \"chip-seq test miseq nano ipsc\",\n\n \"chip.pipeline_type\" : \"tf\",\n \"chip.aligner\" : \"bowtie2\",\n \"chip.align_only\" : false,\n \"chip.true_rep_only\" : false,\n\n \"chip.genome_tsv\" : \"/reference/ENCODE/pipeline_genome_data/genome_tsv/v1/hg38_scg.tsv\",\n\n \"chip.paired_end\" : true,\n \"chip.ctl_paired_end\" : true,\n\n \"chip.fastqs_rep1_R1\" : [ \"/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-flag_subsample1_R1.fastq.gz\" ],\n \"chip.fastqs_rep1_R2\" : [ \"/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-flag_subsample1_R2.fastq.gz\" ],\n \"chip.ctl_fastqs_rep1_R1\" : [ \"/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-input_subsample1_R1.fastq.gz\" ],\n \"chip.ctl_fastqs_rep1_R2\" : [ \"/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-input_subsample1_R2.fastq.gz\" ],\n\n \"chip.crop_length\" : 0,\n\n \"chip.mapq_thresh\" : 30,\n \"chip.dup_marker\" : \"picard\",\n \"chip.no_dup_removal\" : false,\n\n \"chip.subsample_reads\" : 0,\n \"chip.ctl_subsample_reads\" : 0,\n \"chip.xcor_subsample_reads\" : 15000000,\n\n \"chip.xcor_trim_bp\" : 50,\n \"chip.use_filt_pe_ta_for_xcor\" : false,\n\n \"chip.always_use_pooled_ctl\" : false,\n \"chip.ctl_depth_ratio\" : 1.2,\n\n \"chip.peak_caller\" : null,\n \"chip.cap_num_peak_macs2\" : 500000,\n \"chip.pval_thresh\" : 0.01,\n \"chip.fdr_thresh\" : 0.01,\n \"chip.idr_thresh\" : 0.05,\n \"chip.cap_num_peak_spp\" : 300000,\n\n \"chip.enable_jsd\" : true,\n \"chip.enable_gc_bias\" : true,\n \"chip.enable_count_signal_track\" : false,\n\n \"chip.filter_chrs\" : [],\n\n \"chip.align_cpu\" : 4,\n \"chip.align_mem_mb\" : 20000,\n \"chip.align_time_hr\" : 144,\n \"chip.align_disks\" : \"local-disk 400 HDD\",\n\n \"chip.filter_cpu\" : 2,\n \"chip.filter_mem_mb\" : 20000,\n \"chip.filter_time_hr\" : 144,\n \"chip.filter_disks\" : \"local-disk 400 HDD\",\n\n \"chip.bam2ta_cpu\" : 2,\n \"chip.bam2ta_mem_mb\" : 10000,\n \"chip.bam2ta_time_hr\" : 144,\n \"chip.bam2ta_disks\" : \"local-disk 100 HDD\",\n\n \"chip.spr_mem_mb\" : 16000,\n\n \"chip.jsd_cpu\" : 2,\n \"chip.jsd_mem_mb\" : 12000,\n \"chip.jsd_time_hr\" : 144,\n \"chip.jsd_disks\" : \"local-disk 200 HDD\",\n\n \"chip.xcor_cpu\" : 2,\n \"chip.xcor_mem_mb\" : 16000,\n \"chip.xcor_time_hr\" : 144,\n \"chip.xcor_disks\" : \"local-disk 100 HDD\",\n\n \"chip.call_peak_cpu\" : 2,\n \"chip.call_peak_mem_mb\" : 16000,\n \"chip.call_peak_time_hr\" : 144,\n \"chip.call_peak_disks\" : \"local-disk 200 HDD\",\n\n \"chip.macs2_signal_track_mem_mb\" : 16000,\n \"chip.macs2_signal_track_time_hr\" : 144,\n \"chip.macs2_signal_track_disks\" : \"local-disk 400 HDD\",\n\n \"chip.gc_bias_picard_java_heap\" : \"10G\"\n}\n", "workflowUrl": "/labs/dflev/siming/software/chip-seq-pipeline2/chip.wdl", "labels": "{\n \"caper-backend\": \"slurm\",\n \"caper-str-label\": \"chipseq_nano3_test1\",\n \"caper-user\": \"simingz\"\n}" }, "calls": { "chip.overlap_pr": [ { "retryableFailure": true, "executionStatus": "RetryableFailure", "stdout": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/execution/stdout", "backendStatus": "Done", "commandLine": "\ntouch null \npython3 $(which encode_task_overlap.py) \\n\t/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/inputs/1836791510/0425-3-cetch1-flag_subsample1_R1.nodup.pr1_x_ctl_for_rep1.300K.regionPeak.gz /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/inputs/-1827717161/0425-3-cetch1-flag_subsample1_R1.nodup.pr2_x_ctl_for_rep1.300K.regionPeak.gz /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/inputs/1290460198/0425-3-cetch1-flag_subsample1_R1.nodup_x_ctl_for_rep1.300K.regionPeak.gz \\n\t--prefix rep1-pr1_vs_rep1-pr2 \\n\t--peak-type regionPeak \\n\t--fraglen 200 \\n\t--chrsz /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/inputs/-723689832/hg38.chrom.sizes \\n\t--blacklist /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/inputs/-723689832/hg38.blacklist.bed.gz \\n\t--nonamecheck \\n\t--regex-bfilt-peak-chr-name 'chr[\dXY]+' \\n\t--ta /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/inputs/-1113689129/0425-3-cetch1-flag_subsample1_R1.nodup.tagAlign.gz", "shardIndex": 0, "runtimeAttributes": { "slurm_account": "aeurban", "failOnStderr": "false", "continueOnReturnCode": "0", "maxRetries": "1", "cpu": "1", "time": "1", "memory": "3.90625 GB" }, "callCaching": { "allowResultReuse": false, "hit": false, "result": "Cache Miss", "hashes": { "output count": "1679091C5A880FAF6FB5E6087EB1B2DC", "runtime attribute": { "failOnStderr": "68934A3E9455FA72420237EB05902327", "docker": "N/A", "continueOnReturnCode": "CFCD208495D565EF66E7DFF9F98764DA" }, "output expression": { "File overlap_peak": "5FE0ABD18E429C34B9371C7F78153E57", "File frip_qc": "FFBDC0CF158D56A15AA0F87CB842D5C8", "File bfilt_overlap_peak_hammock": "AAEDB2388B1A8885F60BDF99860308F2", "File bfilt_overlap_peak_bb": "D2388B1E435478D195314A0DE2E9CEEB", "File bfilt_overlap_peak_hammock_tbi": "F7AD80D930D3266E42DF62760BB4CA48", "File bfilt_overlap_peak": "504E92495AEB2A7383BCE85ACF634E4A" }, "input count": "D3D9446802A44259755D38E6D163E820", "backend name": "8C338F9EC0DF1BB3621803F2983039DA", "command template": "B64FD519BA0800AF98DB11528D527648", "input": { "File peak2": "7e16b4ac12c4afe624c8ece83ff994a4", "File blacklist": "4a58b07c400e8641dc3df79ad6fcf12c", "Int fraglen": "3644A684F98EA8FE223C713B77189A77", "File chrsz": "9b996e0d2eabf4f49ef31793c11d2f45", "String regex_bfilt_peak_chr_name": "C742E46BA2D35AFA2B5C0BBCB7D31CE7", "File peak_pooled": "44a9d024f599eaf7003cafd63a1a8574", "File ta": "2db8b57a38e09aa03ad502953b1adc50", "String prefix": "D3B9D7856D560D6435DB69A1320959A6", "String peak_type": "CCC3DB10222D85D8F7B6CE07DD9D8E2F", "File peak1": "b4ada56382dc83c1a00602df92c79ba5" } }, "effectiveCallCachingMode": "ReadAndWriteCache" }, "inputs": { "blacklist": "/reference/ENCODE/pipeline_genome_data/hg38/hg38.blacklist.bed.gz", "peak_type": "regionPeak", "ta": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-bam2ta/shard-0/execution/glob-199637d3015dccbe277f621a18be9eb4/0425-3-cetch1-flag_subsample1_R1.nodup.tagAlign.gz", "prefix": "rep1-pr1_vs_rep1-pr2", "peak2": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-call_peak_pr2/shard-0/execution/glob-3c17d82667de5da0b56a7c17d88175d2/0425-3-cetch1-flag_subsample1_R1.nodup.pr2_x_ctl_for_rep1.300K.regionPeak.gz", "fraglen": 200, "chrsz": "/reference/ENCODE/pipeline_genome_data/hg38/hg38.chrom.sizes", "peak1": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-call_peak_pr1/shard-0/execution/glob-3c17d82667de5da0b56a7c17d88175d2/0425-3-cetch1-flag_subsample1_R1.nodup.pr1_x_ctl_for_rep1.300K.regionPeak.gz", "regex_bfilt_peak_chr_name": "chr[\dXY]+", "peak_pooled": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-call_peak/shard-0/execution/glob-3c17d82667de5da0b56a7c17d88175d2/0425-3-cetch1-flag_subsample1_R1.nodup_x_ctl_for_rep1.300K.regionPeak.gz" }, "returnCode": 1, "failures": [ { "causedBy": [], "message": "Job chip.overlap_pr:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details." } ], "jobId": "14894969", "backend": "slurm", "end": "2020-04-07T21:30:05.053Z", "stderr": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/execution/stderr", "callRoot": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0", "attempt": 1, "executionEvents": [ { "startTime": "2020-04-07T21:30:05.052Z", "description": "UpdatingJobStore", "endTime": "2020-04-07T21:30:05.052Z" }, { "startTime": "2020-04-07T21:29:43.448Z", "description": "Pending", "endTime": "2020-04-07T21:29:43.449Z" }, { "startTime": "2020-04-07T21:29:44.874Z", "endTime": "2020-04-07T21:29:44.875Z", "description": "WaitingForValueStore" }, { "startTime": "2020-04-07T21:29:43.449Z", "endTime": "2020-04-07T21:29:44.874Z", "description": "RequestingExecutionToken" }, { "description": "RunningJob", "startTime": "2020-04-07T21:29:44.944Z", "endTime": "2020-04-07T21:30:05.052Z" }, { "endTime": "2020-04-07T21:29:44.944Z", "startTime": "2020-04-07T21:29:44.885Z", "description": "CallCacheReading" }, { "startTime": "2020-04-07T21:29:44.875Z", "description": "PreparingJob", "endTime": "2020-04-07T21:29:44.885Z" } ], "start": "2020-04-07T21:29:43.448Z" }, { "retryableFailure": false, "executionStatus": "Failed", "stdout": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/execution/stdout", "backendStatus": "Done", "commandLine": "\ntouch null \npython3 $(which encode_task_overlap.py) \\n\t/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/inputs/1836791510/0425-3-cetch1-flag_subsample1_R1.nodup.pr1_x_ctl_for_rep1.300K.regionPeak.gz /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/inputs/-1827717161/0425-3-cetch1-flag_subsample1_R1.nodup.pr2_x_ctl_for_rep1.300K.regionPeak.gz /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/inputs/1290460198/0425-3-cetch1-flag_subsample1_R1.nodup_x_ctl_for_rep1.300K.regionPeak.gz \\n\t--prefix rep1-pr1_vs_rep1-pr2 \\n\t--peak-type regionPeak \\n\t--fraglen 200 \\n\t--chrsz /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/inputs/-723689832/hg38.chrom.sizes \\n\t--blacklist /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/inputs/-723689832/hg38.blacklist.bed.gz \\n\t--nonamecheck \\n\t--regex-bfilt-peak-chr-name 'chr[\dXY]+' \\n\t--ta /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/inputs/-1113689129/0425-3-cetch1-flag_subsample1_R1.nodup.tagAlign.gz", "shardIndex": 0, "runtimeAttributes": { "slurm_account": "aeurban", "failOnStderr": "false", "continueOnReturnCode": "0", "maxRetries": "1", "cpu": "1", "time": "1", "memory": "3.90625 GB" }, "callCaching": { "allowResultReuse": false, "effectiveCallCachingMode": "WriteCache", "hashes": { "output count": "1679091C5A880FAF6FB5E6087EB1B2DC", "runtime attribute": { "docker": "N/A", "failOnStderr": "68934A3E9455FA72420237EB05902327", "continueOnReturnCode": "CFCD208495D565EF66E7DFF9F98764DA" }, "output expression": { "File overlap_peak": "5FE0ABD18E429C34B9371C7F78153E57", "File frip_qc": "FFBDC0CF158D56A15AA0F87CB842D5C8", "File bfilt_overlap_peak_hammock": "AAEDB2388B1A8885F60BDF99860308F2", "File bfilt_overlap_peak_bb": "D2388B1E435478D195314A0DE2E9CEEB", "File bfilt_overlap_peak_hammock_tbi": "F7AD80D930D3266E42DF62760BB4CA48", "File bfilt_overlap_peak": "504E92495AEB2A7383BCE85ACF634E4A" }, "input count": "D3D9446802A44259755D38E6D163E820", "backend name": "8C338F9EC0DF1BB3621803F2983039DA", "command template": "B64FD519BA0800AF98DB11528D527648", "input": { "File peak2": "7e16b4ac12c4afe624c8ece83ff994a4", "File blacklist": "4a58b07c400e8641dc3df79ad6fcf12c", "Int fraglen": "3644A684F98EA8FE223C713B77189A77", "File chrsz": "9b996e0d2eabf4f49ef31793c11d2f45", "String regex_bfilt_peak_chr_name": "C742E46BA2D35AFA2B5C0BBCB7D31CE7", "File peak_pooled": "44a9d024f599eaf7003cafd63a1a8574", "File ta": "2db8b57a38e09aa03ad502953b1adc50", "String prefix": "D3B9D7856D560D6435DB69A1320959A6", "String peak_type": "CCC3DB10222D85D8F7B6CE07DD9D8E2F", "File peak1": "b4ada56382dc83c1a00602df92c79ba5" } } }, "inputs": { "blacklist": "/reference/ENCODE/pipeline_genome_data/hg38/hg38.blacklist.bed.gz", "peak_type": "regionPeak", "ta": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-bam2ta/shard-0/execution/glob-199637d3015dccbe277f621a18be9eb4/0425-3-cetch1-flag_subsample1_R1.nodup.tagAlign.gz", "prefix": "rep1-pr1_vs_rep1-pr2", "peak2": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-call_peak_pr2/shard-0/execution/glob-3c17d82667de5da0b56a7c17d88175d2/0425-3-cetch1-flag_subsample1_R1.nodup.pr2_x_ctl_for_rep1.300K.regionPeak.gz", "fraglen": 200, "chrsz": "/reference/ENCODE/pipeline_genome_data/hg38/hg38.chrom.sizes", "peak1": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-call_peak_pr1/shard-0/execution/glob-3c17d82667de5da0b56a7c17d88175d2/0425-3-cetch1-flag_subsample1_R1.nodup.pr1_x_ctl_for_rep1.300K.regionPeak.gz", "regex_bfilt_peak_chr_name": "chr[\dXY]+", "peak_pooled": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-call_peak/shard-0/execution/glob-3c17d82667de5da0b56a7c17d88175d2/0425-3-cetch1-flag_subsample1_R1.nodup_x_ctl_for_rep1.300K.regionPeak.gz" }, "returnCode": 1, "failures": [ { "causedBy": [], "message": "Job chip.overlap_pr:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details." } ], "jobId": "14894971", "backend": "slurm", "end": "2020-04-07T21:30:37.271Z", "stderr": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/execution/stderr", "callRoot": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2", "attempt": 2, "executionEvents": [ { "startTime": "2020-04-07T21:30:06.893Z", "description": "RunningJob", "endTime": "2020-04-07T21:30:37.271Z" }, { "startTime": "2020-04-07T21:30:06.864Z", "description": "WaitingForValueStore", "endTime": "2020-04-07T21:30:06.876Z" }, { "startTime": "2020-04-07T21:30:37.271Z", "description": "UpdatingJobStore", "endTime": "2020-04-07T21:30:37.271Z" }, { "description": "RequestingExecutionToken", "endTime": "2020-04-07T21:30:06.864Z", "startTime": "2020-04-07T21:30:05.758Z" }, { "startTime": "2020-04-07T21:30:05.758Z", "endTime": "2020-04-07T21:30:05.758Z", "description": "Pending" }, { "startTime": "2020-04-07T21:30:06.876Z", "description": "PreparingJob", "endTime": "2020-04-07T21:30:06.893Z" } ], "start": "2020-04-07T21:30:05.757Z" } ],