ENCODE-DCC / chip-seq-pipeline2

ENCODE ChIP-seq pipeline
MIT License
240 stars 123 forks source link

Error at chip.overlap_pr #147

Closed simingzhang closed 4 years ago

simingzhang commented 4 years ago

Describe the bug I'm using the newest pipeline version and updated caper within conda to 0.8.2. My script: conda activate encode-chip-seq-pipeline cd $OUTPUT_DIR sbatch -A aeurban -J chipseq --export=ALL --mem 3G -t 7-0 --wrap "caper run /labs/dflev/siming/software/chip-seq-pipeline2/chip.wdl -i $INPUT --out-dir $OUTPUT_DIR"

There is an error at chip.overlap_pr step. "chip.overlap_pr": [ { "retryableFailure": true, "executionStatus": "RetryableFailure",

OS/Platform

Caper configuration file Paste contents of ~/.caper/default.conf.

backend=slurm slurm-account=aeurban

tmp-dir=

cromwell=/home/simingz/.caper/cromwell_jar/cromwell-47.jar womtool=/home/simingz/.caper/womtool_jar/womtool-47.jar

Input JSON file Paste contents of your input JSON file.

{ "chip.title" : "chipseq miseq nano", "chip.description" : "chip-seq test miseq nano ipsc", "chip.pipeline_type" : "tf", "chip.aligner" : "bowtie2", "chip.align_only" : false, "chip.true_rep_only" : false, "chip.genome_tsv" : "/reference/ENCODE/pipeline_genome_data/genome_tsv/v1/hg38_scg.tsv", "chip.paired_end" : true, "chip.ctl_paired_end" : true, "chip.fastqs_rep1_R1" : [ "/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-flag_subsample1_R1.fastq.gz" ], "chip.fastqs_rep1_R2" : [ "/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-flag_subsample1_R2.fastq.gz" ], "chip.ctl_fastqs_rep1_R1" : [ "/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-input_subsample1_R1.fastq.gz" ], "chip.ctl_fastqs_rep1_R2" : [ "/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-input_subsample1_R2.fastq.gz" ], "chip.crop_length" : 0, "chip.mapq_thresh" : 30, "chip.dup_marker" : "picard", "chip.no_dup_removal" : false, "chip.subsample_reads" : 0, "chip.ctl_subsample_reads" : 0, "chip.xcor_subsample_reads" : 15000000, "chip.xcor_trim_bp" : 50, "chip.use_filt_pe_ta_for_xcor" : false, "chip.always_use_pooled_ctl" : false, "chip.ctl_depth_ratio" : 1.2, "chip.peak_caller" : null, "chip.cap_num_peak_macs2" : 500000, "chip.pval_thresh" : 0.01, "chip.fdr_thresh" : 0.01, "chip.idr_thresh" : 0.05, "chip.cap_num_peak_spp" : 300000, "chip.enable_jsd" : true, "chip.enable_gc_bias" : true, "chip.enable_count_signal_track" : false, "chip.filter_chrs" : [], "chip.align_cpu" : 4, "chip.align_mem_mb" : 20000, "chip.align_time_hr" : 144, "chip.align_disks" : "local-disk 400 HDD", "chip.filter_cpu" : 2, "chip.filter_mem_mb" : 20000, "chip.filter_time_hr" : 144, "chip.filter_disks" : "local-disk 400 HDD", "chip.bam2ta_cpu" : 2, "chip.bam2ta_mem_mb" : 10000, "chip.bam2ta_time_hr" : 144, "chip.bam2ta_disks" : "local-disk 100 HDD", "chip.spr_mem_mb" : 16000, "chip.jsd_cpu" : 2, "chip.jsd_mem_mb" : 12000, "chip.jsd_time_hr" : 144, "chip.jsd_disks" : "local-disk 200 HDD", "chip.xcor_cpu" : 2, "chip.xcor_mem_mb" : 16000, "chip.xcor_time_hr" : 144, "chip.xcor_disks" : "local-disk 100 HDD", "chip.call_peak_cpu" : 2, "chip.call_peak_mem_mb" : 16000, "chip.call_peak_time_hr" : 144, "chip.call_peak_disks" : "local-disk 200 HDD", "chip.macs2_signal_track_mem_mb" : 16000, "chip.macs2_signal_track_time_hr" : 144, "chip.macs2_signal_track_disks" : "local-disk 400 HDD", "chip.gc_bias_picard_java_heap" : "10G" }

Error log Caper automatically runs a troubleshooter for failed workflows. If it doesn't then get a WORKFLOW_ID of your failed workflow with caper list or directly use a metadata.json file on Caper's output directory.

$ caper debug [WORKFLOW_ID_OR_METADATA_JSON_FILE]

I can't get the caper debug or troubleshoot to work (get message. Help: cannot connect to server. Check if server is dead or still spinning up.). Instead, I attached the metadata.json file below:

metadata.txt

The failed parts are here: { "workflowName": "chip", "workflowProcessingEvents": [ { "cromwellId": "cromid-242b9e3", "description": "PickedUp", "timestamp": "2020-04-07T20:46:51.509Z", "cromwellVersion": "47" }, { "cromwellId": "cromid-242b9e3", "description": "Finished", "timestamp": "2020-04-07T21:31:11.658Z", "cromwellVersion": "47" } ], "actualWorkflowLanguageVersion": "draft-2", "submittedFiles": { "workflow": "# ENCODE TF/Histone ChIP-Seq pipeline\n# Author: Jin Lee (leepc12@gmail.com)\n\n#CAPER docker quay.io/encode-dcc/chip-seq-pipeline:v1.3.6\n#CAPER singularity docker://quay.io/encode-dcc/chip-seq-pipeline:v1.3.6\n#CROO out_def https://storage.googleapis.com/encode-pipeline-output-definition/chip.croo.v3.json\n\nworkflow chip {\n\tString pipeline_ver = 'v1.3.6'\n\t### sample name, description\n\tString title = 'Untitled'\n\tString description = 'No description'\n\n\t# endedness for input data\n\tBoolean? paired_end\t\t\t\t# to define endedness for all replciates\n\t\t\t\t\t\t\t\t\t#\tif defined, this will override individual endedness below\n\tArray[Boolean] paired_ends = []\t# to define endedness for individual replicate\n\tBoolean? ctl_paired_end\n\tArray[Boolean] ctl_paired_ends = []\n\n\t### mandatory genome param\n\tFile? genome_tsv \t\t\t\t# reference genome data TSV file including\n\t\t\t\t\t\t\t\t\t# all genome-specific file paths and parameters\n\t# individual genome parameters\n\tString? genome_name\t\t\t\t# genome name\n\tFile? ref_fa\t\t\t\t\t# reference fasta (*.fa.gz)\n\tFile? bwa_idx_tar \t\t\t\t# bwa index tar (uncompressed .tar)\n\tFile? bowtie2_idx_tar \t\t\t# bowtie2 index tar (uncompressed .tar)\n\tFile? custom_aligner_idx_tar \t# custom aligner's index tar (uncompressed .tar)\n\tFile? chrsz \t\t\t\t\t# 2-col chromosome sizes file\n\tFile? blacklist \t\t\t\t# blacklist BED (peaks overlapping will be filtered out)\n\tFile? blacklist2 \t\t\t\t# 2nd blacklist (will be merged with 1st one)\n\tString? mito_chr_name\n\tString? regex_bfilt_peak_chr_name\n\tString? gensz \t\t\t\t\t# genome sizes (hs for human, mm for mouse or sum of 2nd col in chrsz)\n\tFile? tss \t\t\t\t\t\t# TSS BED file\n\tFile? dnase \t\t\t\t\t# open chromatin region BED file\n\tFile? prom \t\t\t\t\t\t# promoter region BED file\n\tFile? enh \t\t\t\t\t\t# enhancer region BED file\n\tFile? reg2map \t\t\t\t\t# file with cell type signals\n\tFile? reg2map_bed \t\t\t\t# file of regions used to generate reg2map signals\n\tFile? roadmap_meta \t\t\t\t# roadmap metedata\n\n\t### pipeline type\n\tString pipeline_type \t\t\t# tf or histone chip-seq\n\n\tString aligner = 'bowtie2'\n\tFile? custom_align_py \t\t\t# custom align python script\n\n\tString? peak_caller \t\t\t# default: (spp for tf) and (macs2 for histone)\n\t\t\t\t\t\t\t\t\t# this will override the above defaults\n\tString? peak_type \t\t\t\t# default: narrowPeak for macs2, regionPeak for spp\n\t\t\t\t\t\t\t\t\t# this will override the above defaults\n\tFile? custom_call_peak_py \t\t# custom call_peak python script\n\n\t## parameters for alignment\n\tBoolean align_only = false \t\t# disable all post-align analysis (peak-calling, overlap, idr, ...)\n\tBoolean true_rep_only = false \t# disable all analyses involving pseudo replicates (including overlap/idr)\n\tBoolean enable_count_signal_track = false \t\t# generate count signal track\n\tBoolean enable_jsd = true \t\t# enable JSD plot generation (deeptools fingerprint)\n\tBoolean enable_gc_bias = true\n\n\t# parameters for aligner and filter\n\tBoolean use_bwa_mem_for_pe = false # THIS IS EXPERIMENTAL and BWA ONLY (use bwa mem instead of bwa aln/sam)\n\t\t\t\t\t\t\t\t\t# available only for PE dataset with READ_LEN>=70bp\n\tInt crop_length = 0 \t\t\t# crop reads in FASTQs with Trimmomatic (0 by default, i.e. disabled)\n\tInt xcor_trim_bp = 50 \t\t\t# for cross-correlation analysis only (R1 of paired-end fastqs)\n\tBoolean use_filt_pe_ta_for_xcor = false # PE only. use filtered PE BAM for cross-corr.\n\tString dup_marker = 'picard'\t# picard, sambamba\n\tBoolean no_dup_removal = false\t# keep all dups in final BAM\n\tInt? mapq_thresh \t\t\t\t# threshold for low MAPQ reads removal\n\tInt mapq_thresh_bwa = 30\n\tInt mapq_thresh_bowtie2 = 30\n\tArray[String] filter_chrs = [] \t# array of chromosomes to be removed from nodup/filt BAM\n\t\t\t\t\t\t\t\t\t# chromosomes will be removed from both BAM header/contents\n\t\t\t\t\t\t\t\t\t# e.g. ['chrM', 'MT']\n\tInt subsample_reads = 0\t\t\t# number of reads to subsample TAGALIGN\n\t\t\t\t\t\t\t\t\t# 0 for no subsampling. this affects all downstream analysis\n\tInt ctl_subsample_reads = 0\t\t# number of reads to subsample control TAGALIGN\n\tInt xcor_subsample_reads = 15000000 # subsample TAG-ALIGN for xcor only (not used for other downsteam analyses)\n\tInt xcor_exclusion_range_min = -500\n\tInt? xcor_exclusion_range_max\n\n\t# parameters for peak calling\n\tBoolean always_use_pooled_ctl = false # always use pooled control for all exp rep.\n\tFloat ctl_depth_ratio = 1.2 \t# if ratio between controls is higher than this\n\t\t\t\t\t\t\t\t\t# then always use pooled control for all exp rep.\n\tInt? cap_num_peak\n\tInt cap_num_peak_spp = 300000\t# cap number of raw peaks called from SPP\n\tInt cap_num_peak_macs2 = 500000\t# cap number of raw peaks called from MACS2\n\tFloat pval_thresh = 0.01\t\t# p.value threshold (for MACS2 peak caller only)\n\tFloat fdr_thresh = 0.01\t\t\t# FDR threshold (for SPP peak caller only: Rscript run_spp.R -fdr)\n\tFloat idr_thresh = 0.05\t\t\t# IDR threshold\n\n\t### resources\n\tInt align_cpu = 4\n\tInt align_mem_mb = 20000\n\tInt align_time_hr = 48\n\tString align_disks = 'local-disk 400 HDD'\n\n\tInt filter_cpu = 2\n\tInt filter_mem_mb = 20000\n\tInt filter_time_hr = 24\n\tString filter_disks = 'local-disk 400 HDD'\n\n\tInt bam2ta_cpu = 2\n\tInt bam2ta_mem_mb = 10000\n\tInt bam2ta_time_hr = 6\n\tString bam2ta_disks = 'local-disk 100 HDD'\n\n\tInt spr_mem_mb = 16000\n\n\tInt jsd_cpu = 2\n\tInt jsd_mem_mb = 12000\n\tInt jsd_time_hr = 6\n\tString jsd_disks = 'local-disk 200 HDD'\n\n\tInt xcor_cpu = 2\n\tInt xcor_mem_mb = 16000\t\n\tInt xcor_time_hr = 24\n\tString xcor_disks = 'local-disk 100 HDD'\n\n\tInt macs2_signal_track_mem_mb = 16000\n\tInt macs2_signal_track_time_hr = 24\n\tString macs2_signal_track_disks = 'local-disk 400 HDD'\n\n\tInt call_peak_cpu = 2\n\tInt call_peak_mem_mb = 16000\n\tInt call_peak_time_hr = 72\n\tString call_peak_disks = 'local-disk 200 HDD'\n\n\tString? align_trimmomatic_java_heap\n\tString? filter_picard_java_heap\n\tString? gc_bias_picard_java_heap\n\n\t#### input file definition\n\t# pipeline can start from any type of inputs and then leave all other types undefined\n\t# supported types: fastq, bam, nodup_bam (filtered bam), ta (tagAlign), peak\n\t# define up to 4 replicates\n\t# [rep_id] is for each replicate\n\n \t### fastqs\n\tArray[File] fastqs_rep1_R1 = []\t\t# FASTQs to be merged for rep1 R1\n\tArray[File] fastqs_rep1_R2 = [] \t# do not define if single-ended\n\tArray[File] fastqs_rep2_R1 = [] \t# do not define if unreplicated\n\tArray[File] fastqs_rep2_R2 = []\t\t# ...\n\tArray[File] fastqs_rep3_R1 = []\n\tArray[File] fastqs_rep3_R2 = []\n\tArray[File] fastqs_rep4_R1 = []\n\tArray[File] fastqs_rep4_R2 = []\n\tArray[File] fastqs_rep5_R1 = []\n\tArray[File] fastqs_rep5_R2 = []\n\tArray[File] fastqs_rep6_R1 = []\n\tArray[File] fastqs_rep6_R2 = []\n\tArray[File] fastqs_rep7_R1 = []\n\tArray[File] fastqs_rep7_R2 = []\n\tArray[File] fastqs_rep8_R1 = []\n\tArray[File] fastqs_rep8_R2 = []\n\tArray[File] fastqs_rep9_R1 = []\n\tArray[File] fastqs_rep9_R2 = []\n\tArray[File] fastqs_rep10_R1 = []\n\tArray[File] fastqs_rep10_R2 = []\n\n\tArray[File] ctl_fastqs_rep1_R1 = []\t\t# Control FASTQs to be merged for rep1 R1\n\tArray[File] ctl_fastqs_rep1_R2 = [] \t# do not define if single-ended\n\tArray[File] ctl_fastqs_rep2_R1 = [] \t# do not define if unreplicated\n\tArray[File] ctl_fastqs_rep2_R2 = []\t\t# ...\n\tArray[File] ctl_fastqs_rep3_R1 = []\n\tArray[File] ctl_fastqs_rep3_R2 = []\n\tArray[File] ctl_fastqs_rep4_R1 = []\n\tArray[File] ctl_fastqs_rep4_R2 = []\n\tArray[File] ctl_fastqs_rep5_R1 = []\n\tArray[File] ctl_fastqs_rep5_R2 = []\n\tArray[File] ctl_fastqs_rep6_R1 = []\n\tArray[File] ctl_fastqs_rep6_R2 = []\n\tArray[File] ctl_fastqs_rep7_R1 = []\n\tArray[File] ctl_fastqs_rep7_R2 = []\n\tArray[File] ctl_fastqs_rep8_R1 = []\n\tArray[File] ctl_fastqs_rep8_R2 = []\n\tArray[File] ctl_fastqs_rep9_R1 = []\n\tArray[File] ctl_fastqs_rep9_R2 = []\n\tArray[File] ctl_fastqs_rep10_R1 = []\n\tArray[File] ctl_fastqs_rep10_R2 = []\n\n\t### other input types (bam, nodup_bam, ta)\n\tArray[File?] bams = [] \t\t\t# [rep_id]\n\tArray[File?] ctl_bams = [] \t\t# [rep_id]\n\tArray[File?] nodup_bams = [] \t# [rep_id]\n\tArray[File?] ctl_nodup_bams = [] # [rep_id]\n\tArray[File?] tas = []\t\t\t# [rep_id]\n\tArray[File?] ctl_tas = []\t\t# [rep_id]\n\n\t### other input types (peak)\n\tArray[Int?] fraglen = [] \t# [rep_id]. fragment length if inputs are peaks\n\tArray[File?] peaks = []\t\t# [PAIR(rep_id1,rep_id2)]. example for 3 reps: [rep1_rep2, rep1_rep3, rep2_rep3]\n\tArray[File?] peaks_pr1 = []\t# [rep_id]. do not define if true_rep=true\n\tArray[File?] peaks_pr2 = []\t# [rep_id]. do not define if true_rep=true\n\tFile? peak_ppr1\t\t\t\t# do not define if you have a single replicate or true_rep=true\n\tFile? peak_ppr2\t\t\t\t# do not define if you have a single replicate or true_rep=true\n\tFile? peak_pooled\t\t\t# do not define if you have a single replicate or true_rep=true\n\n\t####################### pipeline starts here #######################\n\t# DO NOT DEFINE ANY VARIABLES DECLARED BELOW IN AN INPUT JSON FILE #\n\t# THEY ARE TEMPORARY/INTERMEDIATE SYSTEM VARIABLES #\n\t####################### pipeline starts here #######################\n\n\t# read genome data and paths\n\tif ( defined(genome_tsv) ) {\n\t\tcall read_genome_tsv { input: genome_tsv = genome_tsv }\n\t}\n\tFile? reffa = if defined(ref_fa) then ref_fa\n\t\telse read_genome_tsv.ref_fa\n\tFile? bwa_idxtar = if defined(bwa_idx_tar) then bwa_idx_tar\n\t\telse read_genome_tsv.bwa_idx_tar\n\tFile? bowtie2_idxtar = if defined(bowtie2_idx_tar) then bowtie2_idx_tar\n\t\telse read_genome_tsv.bowtie2_idx_tar\n\tFile? custom_aligner_idxtar = if defined(custom_aligner_idx_tar) then custom_aligner_idx_tar\n\t\telse read_genome_tsv.custom_aligner_idxtar\n\tFile? chrsz = if defined(chrsz) then chrsz\n\t\telse read_genometsv.chrsz\n\tString? gensz = if defined(gensz) then gensz\n\t\telse read_genometsv.gensz\n\tFile? blacklist1 = if defined(blacklist) then blacklist\n\t\telse read_genometsv.blacklist\n\tFile? blacklist2 = if defined(blacklist2) then blacklist2\n\t\telse read_genome_tsv.blacklist2\n\t# merge multiple blacklists\n\t# two blacklists can have different number of columns (3 vs 6)\n\t# so we limit merged blacklist's columns to 3\n\tArray[File] blacklists = selectall([blacklist1, blacklist2_])\n\tif ( length(blacklists) > 1 ) {\n\t\tcall pool_ta as poolblacklist { input:\n\t\t\ttas = blacklists,\n\t\t\tcol = 3,\n\t\t}\n\t}\n\tFile? blacklist = if length(blacklists) > 1 then pool_blacklist.tapooled\n\t\telse if length(blacklists) > 0 then blacklists[0]\n\t\telse blacklist2\n\tString? mito_chrname = if defined(mito_chr_name) then mito_chr_name\n\t\telse read_genome_tsv.mito_chr_name\t\t\n\tString? regex_bfilt_peak_chrname = if defined(regex_bfilt_peak_chr_name) then regex_bfilt_peak_chr_name\n\t\telse read_genome_tsv.regex_bfilt_peak_chr_name\n\tString? genomename = if defined(genome_name) then genome_name\n\t\telse if defined(read_genome_tsv.genome_name) then read_genome_tsv.genome_name\n\t\telse basename(select_first([genome_tsv, reffa, chrsz, 'None']))\n\n\t# read additional annotation data\n\tFile? tss = if defined(tss) then tss\n\t\telse read_genometsv.tss\n\tFile? dnase = if defined(dnase) then dnase\n\t\telse read_genometsv.dnase\n\tFile? prom = if defined(prom) then prom\n\t\telse read_genometsv.prom\n\tFile? enh = if defined(enh) then enh\n\t\telse read_genometsv.enh\n\tFile? reg2map = if defined(reg2map) then reg2map\n\t\telse read_genome_tsv.reg2map\n\tFile? reg2mapbed = if defined(reg2map_bed) then reg2map_bed\n\t\telse read_genome_tsv.reg2map_bed\n\tFile? roadmapmeta = if defined(roadmap_meta) then roadmap_meta\n\t\telse read_genome_tsv.roadmapmeta\n\n\t### temp vars (do not define these)\n\tString aligner = if defined(custom_align_py) then 'custom' else aligner\n\tString peakcaller = if defined(custom_call_peak_py) then 'custom'\n\t\t\t\t\t\telse if pipeline_type=='tf' then select_first([peak_caller, 'spp'])\n\t\t\t\t\t\telse select_first([peak_caller, 'macs2'])\n\tString peaktype = if peakcaller=='spp' then select_first([peak_type, 'regionPeak'])\n\t\t\t\t\t\telse if peakcaller=='macs2' then select_first([peak_type, 'narrowPeak'])\n\t\t\t\t\t\telse select_first([peak_type, 'narrowPeak'])\n\tBoolean enable_idr = pipeline_type=='tf' # enable_idr for TF chipseq only\n\tString idrrank = if peakcaller=='spp' then 'signal.value'\n\t\t\t\t\t\telse if peakcaller=='macs2' then 'p.value'\n\t\t\t\t\t\telse 'p.value'\n\tInt cap_numpeak = if peakcaller == 'spp' then select_first([cap_num_peak, cap_num_peak_spp])\n\t\telse select_first([cap_num_peak, cap_num_peak_macs2])\n\tInt mapqthresh = if aligner=='bowtie2' then select_first([mapq_thresh, mapq_thresh_bowtie2])\n\t\t\t\t\t\telse select_first([mapq_thresh, mapq_thresh_bwa])\n\n\t# temporary 2-dim fastqs array [rep_id][merge_id]\n\tArray[Array[File]] fastqs_R1 = \n\t\tif length(fastqs_rep10_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1, fastqs_rep5_R1,\n\t\t\tfastqs_rep6_R1, fastqs_rep7_R1, fastqs_rep8_R1, fastqs_rep9_R1, fastqs_rep10_R1]\n\t\telse if length(fastqs_rep9_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1, fastqs_rep5_R1,\n\t\t\tfastqs_rep6_R1, fastqs_rep7_R1, fastqs_rep8_R1, fastqs_rep9_R1]\n\t\telse if length(fastqs_rep8_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1, fastqs_rep5_R1,\n\t\t\tfastqs_rep6_R1, fastqs_rep7_R1, fastqs_rep8_R1]\n\t\telse if length(fastqs_rep7_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1, fastqs_rep5_R1,\n\t\t\tfastqs_rep6_R1, fastqs_rep7_R1]\n\t\telse if length(fastqs_rep6_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1, fastqs_rep5_R1,\n\t\t\tfastqs_rep6_R1]\n\t\telse if length(fastqs_rep5_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1, fastqs_rep5_R1]\n\t\telse if length(fastqs_rep4_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1, fastqs_rep4_R1]\n\t\telse if length(fastqs_rep3_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1, fastqs_rep3_R1]\n\t\telse if length(fastqs_rep2_R1)>0 then\n\t\t\t[fastqs_rep1_R1, fastqs_rep2_R1]\n\t\telse if length(fastqs_rep1_R1)>0 then\n\t\t\t[fastqs_rep1_R1]\n\t\telse []\n\t# no need to do that for R2 (R1 array will be used to determine presense of fastq for each rep)\n\tArray[Array[File]] fastqs_R2 = \n\t\t[fastqs_rep1_R2, fastqs_rep2_R2, fastqs_rep3_R2, fastqs_rep4_R2, fastqs_rep5_R2,\n\t\tfastqs_rep6_R2, fastqs_rep7_R2, fastqs_rep8_R2, fastqs_rep9_R2, fastqs_rep10_R2]\n\n\t# temporary 2-dim ctl fastqs array [rep_id][merge_id]\n\tArray[Array[File]] ctl_fastqs_R1 = \n\t\tif length(ctl_fastqs_rep10_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1, ctl_fastqs_rep5_R1,\n\t\t\tctl_fastqs_rep6_R1, ctl_fastqs_rep7_R1, ctl_fastqs_rep8_R1, ctl_fastqs_rep9_R1, ctl_fastqs_rep10_R1]\n\t\telse if length(ctl_fastqs_rep9_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1, ctl_fastqs_rep5_R1,\n\t\t\tctl_fastqs_rep6_R1, ctl_fastqs_rep7_R1, ctl_fastqs_rep8_R1, ctl_fastqs_rep9_R1]\n\t\telse if length(ctl_fastqs_rep8_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1, ctl_fastqs_rep5_R1,\n\t\t\tctl_fastqs_rep6_R1, ctl_fastqs_rep7_R1, ctl_fastqs_rep8_R1]\n\t\telse if length(ctl_fastqs_rep7_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1, ctl_fastqs_rep5_R1,\n\t\t\tctl_fastqs_rep6_R1, ctl_fastqs_rep7_R1]\n\t\telse if length(ctl_fastqs_rep6_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1, ctl_fastqs_rep5_R1,\n\t\t\tctl_fastqs_rep6_R1]\n\t\telse if length(ctl_fastqs_rep5_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1, ctl_fastqs_rep5_R1]\n\t\telse if length(ctl_fastqs_rep4_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1, ctl_fastqs_rep4_R1]\n\t\telse if length(ctl_fastqs_rep3_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1, ctl_fastqs_rep3_R1]\n\t\telse if length(ctl_fastqs_rep2_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1, ctl_fastqs_rep2_R1]\n\t\telse if length(ctl_fastqs_rep1_R1)>0 then\n\t\t\t[ctl_fastqs_rep1_R1]\n\t\telse []\n\t# no need to do that for R2 (R1 array will be used to determine presense of fastq for each rep)\n\tArray[Array[File]] ctl_fastqs_R2 = \n\t\t[ctl_fastqs_rep1_R2, ctl_fastqs_rep2_R2, ctl_fastqs_rep3_R2, ctl_fastqs_rep4_R2, ctl_fastqs_rep5_R2,\n\t\tctl_fastqs_rep6_R2, ctl_fastqs_rep7_R2, ctl_fastqs_rep8_R2, ctl_fastqs_rep9_R2, ctl_fastqs_rep10_R2]\n\n\t# temporary variables to get number of replicates\n\t# \tWDLic implementation of max(A,B,C,...)\n\tInt num_rep_fastq = length(fastqs_R1)\n\tInt num_rep_bam = if length(bams)<num_rep_fastq then num_rep_fastq\n\t\telse length(bams)\n\tInt num_rep_nodup_bam = if length(nodup_bams)<num_rep_bam then num_rep_bam\n\t\telse length(nodup_bams)\n\tInt num_rep_ta = if length(tas)<num_rep_nodup_bam then num_rep_nodup_bam\n\t\telse length(tas)\n\tInt num_rep_peak = if length(peaks)<num_rep_ta then num_rep_ta\n\t\telse length(peaks)\n\tInt num_rep = num_rep_peak\n\n\t# temporary variables to get number of controls\n\tInt num_ctl_fastq = length(ctl_fastqs_R1)\n\tInt num_ctl_bam = if length(ctl_bams)<num_ctl_fastq then num_ctl_fastq\n\t\telse length(ctl_bams)\n\tInt num_ctl_nodup_bam = if length(ctl_nodup_bams)<num_ctl_bam then num_ctl_bam\n\t\telse length(ctl_nodup_bams)\n\tInt num_ctl_ta = if length(ctl_tas)<num_ctl_nodup_bam then num_ctl_nodup_bam\n\t\telse length(ctl_tas)\n\tInt num_ctl = num_ctl_ta\n\n\t# sanity check for inputs\n\tif ( num_rep == 0 && num_ctl == 0 ) {\n\t\tcall raise_exception as error_input_data { input:\n\t\t\tmsg = 'No FASTQ/BAM/TAG-ALIGN/PEAK defined in your input JSON. Check if your FASTQs are defined as \"chip.fastqs_repX_RY\". DO NOT MISS suffix R1 even for single ended FASTQ.'\n\t\t}\n\t}\n\tif ( !defined(chrsz) ) {\n\t\tcall raise_exception as error_genome_database { input:\n\t\t\tmsg = 'No genome database found in your input JSON. Did you define \"chip.genome_tsv\" correctly?'\n\t\t}\n\t}\n\tif ( !align_only && peakcaller == 'spp' && num_ctl == 0 ) {\n\t\tcall raise_exception as error_controlrequired { input:\n\t\t\tmsg = 'SPP requires control inputs. Define control input files (\"chip.ctl*\") in an input JSON file.'\n\t\t}\n\t}\n\n\t# align each replicate\n\tscatter(i in range(num_rep)) {\n\t\t# to override endedness definition for individual replicate\n\t\t# \tpaired_end will override paired_ends[i]\n\t\tBoolean? pairedend = if !defined(paired_end) && i<length(paired_ends) then paired_ends[i]\n\t\t\telse paired_end\n\n\t\tBoolean has_input_of_align = i<length(fastqs_R1) && length(fastqs_R1[i])>0\n\t\tBoolean has_output_of_align = i<length(bams) && defined(bams[i])\n\t\tif ( has_input_of_align && !has_output_of_align ) {\n\t\t\tcall align { input :\n\t\t\t\tfastqs_R1 = fastqs_R1[i],\n\t\t\t\tfastqs_R2 = fastqs_R2[i],\n\t\t\t\tcrop_length = croplength,\n\n\t\t\t\taligner = aligner,\n\t\t\t\tmito_chr_name = mito_chrname,\n\t\t\t\tcustom_align_py = custom_align_py,\n\t\t\t\tidx_tar = if aligner=='bwa' then bwa_idxtar\n\t\t\t\t\telse if aligner=='bowtie2' then bowtie2_idxtar\n\t\t\t\t\telse custom_aligner_idxtar,\n\t\t\t\tpaired_end = pairedend,\n\t\t\t\tuse_bwa_mem_for_pe = use_bwa_mem_for_pe,\n\n\t\t\t\ttrimmomatic_java_heap = align_trimmomatic_java_heap,\n\t\t\t\tcpu = align_cpu,\n\t\t\t\tmem_mb = align_mem_mb,\n\t\t\t\ttime_hr = align_time_hr,\n\t\t\t\tdisks = aligndisks,\n\t\t\t}\n\t\t}\n\t\tFile? bam = if has_output_of_align then bams[i] else align.bam\n\n\t\tBoolean has_input_of_filter = has_output_of_align || defined(align.bam)\n\t\tBoolean has_output_of_filter = i<length(nodup_bams) && defined(nodup_bams[i])\n\t\t# skip if we already have output of this step\n\t\tif ( has_input_of_filter && !has_output_offilter ) {\n\t\t\tcall filter { input :\n\t\t\t\tbam = bam,\n\t\t\t\tpaired_end = pairedend,\n\t\t\t\tdup_marker = dup_marker,\n\t\t\t\tmapq_thresh = mapqthresh,\n\t\t\t\tfilter_chrs = filterchrs,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tno_dup_removal = no_dup_removal,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = filter_cpu,\n\t\t\t\tmem_mb = filter_mem_mb,\n\t\t\t\tpicard_java_heap = filter_picard_java_heap,\n\t\t\t\ttime_hr = filter_time_hr,\n\t\t\t\tdisks = filter_disks,\n\t\t\t}\n\t\t}\n\t\tFile? nodupbam = if has_output_of_filter then nodup_bams[i] else filter.nodup_bam\n\n\t\tBoolean has_input_of_bam2ta = has_output_of_filter || defined(filter.nodup_bam)\n\t\tBoolean has_output_of_bam2ta = i<length(tas) && defined(tas[i])\n\t\tif ( has_input_of_bam2ta && !has_output_of_bam2ta ) {\n\t\t\tcall bam2ta { input :\n\t\t\t\tbam = nodupbam,\n\t\t\t\tsubsample = subsample_reads,\n\t\t\t\tpaired_end = pairedend,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = bam2ta_cpu,\n\t\t\t\tmem_mb = bam2ta_mem_mb,\n\t\t\t\ttime_hr = bam2ta_time_hr,\n\t\t\t\tdisks = bam2tadisks,\n\t\t\t}\n\t\t}\n\t\tFile? ta = if has_output_of_bam2ta then tas[i] else bam2ta.ta\n\n\t\tBoolean has_input_of_spr = has_output_of_bam2ta || defined(bam2ta.ta)\n\t\tif ( has_input_of_spr && !align_only && !true_reponly ) {\n\t\t\tcall spr { input :\n\t\t\t\tta = ta,\n\t\t\t\tpaired_end = pairedend,\n\t\t\t\tmem_mb = spr_mem_mb,\n\t\t\t}\n\t\t}\n\n\t\tBoolean has_input_of_count_signal_track = has_output_of_bam2ta || defined(bam2ta.ta)\n\t\tif ( has_input_of_count_signal_track && enable_count_signal_track ) {\n\t\t\t# generate count signal track\n\t\t\tcall count_signaltrack { input :\n\t\t\t\tta = ta,\n\t\t\t\tchrsz = chrsz_,\n\t\t\t}\n\t\t}\n\n\t\tif ( enable_gc_bias && defined(nodupbam) && defined(reffa) ) {\n\t\t\tcall gc_bias { input :\n\t\t\t\tnodup_bam = nodupbam,\n\t\t\t\tref_fa = reffa,\n\t\t\t\tpicard_java_heap = gc_bias_picard_java_heap,\n\t\t\t}\n\t\t}\n\n\t\t# special trimming/mapping for xcor (when starting from FASTQs)\n\t\tif ( has_input_of_align ) {\n\t\t\tcall align as align_R1 { input :\n\t\t\t\tfastqs_R1 = fastqs_R1[i],\n\t\t\t\tfastqs_R2 = [],\n\t\t\t\ttrim_bp = xcor_trim_bp,\n\t\t\t\tcroplength = 0,\n\n\t\t\t\taligner = aligner,\n\t\t\t\tmito_chr_name = mito_chrname,\n\t\t\t\tcustom_align_py = custom_align_py,\n\t\t\t\tidx_tar = if aligner=='bwa' then bwa_idxtar\n\t\t\t\t\telse if aligner=='bowtie2' then bowtie2_idxtar\n\t\t\t\t\telse custom_aligner_idxtar,\n\t\t\t\tpaired_end = false,\n\t\t\t\tuse_bwa_mem_for_pe = use_bwa_mem_for_pe,\n\n\t\t\t\tcpu = align_cpu,\n\t\t\t\tmem_mb = align_mem_mb,\n\t\t\t\ttime_hr = align_time_hr,\n\t\t\t\tdisks = align_disks,\n\t\t\t}\n\t\t\t# no bam deduping for xcor\n\t\t\tcall filter as filter_R1 { input :\n\t\t\t\tbam = align_R1.bam,\n\t\t\t\tpaired_end = false,\n\t\t\t\tdup_marker = dup_marker,\n\t\t\t\tmapq_thresh = mapqthresh,\n\t\t\t\tfilter_chrs = filterchrs,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tno_dup_removal = true,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = filter_cpu,\n\t\t\t\tmem_mb = filter_mem_mb,\n\t\t\t\tpicard_java_heap = filter_picard_java_heap,\n\t\t\t\ttime_hr = filter_time_hr,\n\t\t\t\tdisks = filter_disks,\n\t\t\t}\n\t\t\tcall bam2ta as bam2ta_no_dedup_R1 { input :\n\t\t\t\tbam = filter_R1.nodup_bam, # it's named as nodup bam but it's not deduped but just filtered\n\t\t\t\tpaired_end = false,\n\t\t\t\tsubsample = 0,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = bam2ta_cpu,\n\t\t\t\tmem_mb = bam2ta_mem_mb,\n\t\t\t\ttime_hr = bam2ta_time_hr,\n\t\t\t\tdisks = bam2ta_disks,\n\t\t\t}\n\t\t}\n\n\t\t# special trimming/mapping for xcor (when starting from BAMs)\n\t\tBoolean has_input_of_bam2ta_no_dedup = (has_output_of_align || defined(align.bam))\n\t\t\t&& !defined(bam2ta_no_dedup_R1.ta)\n\t\tif ( has_input_of_bam2ta_no_dedup ) {\n\t\t\tcall filter as filter_nodedup { input :\n\t\t\t\tbam = bam,\n\t\t\t\tpaired_end = pairedend,\n\t\t\t\tdup_marker = dup_marker,\n\t\t\t\tmapq_thresh = mapqthresh,\n\t\t\t\tfilter_chrs = filterchrs,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tno_dup_removal = true,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = filter_cpu,\n\t\t\t\tmem_mb = filter_mem_mb,\n\t\t\t\tpicard_java_heap = filter_picard_java_heap,\n\t\t\t\ttime_hr = filter_time_hr,\n\t\t\t\tdisks = filter_disks,\n\t\t\t}\n\t\t\tcall bam2ta as bam2ta_no_dedup { input :\n\t\t\t\tbam = filter_no_dedup.nodup_bam, # output name is nodup but it's not deduped\n\t\t\t\tpaired_end = pairedend,\n\t\t\t\tsubsample = 0,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = bam2ta_cpu,\n\t\t\t\tmem_mb = bam2ta_mem_mb,\n\t\t\t\ttime_hr = bam2ta_time_hr,\n\t\t\t\tdisks = bam2ta_disks,\n\t\t\t}\n\t\t}\n\n\t\t# use trimmed/unfilitered R1 tagAlign for paired end dataset \t\t\n\t\t# if not starting from fastqs, keep using old method\n\t\t# (mapping with both ends for tag-aligns to be used for xcor)\n\t\t# subsample tagalign (non-mito) and cross-correlation analysis\n\t\tFile? ta_xcor = if defined(bam2ta_no_dedup_R1.ta) then bam2ta_no_dedup_R1.ta\n\t\t\telse if defined(bam2ta_no_dedup.ta) then bam2ta_nodedup.ta\n\t\t\telse ta\n\t\tBoolean? paired_end_xcor = if defined(bam2ta_no_dedup_R1.ta) then false\n\t\t\telse pairedend\n\n\t\tBoolean has_input_of_xcor = defined(ta_xcor)\n\t\tif ( has_input_of_xcor ) {\n\t\t\tcall xcor { input :\n\t\t\t\tta = ta_xcor,\n\t\t\t\tpaired_end = paired_end_xcor,\n\t\t\t\tsubsample = xcor_subsample_reads,\n\t\t\t\tmito_chr_name = mito_chrname,\n\t\t\t\tchip_seq_type = pipeline_type,\n\t\t\t\texclusion_range_min = xcor_exclusion_range_min,\n\t\t\t\texclusion_range_max = xcor_exclusion_range_max,\n\t\t\t\tcpu = xcor_cpu,\n\t\t\t\tmem_mb = xcor_mem_mb,\n\t\t\t\ttime_hr = xcor_time_hr,\n\t\t\t\tdisks = xcordisks,\n\t\t\t}\n\t\t}\n\n\t\t# before peak calling, get fragment length from xcor analysis or given input\n\t\t# if fraglen [] is defined in the input JSON, fraglen from xcor will be ignored\n\t\tInt? fraglen = if i<length(fraglen) && defined(fraglen[i]) then fraglen[i]\n\t\t\telse xcor.fraglen\n\t}\n\n\t# align each control\n\tscatter(i in range(num_ctl)) {\n\t\t# to override endedness definition for individual control\n\t\t# \tctl_paired_end will override ctl_paired_ends[i]\n\t\tBoolean? ctl_pairedend = if !defined(ctl_paired_end) && i<length(ctl_paired_ends) then ctl_paired_ends[i]\n\t\t\telse if defined(ctl_paired_end) then ctl_paired_end\n\t\t\telse paired_end\n\n\t\tBoolean has_input_of_align_ctl = i<length(ctl_fastqs_R1) && length(ctl_fastqs_R1[i])>0\n\t\tBoolean has_output_of_align_ctl = i<length(ctl_bams) && defined(ctl_bams[i])\n\t\tif ( has_input_of_align_ctl && !has_output_of_align_ctl ) {\n\t\t\tcall align as align_ctl { input :\n\t\t\t\tfastqs_R1 = ctl_fastqs_R1[i],\n\t\t\t\tfastqs_R2 = ctl_fastqs_R2[i],\n\t\t\t\tcrop_length = croplength,\n\n\t\t\t\taligner = aligner,\n\t\t\t\tmito_chr_name = mito_chrname,\n\t\t\t\tcustom_align_py = custom_align_py,\n\t\t\t\tidx_tar = if aligner=='bwa' then bwa_idxtar\n\t\t\t\t\telse if aligner=='bowtie2' then bowtie2_idxtar\n\t\t\t\t\telse custom_aligner_idxtar,\n\t\t\t\tpaired_end = ctl_pairedend,\n\t\t\t\tuse_bwa_mem_for_pe = use_bwa_mem_for_pe,\n\n\t\t\t\ttrimmomatic_java_heap = align_trimmomatic_java_heap,\n\t\t\t\tcpu = align_cpu,\n\t\t\t\tmem_mb = align_mem_mb,\n\t\t\t\ttime_hr = align_time_hr,\n\t\t\t\tdisks = align_disks,\n\t\t\t}\n\t\t}\n\t\tFile? ctlbam = if has_output_of_align_ctl then ctl_bams[i] else align_ctl.bam\n\n\t\tBoolean has_input_of_filter_ctl = has_output_of_align_ctl || defined(align_ctl.bam)\n\t\tBoolean has_output_of_filter_ctl = i<length(ctl_nodup_bams) && defined(ctl_nodup_bams[i])\n\t\t# skip if we already have output of this step\n\t\tif ( has_input_of_filter_ctl && !has_output_of_filter_ctl ) {\n\t\t\tcall filter as filter_ctl { input :\n\t\t\t\tbam = ctlbam,\n\t\t\t\tpaired_end = ctl_pairedend,\n\t\t\t\tdup_marker = dup_marker,\n\t\t\t\tmapq_thresh = mapqthresh,\n\t\t\t\tfilter_chrs = filterchrs,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tno_dup_removal = no_dup_removal,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = filter_cpu,\n\t\t\t\tmem_mb = filter_mem_mb,\n\t\t\t\tpicard_java_heap = filter_picard_java_heap,\n\t\t\t\ttime_hr = filter_time_hr,\n\t\t\t\tdisks = filter_disks,\n\t\t\t}\n\t\t}\n\t\tFile? ctl_nodupbam = if has_output_of_filter_ctl then ctl_nodup_bams[i] else filter_ctl.nodup_bam\n\n\t\tBoolean has_input_of_bam2ta_ctl = has_output_of_filter_ctl || defined(filter_ctl.nodup_bam)\n\t\tBoolean has_output_of_bam2ta_ctl = i<length(ctl_tas) && defined(ctl_tas[i])\n\t\tif ( has_input_of_bam2ta_ctl && !has_output_of_bam2ta_ctl ) {\n\t\t\tcall bam2ta as bam2ta_ctl { input :\n\t\t\t\tbam = ctl_nodupbam,\n\t\t\t\tsubsample = subsample_reads,\n\t\t\t\tpaired_end = ctl_pairedend,\n\t\t\t\tmito_chr_name = mito_chrname,\n\n\t\t\t\tcpu = bam2ta_cpu,\n\t\t\t\tmem_mb = bam2ta_mem_mb,\n\t\t\t\ttime_hr = bam2ta_time_hr,\n\t\t\t\tdisks = bam2ta_disks,\n\t\t\t}\n\t\t}\n\t\tFile? ctlta = if has_output_of_bam2ta_ctl then ctl_tas[i] else bam2ta_ctl.ta\n\t}\n\n\t# if there are TAs for ALL replicates then pool them\n\tBoolean has_all_inputs_of_pool_ta = length(selectall(ta))==num_rep\n\tif ( has_all_inputs_of_pool_ta && num_rep>1 ) {\n\t\t# pool tagaligns from true replicates\n\t\tcall poolta { input :\n\t\t\ttas = ta,\n\t\t\tprefix = 'rep',\n\t\t}\n\t}\n\n\t# if there are pr1 TAs for ALL replicates then pool them\n\tBoolean has_all_inputs_of_pool_ta_pr1 = length(select_all(spr.ta_pr1))==num_rep\n\tif ( has_all_inputs_of_pool_ta_pr1 && num_rep>1 && !align_only && !true_rep_only ) {\n\t\t# pool tagaligns from pseudo replicate 1\n\t\tcall pool_ta as pool_ta_pr1 { input :\n\t\t\ttas = spr.ta_pr1,\n\t\t\tprefix = 'rep-pr1',\n\t\t}\n\t}\n\n\t# if there are pr2 TAs for ALL replicates then pool them\n\tBoolean has_all_inputs_of_pool_ta_pr2 = length(select_all(spr.ta_pr2))==num_rep\n\tif ( has_all_inputs_of_pool_ta_pr1 && num_rep>1 && !align_only && !true_rep_only ) {\n\t\t# pool tagaligns from pseudo replicate 2\n\t\tcall pool_ta as pool_ta_pr2 { input :\n\t\t\ttas = spr.ta_pr2,\n\t\t\tprefix = 'rep-pr2',\n\t\t}\n\t}\n\n\t# if there are CTL TAs for ALL replicates then pool them\n\tBoolean has_all_inputs_of_pool_ta_ctl = length(select_all(ctlta))==num_ctl\n\tif ( has_all_inputs_of_pool_ta_ctl && num_ctl>1 ) {\n\t\t# pool tagaligns from true replicates\n\t\tcall pool_ta as pool_ta_ctl { input :\n\t\t\ttas = ctlta,\n\t\t\tprefix = 'ctl',\n\t\t}\n\t}\n\n\tBoolean has_input_of_count_signal_track_pooled = defined(pool_ta.ta_pooled)\n\tif ( has_input_of_count_signal_track_pooled && enable_count_signal_track && num_rep>1 ) {\n\t\tcall count_signal_track as count_signal_track_pooled { input :\n\t\t\tta = pool_ta.tapooled,\n\t\t\tchrsz = chrsz,\n\t\t}\n\t}\n\n\tBoolean has_input_ofjsd = defined(blacklist) && length(select_all(nodupbam))==num_rep\n\tif ( has_input_of_jsd && num_rep > 0 && enable_jsd ) {\n\t\t# fingerprint and JS-distance plot\n\t\tcall jsd { input :\n\t\t\tnodup_bams = nodupbam,\n\t\t\tctl_bams = ctl_nodupbam, # use first control only\n\t\t\tblacklist = blacklist_,\n\t\t\tmapq_thresh = mapqthresh,\n\n\t\t\tcpu = jsd_cpu,\n\t\t\tmem_mb = jsd_mem_mb,\n\t\t\ttime_hr = jsd_time_hr,\n\t\t\tdisks = jsd_disks,\n\t\t}\n\t}\n\n\tBoolean has_all_input_of_choose_ctl = length(selectall(ta))==num_rep\n\t\t&& length(select_all(ctlta))==num_ctl && num_ctl > 0\n\tif ( has_all_input_of_choose_ctl && !align_only ) {\n\t\t# choose appropriate control for each exp IP replicate\n\t\t# outputs:\n\t\t# \tchoose_ctl.idx : control replicate index for each exp replicate \n\t\t#\t\t\t\t\t-1 means pooled ctl replicate\n\t\tcall choosectl { input:\n\t\t\ttas = ta,\n\t\t\tctl_tas = ctlta,\n\t\t\tta_pooled = pool_ta.ta_pooled,\n\t\t\tctl_ta_pooled = pool_ta_ctl.ta_pooled,\n\t\t\talways_use_pooled_ctl = always_use_pooled_ctl,\n\t\t\tctl_depth_ratio = ctl_depth_ratio,\n\t\t}\n\t}\n\n\t# make control ta array [[1,2,3,4]] -> [[1],[2],[3],[4]], will be zipped with exp ta array latter\n\tArray[Array[File]] chosen_ctl_tas =\n\t\tif has_all_input_of_choose_ctl then transpose(select_all([choose_ctl.chosen_ctl_tas]))\n\t\telse [[],[],[],[],[],[],[],[],[],[]]\n\n\t# workaround for dx error (Unsupported combination: womType: Int womValue: ([225], Array[Int]))\n\tArray[Int] fraglen_tmp = selectall(fraglen)\n\n\t# we have all tas and ctl_tas (optional for histone chipseq) ready, let's call peaks\n\tscatter(i in range(num_rep)) {\n\t\tBoolean has_input_of_callpeak = defined(ta[i])\n\t\tBoolean has_output_of_call_peak = i<length(peaks) && defined(peaks[i])\n\t\tif ( has_input_of_call_peak && !has_output_of_call_peak && !align_only ) {\n\t\t\tcall call_peak { input :\n\t\t\t\tpeak_caller = peakcaller,\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\tcustom_call_peak_py = custom_call_peakpy,\n\t\t\t\ttas = flatten([[ta[i]], chosen_ctltas[i]]),\n\t\t\t\tgensz = gensz,\n\t\t\t\tchrsz = chrsz_,\n\t\t\t\tcap_num_peak = cap_numpeak,\n\t\t\t\tpval_thresh = pval_thresh,\n\t\t\t\tfdr_thresh = fdr_thresh,\n\t\t\t\tfraglen = fraglentmp[i],\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\n\t\t\t\tcpu = call_peak_cpu,\n\t\t\t\tmem_mb = call_peak_mem_mb,\n\t\t\t\tdisks = call_peak_disks,\n\t\t\t\ttime_hr = call_peak_timehr,\n\t\t\t}\n\t\t}\n\t\tFile? peak = if has_output_of_call_peak then peaks[i]\n\t\t\telse call_peak.peak\n\n\t\t# signal track\n\t\tif ( has_input_of_call_peak && !align_only ) {\n\t\t\tcall macs2_signaltrack { input :\n\t\t\t\ttas = flatten([[ta[i]], chosen_ctltas[i]]),\n\t\t\t\tgensz = gensz,\n\t\t\t\tchrsz = chrsz_,\n\t\t\t\tpval_thresh = pval_thresh,\n\t\t\t\tfraglen = fraglen_tmp[i],\n\n\t\t\t\tmem_mb = macs2_signal_track_mem_mb,\n\t\t\t\tdisks = macs2_signal_track_disks,\n\t\t\t\ttime_hr = macs2_signal_track_time_hr,\n\t\t\t}\n\t\t}\n\n\t\t# call peaks on 1st pseudo replicated tagalign\n\t\tBoolean has_input_of_call_peak_pr1 = defined(spr.ta_pr1[i])\n\t\tBoolean has_output_of_call_peak_pr1 = i<length(peaks_pr1) && defined(peaks_pr1[i])\n\t\tif ( has_input_of_call_peak_pr1 && !has_output_of_call_peak_pr1 && !true_rep_only ) {\n\t\t\tcall call_peak as call_peak_pr1 { input :\n\t\t\t\tpeak_caller = peakcaller,\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\tcustom_call_peak_py = custom_call_peak_py,\n\t\t\t\ttas = flatten([[spr.ta_pr1[i]], chosen_ctltas[i]]),\n\t\t\t\tgensz = gensz,\n\t\t\t\tchrsz = chrsz_,\n\t\t\t\tcap_num_peak = cap_numpeak,\n\t\t\t\tpval_thresh = pval_thresh,\n\t\t\t\tfdr_thresh = fdr_thresh,\n\t\t\t\tfraglen = fraglentmp[i],\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\n\t\t\t\tcpu = call_peak_cpu,\n\t\t\t\tmem_mb = call_peak_mem_mb,\n\t\t\t\tdisks = call_peak_disks,\n\t\t\t\ttime_hr = call_peak_time_hr,\n\t\t\t}\n\t\t}\n\t\tFile? peakpr1 = if has_output_of_call_peak_pr1 then peaks_pr1[i]\n\t\t\telse call_peak_pr1.peak\n\n\t\t# call peaks on 2nd pseudo replicated tagalign\n\t\tBoolean has_input_of_call_peak_pr2 = defined(spr.ta_pr2[i])\n\t\tBoolean has_output_of_call_peak_pr2 = i<length(peaks_pr2) && defined(peaks_pr2[i])\n\t\tif ( has_input_of_call_peak_pr2 && !has_output_of_call_peak_pr2 && !true_rep_only ) {\n\t\t\tcall call_peak as call_peak_pr2 { input :\n\t\t\t\tpeak_caller = peakcaller,\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\tcustom_call_peak_py = custom_call_peak_py,\n\t\t\t\ttas = flatten([[spr.ta_pr2[i]], chosen_ctltas[i]]),\n\t\t\t\tgensz = gensz,\n\t\t\t\tchrsz = chrsz_,\n\t\t\t\tcap_num_peak = cap_numpeak,\n\t\t\t\tpval_thresh = pval_thresh,\n\t\t\t\tfdr_thresh = fdr_thresh,\n\t\t\t\tfraglen = fraglentmp[i],\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\n\t\t\t\tcpu = call_peak_cpu,\n\t\t\t\tmem_mb = call_peak_mem_mb,\n\t\t\t\tdisks = call_peak_disks,\n\t\t\t\ttime_hr = call_peak_time_hr,\n\t\t\t}\n\t\t}\n\t\tFile? peakpr2 = if has_output_of_call_peak_pr2 then peaks_pr2[i]\n\t\t\telse call_peak_pr2.peak\n\t}\n\n\t# if ( !align_only && num_rep > 1 ) {\n\t# rounded mean of fragment length, which will be used for \n\t# 1) calling peaks for pooled true/pseudo replicates\n\t# 2) calculating FRiP\n\tcall rounded_mean as fraglen_mean { input :\n\t\tints = fraglen_tmp,\n\t}\n\t# }\n\n\t# actually not an array\n\tArray[File?] chosen_ctl_ta_pooled = if !has_all_input_of_choose_ctl then []\n\t\telse if num_ctl < 2 then [ctlta[0]] # choose first (only) control\n\t\telse select_all([pool_ta_ctl.ta_pooled]) # choose pooled control\n\n\tBoolean has_input_of_call_peak_pooled = defined(pool_ta.ta_pooled)\n\tBoolean has_output_of_call_peak_pooled = defined(peak_pooled)\n\tif ( has_input_of_call_peak_pooled && !has_output_of_call_peak_pooled && !align_only && num_rep>1 ) {\n\t\t# call peaks on pooled replicate\n\t\t# always call peaks for pooled replicate to get signal tracks\n\t\tcall call_peak as call_peak_pooled { input :\n\t\t\tpeak_caller = peakcaller,\n\t\t\tpeak_type = peaktype,\n\t\t\tcustom_call_peak_py = custom_call_peak_py,\n\t\t\ttas = flatten([select_all([pool_ta.ta_pooled]), chosen_ctl_tapooled]),\n\t\t\tgensz = gensz,\n\t\t\tchrsz = chrsz_,\n\t\t\tcap_num_peak = cap_numpeak,\n\t\t\tpval_thresh = pval_thresh,\n\t\t\tfdr_thresh = fdr_thresh,\n\t\t\tfraglen = fraglen_mean.roundedmean,\n\t\t\tblacklist = blacklist,\n\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\n\t\t\tcpu = call_peak_cpu,\n\t\t\tmem_mb = call_peak_mem_mb,\n\t\t\tdisks = call_peak_disks,\n\t\t\ttime_hr = call_peak_time_hr,\n\t\t}\n\t}\n\tFile? peakpooled = if has_output_of_call_peak_pooled then peak_pooled\n\t\telse call_peak_pooled.peak\t\n\n\t# macs2 signal track for pooled rep\n\tif ( has_input_of_call_peak_pooled && !align_only && num_rep>1 ) {\n\t\tcall macs2_signal_track as macs2_signal_track_pooled { input :\n\t\t\ttas = flatten([select_all([pool_ta.ta_pooled]), chosen_ctl_tapooled]),\n\t\t\tgensz = gensz,\n\t\t\tchrsz = chrsz_,\n\t\t\tpval_thresh = pval_thresh,\n\t\t\tfraglen = fraglen_mean.rounded_mean,\n\n\t\t\tmem_mb = macs2_signal_track_mem_mb,\n\t\t\tdisks = macs2_signal_track_disks,\n\t\t\ttime_hr = macs2_signal_track_time_hr,\n\t\t}\n\t}\n\n\tBoolean has_input_of_call_peak_ppr1 = defined(pool_ta_pr1.ta_pooled)\n\tBoolean has_output_of_call_peak_ppr1 = defined(peak_ppr1)\n\tif ( has_input_of_call_peak_ppr1 && !has_output_of_call_peak_ppr1 && !align_only && !true_rep_only && num_rep>1 ) {\n\t\t# call peaks on 1st pooled pseudo replicates\n\t\tcall call_peak as call_peak_ppr1 { input :\n\t\t\tpeak_caller = peakcaller,\n\t\t\tpeak_type = peaktype,\n\t\t\tcustom_call_peak_py = custom_call_peak_py,\n\t\t\ttas = flatten([select_all([pool_ta_pr1.ta_pooled]), chosen_ctl_tapooled]),\n\t\t\tgensz = gensz,\n\t\t\tchrsz = chrsz_,\n\t\t\tcap_num_peak = cap_numpeak,\n\t\t\tpval_thresh = pval_thresh,\n\t\t\tfdr_thresh = fdr_thresh,\n\t\t\tfraglen = fraglen_mean.roundedmean,\n\t\t\tblacklist = blacklist,\n\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\n\t\t\tcpu = call_peak_cpu,\n\t\t\tmem_mb = call_peak_mem_mb,\n\t\t\tdisks = call_peak_disks,\n\t\t\ttime_hr = call_peak_time_hr,\n\t\t}\n\t}\n\tFile? peakppr1 = if has_output_of_call_peak_ppr1 then peak_ppr1\n\t\telse call_peak_ppr1.peak\n\n\tBoolean has_input_of_call_peak_ppr2 = defined(pool_ta_pr2.ta_pooled)\n\tBoolean has_output_of_call_peak_ppr2 = defined(peak_ppr2)\n\tif ( has_input_of_call_peak_ppr2 && !has_output_of_call_peak_ppr2 && !align_only && !true_rep_only && num_rep>1 ) {\n\t\t# call peaks on 2nd pooled pseudo replicates\n\t\tcall call_peak as call_peak_ppr2 { input :\n\t\t\tpeak_caller = peakcaller,\n\t\t\tpeak_type = peaktype,\n\t\t\tcustom_call_peak_py = custom_call_peak_py,\n\t\t\ttas = flatten([select_all([pool_ta_pr2.ta_pooled]), chosen_ctl_tapooled]),\n\t\t\tgensz = gensz,\n\t\t\tchrsz = chrsz_,\n\t\t\tcap_num_peak = cap_numpeak,\n\t\t\tpval_thresh = pval_thresh,\n\t\t\tfdr_thresh = fdr_thresh,\n\t\t\tfraglen = fraglen_mean.roundedmean,\n\t\t\tblacklist = blacklist,\n\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\n\t\t\tcpu = call_peak_cpu,\n\t\t\tmem_mb = call_peak_mem_mb,\n\t\t\tdisks = call_peak_disks,\n\t\t\ttime_hr = call_peak_time_hr,\n\t\t}\n\t}\n\tFile? peakppr2 = if has_output_of_call_peak_ppr2 then peak_ppr2\n\t\telse call_peak_ppr2.peak\n\n\t# do IDR/overlap on all pairs of two replicates (i,j)\n\t# \twhere i and j are zero-based indices and 0 <= i < j < numrep\n\tArray[Pair[Int, Int]] pairs = cross(range(num_rep),range(numrep))\n\tscatter( pair in pairs ) {\n\t\tPair[Int, Int]? null_pair\n\t\tPair[Int, Int]? pairs = if pair.left<pair.right then pair else null_pair\n\t}\n\tArray[Pair[Int, Int]] pairs = select_all(pairs)\n\n\tif ( !align_only ) {\n\t\tscatter( pair in pairs ) {\n\t\t\t# pair.left = 0-based index of 1st replicate\n\t\t\t# pair.right = 0-based index of 2nd replicate\n\t\t\t# Naive overlap on every pair of true replicates\n\t\t\tcall overlap { input :\n\t\t\t\tprefix = 'rep'+(pair.left+1)+'_vsrep'+(pair.right+1),\n\t\t\t\tpeak1 = peak[pair.left],\n\t\t\t\tpeak2 = peak_[pair.right],\n\t\t\t\tpeak_pooled = peakpooled,\n\t\t\t\tfraglen = fraglen_mean.rounded_mean,\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\t\t\tta = pool_ta.ta_pooled,\n\t\t\t}\n\t\t}\n\t}\n\n\tif ( enable_idr && !align_only ) {\n\t\tscatter( pair in pairs ) {\n\t\t\t# pair.left = 0-based index of 1st replicate\n\t\t\t# pair.right = 0-based index of 2nd replicate\n\t\t\t# IDR on every pair of true replicates\n\t\t\tcall idr { input :\n\t\t\t\tprefix = 'rep'+(pair.left+1)+'_vsrep'+(pair.right+1),\n\t\t\t\tpeak1 = peak[pair.left],\n\t\t\t\tpeak2 = peak_[pair.right],\n\t\t\t\tpeak_pooled = peakpooled,\n\t\t\t\tfraglen = fraglen_mean.rounded_mean,\n\t\t\t\tidr_thresh = idr_thresh,\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\trank = idrrank,\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\t\t\tta = pool_ta.ta_pooled,\n\t\t\t}\n\t\t}\n\t}\n\n\t# overlap on pseudo-replicates (pr1, pr2) for each true replicate\n\tif ( !align_only && !true_rep_only ) {\n\t\tscatter( i in range(num_rep) ) {\n\t\t\tcall overlap as overlap_pr { input :\n\t\t\t\tprefix = 'rep'+(i+1)+'-pr1_vs_rep'+(i+1)+'-pr2',\n\t\t\t\tpeak1 = peakpr1[i],\n\t\t\t\tpeak2 = peakpr2[i],\n\t\t\t\tpeakpooled = peak[i],\n\t\t\t\tfraglen = fraglen_[i],\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\t\t\tta = ta_[i],\n\t\t\t}\n\t\t}\n\t}\n\n\tif ( !align_only && !true_rep_only && enable_idr ) {\n\t\tscatter( i in range(num_rep) ) {\n\t\t\t# IDR on pseduo replicates\n\t\t\tcall idr as idr_pr { input :\n\t\t\t\tprefix = 'rep'+(i+1)+'-pr1_vs_rep'+(i+1)+'-pr2',\n\t\t\t\tpeak1 = peakpr1[i],\n\t\t\t\tpeak2 = peakpr2[i],\n\t\t\t\tpeakpooled = peak[i],\n\t\t\t\tfraglen = fraglen_[i],\n\t\t\t\tidr_thresh = idr_thresh,\n\t\t\t\tpeak_type = peaktype,\n\t\t\t\trank = idrrank,\n\t\t\t\tblacklist = blacklist,\n\t\t\t\tchrsz = chrsz,\n\t\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\t\t\tta = ta_[i],\n\t\t\t}\n\t\t}\n\t}\n\n\tif ( !align_only && !true_rep_only && num_rep > 1 ) {\n\t\t# Naive overlap on pooled pseudo replicates\n\t\tcall overlap as overlap_ppr { input :\n\t\t\tprefix = 'pooled-pr1_vs_pooled-pr2',\n\t\t\tpeak1 = peakppr1,\n\t\t\tpeak2 = peakppr2,\n\t\t\tpeak_pooled = peakpooled,\n\t\t\tpeak_type = peaktype,\n\t\t\tfraglen = fraglen_mean.roundedmean,\n\t\t\tblacklist = blacklist,\n\t\t\tchrsz = chrsz_,\n\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\t\tta = pool_ta.ta_pooled,\n\t\t}\n\t}\n\n\tif ( !align_only && !true_rep_only && num_rep > 1 ) {\n\t\t# IDR on pooled pseduo replicates\n\t\tcall idr as idr_ppr { input :\n\t\t\tprefix = 'pooled-pr1_vs_pooled-pr2',\n\t\t\tpeak1 = peakppr1,\n\t\t\tpeak2 = peakppr2,\n\t\t\tpeak_pooled = peakpooled,\n\t\t\tidr_thresh = idr_thresh,\n\t\t\tpeak_type = peaktype,\n\t\t\tfraglen = fraglen_mean.rounded_mean,\n\t\t\trank = idrrank,\n\t\t\tblacklist = blacklist,\n\t\t\tchrsz = chrsz,\n\t\t\tregex_bfilt_peak_chr_name = regex_bfilt_peak_chrname,\n\t\t\tta = pool_ta.ta_pooled,\n\t\t}\n\t}\n\n\t# reproducibility QC for overlap/IDR peaks\n\tif ( !align_only && !true_rep_only && num_rep > 0 ) {\n\t\t# reproducibility QC for overlapping peaks\n\t\tcall reproducibility as reproducibility_overlap { input :\n\t\t\tprefix = 'overlap',\n\t\t\tpeaks = overlap.bfilt_overlap_peak,\n\t\t\tpeaks_pr = overlap_pr.bfilt_overlap_peak,\n\t\t\tpeak_ppr = overlap_ppr.bfilt_overlap_peak,\n\t\t\tpeak_type = peaktype,\n\t\t\tchrsz = chrsz_,\n\t\t}\n\t}\n\n\tif ( !align_only && !true_rep_only && num_rep > 0 && enable_idr ) {\n\t\t# reproducibility QC for IDR peaks\n\t\tcall reproducibility as reproducibility_idr { input :\n\t\t\tprefix = 'idr',\n\t\t\tpeaks = idr.bfilt_idr_peak,\n\t\t\tpeaks_pr = idr_pr.bfilt_idr_peak,\n\t\t\tpeak_ppr = idr_ppr.bfilt_idr_peak,\n\t\t\tpeak_type = peaktype,\n\t\t\tchrsz = chrsz_,\n\t\t}\n\t}\n\n\t# Generate final QC report and JSON\n\tcall qc_report { input :\n\t\tpipeline_ver = pipeline_ver,\n\t\ttitle = title,\n\t\tdescription = description,\n\t\tgenome = genomename,\n\t\tpaired_ends = pairedend,\n\t\tctl_paired_ends = ctl_pairedend,\n\t\tpipeline_type = pipelinetype,\n\t\taligner = aligner,\n\t\tpeak_caller = peakcaller,\n\t\tcap_num_peak = cap_numpeak,\n\t\tidr_thresh = idr_thresh,\n\t\tpval_thresh = pval_thresh,\n\t\txcor_trim_bp = xcor_trim_bp,\n\t\txcor_subsample_reads = xcor_subsample_reads,\n\n\t\tsamstat_qcs = align.samstat_qc,\n\t\tnodup_samstat_qcs = filter.samstat_qc,\n\t\tdup_qcs = filter.dup_qc,\n\t\tlib_complexity_qcs = filter.lib_complexity_qc,\n\t\txcor_plots = xcor.plot_png,\n\t\txcor_scores = xcor.score,\n\n\t\tctl_samstat_qcs = align_ctl.samstat_qc,\n\t\tctl_nodup_samstat_qcs = filter_ctl.samstat_qc,\n\t\tctl_dup_qcs = filter_ctl.dup_qc,\n\t\tctl_lib_complexity_qcs = filter_ctl.lib_complexity_qc,\n\n\t\tjsd_plot = jsd.plot,\n\t\tjsd_qcs = jsd.jsd_qcs,\n\n\t\tfrip_qcs = call_peak.frip_qc,\n\t\tfrip_qcs_pr1 = call_peak_pr1.frip_qc,\n\t\tfrip_qcs_pr2 = call_peak_pr2.frip_qc,\n\t\tfrip_qc_pooled = call_peak_pooled.frip_qc,\n\t\tfrip_qc_ppr1 = call_peak_ppr1.frip_qc,\n\t\tfrip_qc_ppr2 = call_peak_ppr2.frip_qc,\n\n\t\tidr_plots = idr.idr_plot,\n\t\tidr_plots_pr = idr_pr.idr_plot,\n\t\tidr_plot_ppr = idr_ppr.idr_plot,\n\t\tfrip_idr_qcs = idr.frip_qc,\n\t\tfrip_idr_qcs_pr = idr_pr.frip_qc,\n\t\tfrip_idr_qc_ppr = idr_ppr.frip_qc,\n\t\tfrip_overlap_qcs = overlap.frip_qc,\n\t\tfrip_overlap_qcs_pr = overlap_pr.frip_qc,\n\t\tfrip_overlap_qc_ppr = overlap_ppr.frip_qc,\n\t\tidr_reproducibility_qc = reproducibility_idr.reproducibility_qc,\n\t\toverlap_reproducibility_qc = reproducibility_overlap.reproducibility_qc,\n\n\t\tgc_plots = gc_bias.gc_plot,\n\n\t\tpeak_region_size_qcs = call_peak.peak_region_size_qc,\n\t\tpeak_region_size_plots = call_peak.peak_region_size_plot,\n\t\tnum_peak_qcs = call_peak.num_peak_qc,\n\n\t\tidr_opt_peak_region_size_qc = reproducibility_idr.peak_region_size_qc,\n\t\tidr_opt_peak_region_size_plot = reproducibility_overlap.peak_region_size_plot,\n\t\tidr_opt_num_peak_qc = reproducibility_idr.num_peak_qc,\n\n\t\toverlap_opt_peak_region_size_qc = reproducibility_overlap.peak_region_size_qc,\n\t\toverlap_opt_peak_region_size_plot = reproducibility_overlap.peak_region_size_plot,\n\t\toverlap_opt_num_peak_qc = reproducibility_overlap.num_peak_qc,\n\t}\n\n\toutput {\n\t\tFile report = qc_report.report\n\t\tFile qc_json = qc_report.qc_json\n\t\tBoolean qc_json_ref_match = qc_report.qc_json_ref_match\n\t}\n}\n\ntask align {\n\tArray[File] fastqs_R1 \t\t# [merge_id]\n\tArray[File] fastqs_R2\n\tInt? trim_bp\t\t\t# this is for R1 only\n\tInt crop_length\n\tString aligner\n\tString mito_chr_name\n\tInt? multimapping\n\tFile? custom_align_py\t\n\tFile? idx_tar\t\t\t# reference index tar\n\tBoolean paired_end\n\tBoolean use_bwa_mem_for_pe\n\n\tString? trimmomatic_java_heap\n\tInt cpu\n\tInt mem_mb\n\tInt time_hr\n\tString disks\n\n\tArray[Array[File]] tmp_fastqs = if paired_end then transpose([fastqs_R1, fastqs_R2])\n\t\t\t\telse transpose([fastqs_R1])\n\tcommand {\n\t\tset -e\n\n\t\t# check if pipeline dependencies can be found\n\t\tif [[ -z \"$(which encode_task_merge_fastq.py 2> /dev/null || true)\" ]]\n\t\tthen\n\t\t echo -e \"\n Error: pipeline dependencies not found.\" 1>&2\n\t\t echo 'Conda users: Did you activate Conda environment (conda activate encode-chip-seq-pipeline)?' 1>&2\n\t\t echo ' Or did you install Conda and environment correctly (bash scripts/install_conda_env.sh)?' 1>&2\n\t\t echo 'GCP/AWS/Docker users: Did you add --docker flag to Caper command line arg?' 1>&2\n\t\t echo 'Singularity users: Did you add --singularity flag to Caper command line arg?' 1>&2\n\t\t echo -e \"\n\" 1>&2\n\t\t exit 3\n\t\tfi\n\t\tpython3 $(which encode_task_merge_fastq.py) \\n\t\t\t${write_tsv(tmp_fastqs)} \\n\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t${'--nth ' + cpu}\n\n\t\tif [ -z '${trim_bp}' ]; then\n\t\t\tSUFFIX=\n\t\telse\n\t\t\tSUFFIX=_trimmed\n\t\t\tpython3 $(which encode_task_trim_fastq.py) \\n\t\t\t\tR1/.fastq.gz \\n\t\t\t\t--trim-bp ${trim_bp} \\n\t\t\t\t--out-dir R1$SUFFIX\n\t\t\tif [ '${paired_end}' == 'true' ]; then\n\t\t\t\tpython3 $(which encode_task_trim_fastq.py) \\n\t\t\t\t\tR2/.fastq.gz \\n\t\t\t\t\t--trim-bp ${trim_bp} \\n\t\t\t\t\t--out-dir R2$SUFFIX\n\t\t\tfi\n\t\tfi\n\t\tif [ '${crop_length}' == '0' ]; then\n\t\t\tSUFFIX=$SUFFIX\n\t\telse\n\t\t\tNEW_SUFFIX=\"$SUFFIX\"_cropped\n\t\t\tpython3 $(which encode_task_trimmomatic.py) \\n\t\t\t\t--fastq1 R1$SUFFIX/.fastq.gz \\n\t\t\t\t${if paired_end then '--fastq2 R2$SUFFIX/.fastq.gz' else ''} \\n\t\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t\t--crop-length ${crop_length} \\n\t\t\t\t--out-dir-R1 R1$NEW_SUFFIX \\n\t\t\t\t${if paired_end then '--out-dir-R2 R2$NEW_SUFFIX' else ''} \\n\t\t\t\t${'--trimmomatic-java-heap ' + if defined(trimmomatic_java_heap) then trimmomatic_java_heap else (mem_mb + 'M')} \\n\t\t\t\t${'--nth ' + cpu}\n\t\t\tSUFFIX=$NEW_SUFFIX\n\t\tfi\n\n\t\tif [ '${aligner}' == 'bwa' ]; then\n\t\t \tpython3 $(which encode_task_bwa.py) \\n\t\t\t\t${idx_tar} \\n\t\t\t\tR1$SUFFIX/.fastq.gz \\n\t\t\t\t${if paired_end then 'R2$SUFFIX/.fastq.gz' else ''} \\n\t\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t\t${if use_bwa_mem_for_pe then '--use-bwa-mem-for-pe' else ''} \\n\t\t\t\t${'--nth ' + cpu}\n\n\t\telif [ '${aligner}' == 'bowtie2' ]; then\n\t\t \tpython3 $(which encode_task_bowtie2.py) \\n\t\t\t\t${idx_tar} \\n\t\t\t\tR1$SUFFIX/.fastq.gz \\n\t\t\t\t${if paired_end then 'R2$SUFFIX/.fastq.gz' else ''} \\n\t\t\t\t${'--multimapping ' + multimapping} \\n\t\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t\t${'--nth ' + cpu}\n\t\telse\n\t\t\tpython3 ${custom_align_py} \\n\t\t\t\t${idx_tar} \\n\t\t\t\tR1$SUFFIX/.fastq.gz \\n\t\t\t\t${if paired_end then 'R2$SUFFIX/.fastq.gz' else ''} \\n\t\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t\t${'--nth ' + cpu}\n\t\tfi \n\n\t\tpython3 $(which encode_task_post_align.py) \\n\t\t\tR1$SUFFIX/.fastq.gz $(ls .bam) \\n\t\t\t${'--mito-chr-name ' + mito_chr_name} \\n\t\t\t${'--nth ' + cpu}\n\t\trm -rf R1 R2 R1$SUFFIX R2$SUFFIX\n\t}\n\toutput {\n\t\tFile bam = glob('.bam')[0]\n\t\tFile bai = glob('.bai')[0]\n\t\tFile samstat_qc = glob('.samstats.qc')[0]\n\t\tFile read_len_log = glob('.read_length.txt')[0]\n\t}\n\truntime {\n\t\tcpu : cpu\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t\tpreemptible: 0\n\t}\n}\n\ntask filter {\n\tFile bam\n\tBoolean paired_end\n\tString dup_marker \t\t\t# picard.jar MarkDuplicates (picard) or \n\t\t\t\t\t\t\t\t# sambamba markdup (sambamba)\n\tInt mapq_thresh\t\t\t\t# threshold for low MAPQ reads removal\n\tArray[String] filter_chrs \t# chrs to be removed from final (nodup/filt) BAM\n\tFile chrsz\t\t\t\t\t# 2-col chromosome sizes file\n\tBoolean no_dup_removal \t\t# no dupe reads removal when filtering BAM\n\tString mito_chr_name\n\n\tInt cpu\n\tInt mem_mb\n\tString? picard_java_heap\n\tInt time_hr\n\tString disks\n\n\tcommand {\n\t\tpython3 $(which encode_task_filter.py) \\n\t\t\t${bam} \\n\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t--multimapping 0 \\n\t\t\t${'--dup-marker ' + dup_marker} \\n\t\t\t${'--mapq-thresh ' + mapq_thresh} \\n\t\t\t--filter-chrs ${sep=' ' filter_chrs} \\n\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t${if no_dup_removal then '--no-dup-removal' else ''} \\n\t\t\t${'--mito-chr-name ' + mito_chr_name} \\n\t\t\t${'--nth ' + cpu} \\n\t\t\t${'--picard-java-heap ' + if defined(picard_java_heap) then picard_java_heap else (mem_mb + 'M')}\n\t}\n\toutput {\n\t\tFile nodup_bam = glob('.bam')[0]\n\t\tFile nodup_bai = glob('.bai')[0]\n\t\tFile samstat_qc = glob('.samstats.qc')[0]\n\t\tFile dup_qc = glob('.dup.qc')[0]\n\t\tFile lib_complexity_qc = glob('.lib_complexity.qc')[0]\n\t}\n\truntime {\n\t\tcpu : cpu\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t}\n}\n\ntask bam2ta {\n\tFile bam\n\tBoolean paired_end\n\tString mito_chr_name \t\t# mito chromosome name\n\tInt subsample \t\t\t\t# number of reads to subsample TAGALIGN\n\t\t\t\t\t\t\t\t# this affects all downstream analysis\n\tInt cpu\n\tInt mem_mb\n\tInt time_hr\n\tString disks\n\n\tcommand {\n\t\tpython3 $(which encode_task_bam2ta.py) \\n\t\t\t${bam} \\n\t\t\t--disable-tn5-shift \\n\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t${'--mito-chr-name ' + mito_chr_name} \\n\t\t\t${'--subsample ' + subsample} \\n\t\t\t${'--nth ' + cpu}\n\t}\n\toutput {\n\t\tFile ta = glob('.tagAlign.gz')[0]\n\t}\n\truntime {\n\t\tcpu : cpu\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t}\n}\n\ntask spr { # make two self pseudo replicates\n\tFile ta\n\tBoolean paired_end\n\n\tInt mem_mb\n\n\tcommand {\n\t\tpython3 $(which encode_task_spr.py) \\n\t\t\t${ta} \\n\t\t\t${if paired_end then '--paired-end' else ''}\n\t}\n\toutput {\n\t\tFile ta_pr1 = glob('.pr1.tagAlign.gz')[0]\n\t\tFile ta_pr2 = glob('.pr2.tagAlign.gz')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\n}\n\ntask pool_ta {\n\tArray[File?] tas\n\tInt? col \t\t\t# number of columns in pooled TA\n\tString? prefix \t\t# basename prefix\n\n\tcommand {\n\t\tpython3 $(which encode_task_pool_ta.py) \\n\t\t\t${sep=' ' tas} \\n\t\t\t${'--prefix ' + prefix} \\n\t\t\t${'--col ' + col}\n\t}\n\toutput {\n\t\tFile ta_pooled = glob('.tagAlign.gz')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\n}\n\ntask xcor {\n\tFile ta\n\tBoolean paired_end\n\tString mito_chr_name\n\tInt subsample # number of reads to subsample TAGALIGN\n\t\t\t\t\t# this will be used for xcor only\n\t\t\t\t\t# will not affect any downstream analysis\n\tString? chip_seq_type\n\tInt? exclusion_range_min\n\tInt? exclusion_range_max\n\n\tInt cpu\n\tInt mem_mb\t\n\tInt time_hr\n\tString disks\n\n\tcommand {\n\t\tpython3 $(which encode_task_xcor.py) \\n\t\t\t${ta} \\n\t\t\t${if paired_end then '--paired-end' else ''} \\n\t\t\t${'--mito-chr-name ' + mito_chr_name} \\n\t\t\t${'--subsample ' + subsample} \\n\t\t\t${'--chip-seq-type ' + chip_seq_type} \\n\t\t\t${'--exclusion-range-min ' + exclusion_range_min} \\n\t\t\t${'--exclusion-range-max ' + exclusion_range_max} \\n\t\t\t${'--subsample ' + subsample} \\n\t\t\t${'--nth ' + cpu}\n\t}\n\toutput {\n\t\tFile plot_pdf = glob('.cc.plot.pdf')[0]\n\t\tFile plot_png = glob('.cc.plot.png')[0]\n\t\tFile score = glob('.cc.qc')[0]\n\t\tInt fraglen = read_int(glob('.cc.fraglen.txt')[0])\n\t}\n\truntime {\n\t\tcpu : cpu\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t}\n}\n\ntask jsd {\n\tArray[File?] nodup_bams\n\tArray[File?] ctl_bams\n\tFile blacklist\n\tInt mapq_thresh\n\n\tInt cpu\n\tInt mem_mb\n\tInt time_hr\n\tString disks\n\n\tcommand {\n\t\tpython3 $(which encode_task_jsd.py) \\n\t\t\t${sep=' ' nodup_bams} \\n\t\t\t${if length(ctl_bams)>0 then '--ctl-bam '+ select_first(ctl_bams) else ''} \\n\t\t\t${'--mapq-thresh '+ mapq_thresh} \\n\t\t\t${'--blacklist '+ blacklist} \\n\t\t\t${'--nth ' + cpu}\n\t}\n\toutput {\n\t\tFile plot = glob('.png')[0]\n\t\tArray[File] jsd_qcs = glob('.jsd.qc')\n\t}\n\truntime {\n\t\tcpu : cpu\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t}\n}\n\ntask choose_ctl {\n\tArray[File?] tas\n\tArray[File?] ctl_tas\n\tFile? ta_pooled\n\tFile? ctl_ta_pooled\n\tBoolean always_use_pooled_ctl # always use pooled control for all exp rep.\n\tFloat ctl_depth_ratio \t\t# if ratio between controls is higher than this\n\t\t\t\t\t\t\t\t# then always use pooled control for all exp rep.\n\tcommand {\n\t\tpython3 $(which encode_task_choose_ctl.py) \\n\t\t\t--tas ${sep=' ' tas} \\n\t\t\t--ctl-tas ${sep=' ' ctl_tas} \\n\t\t\t${'--ta-pooled ' + ta_pooled} \\n\t\t\t${'--ctl-ta-pooled ' + ctl_ta_pooled} \\n\t\t\t${if always_use_pooled_ctl then '--always-use-pooled-ctl' else ''} \\n\t\t\t${'--ctl-depth-ratio ' + ctl_depth_ratio}\n\t}\n\toutput {\n\t\tArray[File] chosen_ctl_tas = glob('ctl_for_rep.tagAlign.gz')\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '8000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\t\n}\n\ntask count_signal_track {\n\tFile ta \t\t\t# tag-align\n\tFile chrsz\t\t\t# 2-col chromosome sizes file\n\n\tcommand {\n\t\tpython3 $(which encode_task_count_signal_track.py) \\n\t\t\t${ta} \\n\t\t\t${'--chrsz ' + chrsz}\n\t}\n\toutput {\n\t\tFile pos_bw = glob('.positive.bigwig')[0]\n\t\tFile neg_bw = glob('.negative.bigwig')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '8000 MB'\n\t\ttime : 4\n\t\tdisks : 'local-disk 50 HDD'\n\t}\n}\n\ntask call_peak {\n\tString peak_caller\n\tString peak_type\n\tFile? custom_call_peak_py\n\n\tArray[File?] tas\t# [ta, control_ta]. control_ta is optional\n\tInt fraglen \t\t# fragment length from xcor\n\tString gensz\t\t# Genome size (sum of entries in 2nd column of \n # chr. sizes file, or hs for human, ms for mouse)\n\tFile chrsz\t\t\t# 2-col chromosome sizes file\n\tInt cap_num_peak\t# cap number of raw peaks called from MACS2\n\tFloat pval_thresh \t# p.value threshold for MACS2\n\tFloat? fdr_thresh \t# FDR threshold for SPP\n\n\tFile? blacklist \t# blacklist BED to filter raw peaks\n\tString? regex_bfilt_peak_chr_name\n\n\tInt cpu\t\n\tInt mem_mb\n\tInt time_hr\n\tString disks\n\n\tcommand {\n\t\tset -e\n\n\t\tif [ '${peak_caller}' == 'macs2' ]; then\n\t\t\tpython3 $(which encode_task_macs2_chip.py) \\n\t\t\t\t${sep=' ' tas} \\n\t\t\t\t${'--gensz '+ gensz} \\n\t\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t\t${'--cap-num-peak ' + cap_num_peak} \\n\t\t\t\t${'--pval-thresh '+ pval_thresh}\n\n\t\telif [ '${peak_caller}' == 'spp' ]; then\n\t\t\tpython3 $(which encode_task_spp.py) \\n\t\t\t\t${sep=' ' tas} \\n\t\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t\t${'--cap-num-peak ' + cap_num_peak} \\n\t\t\t\t${'--fdr-thresh '+ fdr_thresh} \\n\t\t\t\t${'--nth ' + cpu}\n\n\t\telse\n\t\t\tpython3 ${custom_call_peak_py} \\n\t\t\t\t${sep=' ' tas} \\n\t\t\t\t${'--gensz '+ gensz} \\n\t\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t\t${'--cap-num-peak ' + cap_num_peak} \\n\t\t\t\t${'--pval-thresh '+ pval_thresh}\n\t\t\t\t${'--fdr-thresh '+ fdr_thresh}\n\t\t\t\t${'--nth ' + cpu}\n\t\tfi\n\n\t\tpython3 $(which encode_task_post_call_peak_chip.py) \\n\t\t\t$(ls Peak.gz) \\n\t\t\t${'--ta ' + tas[0]} \\n\t\t\t${'--regex-bfilt-peak-chr-name \'' + regex_bfilt_peak_chr_name + '\''} \\n\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t${'--peak-type ' + peak_type} \\n\t\t\t${'--blacklist ' + blacklist}\t\t\n\t}\n\toutput {\n\t\t# generated by custom_call_peak_py\n\t\tFile peak = glob('[!.][!b][!f][!i][!l][!t].'+peak_type+'.gz')[0]\n\t\t# generated by post_call_peak py\n\t\tFile bfilt_peak = glob('.bfilt.'+peak_type+'.gz')[0]\n\t\tFile bfilt_peak_bb = glob('.bfilt.'+peak_type+'.bb')[0]\n\t\tFile bfilt_peak_hammock = glob('.bfilt.'+peak_type+'.hammock.gz')[0]\n\t\tFile bfilt_peak_hammock_tbi = glob('.bfilt.'+peak_type+'.hammock.gz')[1]\n\t\tFile frip_qc = glob('.frip.qc')[0]\n\t\tFile peak_region_size_qc = glob('.peak_region_size.qc')[0]\n\t\tFile peak_region_size_plot = glob('.peak_region_size.png')[0]\n\t\tFile num_peak_qc = glob('.num_peak.qc')[0]\n\t}\n\truntime {\n\t\tcpu : if peak_caller == 'macs2' then 1 else cpu\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t\tpreemptible: 0\t\t\n\t}\n}\n\ntask macs2_signal_track {\n\tArray[File?] tas\t# [ta, control_ta]. control_ta is optional\n\tInt fraglen \t\t# fragment length from xcor\n\tString gensz\t\t# Genome size (sum of entries in 2nd column of \n # chr. sizes file, or hs for human, ms for mouse)\n\tFile chrsz\t\t\t# 2-col chromosome sizes file\n\tFloat pval_thresh \t# p.value threshold\n\n\tInt mem_mb\n\tInt time_hr\n\tString disks\n\n\tcommand {\n\t\tpython3 $(which encode_task_macs2_signal_track_chip.py) \\n\t\t\t${sep=' ' tas} \\n\t\t\t${'--gensz '+ gensz} \\n\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t${'--pval-thresh '+ pval_thresh}\n\t}\n\toutput {\n\t\tFile pval_bw = glob('.pval.signal.bigwig')[0]\n\t\tFile fc_bw = glob('.fc.signal.bigwig')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '${mem_mb} MB'\n\t\ttime : time_hr\n\t\tdisks : disks\n\t\tpreemptible: 0\n\t}\n}\n\ntask idr {\n\tString prefix \t\t# prefix for IDR output file\n\tFile peak1 \t\t\t\n\tFile peak2\n\tFile peak_pooled\n\tFloat idr_thresh\n\tFile? blacklist \t# blacklist BED to filter raw peaks\n\tString regex_bfilt_peak_chr_name\n\t# parameters to compute FRiP\n\tFile? ta\t\t\t# to calculate FRiP\n\tInt fraglen \t\t# fragment length from xcor\n\tFile chrsz\t\t\t# 2-col chromosome sizes file\n\tString peak_type\n\tString rank\n\n\tcommand {\n\t\t${if defined(ta) then '' else 'touch null.frip.qc'}\n\t\ttouch null \n\t\tpython3 $(which encode_task_idr.py) \\n\t\t\t${peak1} ${peak2} ${peak_pooled} \\n\t\t\t${'--prefix ' + prefix} \\n\t\t\t${'--idr-thresh ' + idr_thresh} \\n\t\t\t${'--peak-type ' + peak_type} \\n\t\t\t--idr-rank ${rank} \\n\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t${'--blacklist '+ blacklist} \\n\t\t\t${'--regex-bfilt-peak-chr-name \'' + regex_bfilt_peak_chr_name + '\''} \\n\t\t\t${'--ta ' + ta}\n\t}\n\toutput {\n\t\tFile idr_peak = glob('[!.][!b][!f][!i][!l][!t].'+peak_type+'.gz')[0]\n\t\tFile bfilt_idr_peak = glob('.bfilt.'+peak_type+'.gz')[0]\n\t\tFile bfilt_idr_peak_bb = glob('.bfilt.'+peak_type+'.bb')[0]\n\t\tFile bfilt_idr_peak_hammock = glob('.bfilt.'+peak_type+'.hammock.gz')[0]\n\t\tFile bfilt_idr_peak_hammock_tbi = glob('.bfilt.'+peak_type+'.hammock.gz')[1]\n\t\tFile idr_plot = glob('.txt.png')[0]\n\t\tFile idr_unthresholded_peak = glob('.txt.gz')[0]\n\t\tFile idr_log = glob('.idr.log')[0]\n\t\tFile frip_qc = if defined(ta) then glob('.frip.qc')[0] else glob('null')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\t\n}\n\ntask overlap {\n\tString prefix \t# prefix for IDR output file\n\tFile peak1\n\tFile peak2\n\tFile peak_pooled\n\tFile? blacklist # blacklist BED to filter raw peaks\n\tString regex_bfilt_peak_chr_name\n\t# parameters to compute FRiP\n\tFile? ta\t\t# to calculate FRiP\n\tInt fraglen \t# fragment length from xcor (for FRIP)\n\tFile chrsz\t\t# 2-col chromosome sizes file\n\tString peak_type\n\n\tcommand {\n\t\t${if defined(ta) then '' else 'touch null.frip.qc'}\n\t\ttouch null \n\t\tpython3 $(which encode_task_overlap.py) \\n\t\t\t${peak1} ${peak2} ${peak_pooled} \\n\t\t\t${'--prefix ' + prefix} \\n\t\t\t${'--peak-type ' + peak_type} \\n\t\t\t${'--fraglen ' + fraglen} \\n\t\t\t${'--chrsz ' + chrsz} \\n\t\t\t${'--blacklist '+ blacklist} \\n\t\t\t--nonamecheck \\n\t\t\t${'--regex-bfilt-peak-chr-name \'' + regex_bfilt_peak_chr_name + '\''} \\n\t\t\t${'--ta ' + ta}\n\t}\n\toutput {\n\t\tFile overlap_peak = glob('[!.][!b][!f][!i][!l][!t].'+peak_type+'.gz')[0]\n\t\tFile bfilt_overlap_peak = glob('.bfilt.'+peak_type+'.gz')[0]\n\t\tFile bfilt_overlap_peak_bb = glob('.bfilt.'+peak_type+'.bb')[0]\n\t\tFile bfilt_overlap_peak_hammock = glob('.bfilt.'+peak_type+'.hammock.gz')[0]\n\t\tFile bfilt_overlap_peak_hammock_tbi = glob('.bfilt.'+peak_type+'.hammock.gz')[1]\n\t\tFile frip_qc = if defined(ta) then glob('.frip.qc')[0] else glob('null')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\t\n}\n\ntask reproducibility {\n\tString prefix\n\tArray[File]? peaks # peak files from pair of true replicates\n\t\t\t\t\t\t# in a sorted order. for example of 4 replicates,\n\t\t\t\t\t\t# 1,2 1,3 1,4 2,3 2,4 3,4.\n # x,y means peak file from rep-x vs rep-y\n\tArray[File?] peaks_pr\t# peak files from pseudo replicates\n\tFile? peak_ppr\t\t\t# Peak file from pooled pseudo replicate.\n\tString peak_type\n\tFile chrsz\t\t\t# 2-col chromosome sizes file\n\n\tcommand {\n\t\tpython3 $(which encode_task_reproducibility.py) \\n\t\t\t${sep=' ' peaks} \\n\t\t\t--peaks-pr ${sep=' ' peaks_pr} \\n\t\t\t${'--peak-ppr '+ peak_ppr} \\n\t\t\t--prefix ${prefix} \\n\t\t\t${'--peak-type ' + peak_type} \\n\t\t\t${'--chrsz ' + chrsz}\n\t}\n\toutput {\n\t\tFile optimal_peak = glob('optimal_peak..gz')[0]\n\t\tFile optimal_peak_bb = glob('optimal_peak..bb')[0]\n\t\tFile optimal_peak_hammock = glob('optimal_peak..hammock.gz')[0]\n\t\tFile optimal_peak_hammock_tbi = glob('optimal_peak..hammock.gz')[1]\n\t\tFile conservative_peak = glob('conservative_peak..gz')[0]\n\t\tFile conservative_peak_bb = glob('conservative_peak..bb')[0]\n\t\tFile conservative_peak_hammock = glob('conservative_peak..hammock.gz')[0]\n\t\tFile conservative_peak_hammock_tbi = glob('conservative_peak..hammock.gz')[1]\n\t\tFile reproducibility_qc = glob('reproducibility.qc')[0]\n\t\t# QC metrics for optimal peak\n\t\tFile peak_region_size_qc = glob('.peak_region_size.qc')[0]\n\t\tFile peak_region_size_plot = glob('.peak_region_size.png')[0]\n\t\tFile num_peak_qc = glob('.num_peak.qc')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\t\n}\n\ntask gc_bias {\n\tFile nodup_bam\n\tFile ref_fa\n\n\tString? picard_java_heap\n\n\tcommand {\n\t\tpython3 $(which encode_task_gc_bias.py) \\n\t\t\t${'--nodup-bam ' + nodup_bam} \\n\t\t\t${'--ref-fa ' + ref_fa} \\n\t\t\t${'--picard-java-heap ' + if defined(picard_java_heap) then picard_java_heap else '10G'}\n\t}\n\toutput {\n\t\tFile gc_plot = glob('.gc_plot.png')[0]\n\t\tFile gc_log = glob('.gc.txt')[0]\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '10000 MB'\n\t\ttime : 6\n\t\tdisks : 'local-disk 100 HDD'\n\t}\n}\n\n# gather all outputs and generate \n# - qc.html\t\t: organized final HTML report\n# - qc.json\t\t: all QCs\ntask qc_report {\n\t# optional metadata\n\tString pipeline_ver\n \tString title # name of sample\n\tString description # description for sample\n\tString? genome\n\t#String? encode_accession_id\t# ENCODE accession ID of sample\n\t# workflow params\n\tArray[Boolean?] paired_ends\n\tArray[Boolean?] ctl_paired_ends\n\tString pipeline_type\n\tString aligner\n\tString peak_caller\n\tInt cap_num_peak\n\tFloat idr_thresh\n\tFloat pval_thresh\n\tInt xcor_trim_bp\n\tInt xcor_subsample_reads\n\t# QCs\n\tArray[File?] samstat_qcs\n\tArray[File?] nodup_samstat_qcs\n\tArray[File?] dup_qcs\n\tArray[File?] lib_complexity_qcs\n\tArray[File?] ctl_samstat_qcs\n\tArray[File?] ctl_nodup_samstat_qcs\n\tArray[File?] ctl_dup_qcs\n\tArray[File?] ctl_lib_complexity_qcs\n\tArray[File?] xcor_plots\n\tArray[File?] xcor_scores\n\tFile? jsd_plot\n\tArray[File]? jsd_qcs\n\tArray[File]? idr_plots\n\tArray[File]? idr_plots_pr\n\tFile? idr_plot_ppr\n\tArray[File?] frip_qcs\n\tArray[File?] frip_qcs_pr1\n\tArray[File?] frip_qcs_pr2\n\tFile? frip_qc_pooled\n\tFile? frip_qc_ppr1 \n\tFile? frip_qc_ppr2 \n\tArray[File]? frip_idr_qcs\n\tArray[File]? frip_idr_qcs_pr\n\tFile? frip_idr_qc_ppr \n\tArray[File]? frip_overlap_qcs\n\tArray[File]? frip_overlap_qcs_pr\n\tFile? frip_overlap_qc_ppr\n\tFile? idr_reproducibility_qc\n\tFile? overlap_reproducibility_qc\n\n\tArray[File?] gc_plots\n\n\tArray[File?] peak_region_size_qcs\n\tArray[File?] peak_region_size_plots\n\tArray[File?] num_peak_qcs\n\n\tFile? idr_opt_peak_region_size_qc\n\tFile? idr_opt_peak_region_size_plot\n\tFile? idr_opt_num_peak_qc\n\n\tFile? overlap_opt_peak_region_size_qc\n\tFile? overlap_opt_peak_region_size_plot\n\tFile? overlap_opt_num_peak_qc\n\n\tFile? qc_json_ref\n\n\tcommand {\n\t\tpython3 $(which encode_task_qc_report.py) \\n\t\t\t${'--pipeline-ver ' + pipelinever} \\n\t\t\t${\"--title '\" + sub(title,\"'\",\"\") + \"'\"} \\n\t\t\t${\"--desc '\" + sub(description,\"'\",\"_\") + \"'\"} \\n\t\t\t${'--genome ' + genome} \\n\t\t\t${'--multimapping ' + 0} \\n\t\t\t--paired-ends ${sep=' ' paired_ends} \\n\t\t\t--ctl-paired-ends ${sep=' ' ctl_paired_ends} \\n\t\t\t--pipeline-type ${pipeline_type} \\n\t\t\t--aligner ${aligner} \\n\t\t\t--peak-caller ${peak_caller} \\n\t\t\t${'--cap-num-peak ' + cap_num_peak} \\n\t\t\t--idr-thresh ${idr_thresh} \\n\t\t\t--pval-thresh ${pval_thresh} \\n\t\t\t--xcor-trim-bp ${xcor_trim_bp} \\n\t\t\t--xcor-subsample-reads ${xcor_subsamplereads} \\n\t\t\t--samstat-qcs ${sep=':_' samstatqcs} \\n\t\t\t--nodup-samstat-qcs ${sep=':_' nodup_samstatqcs} \\n\t\t\t--dup-qcs ${sep=':_' dupqcs} \\n\t\t\t--lib-complexity-qcs ${sep=':_' lib_complexityqcs} \\n\t\t\t--xcor-plots ${sep=':_' xcorplots} \\n\t\t\t--xcor-scores ${sep=':_' xcorscores} \\n\t\t\t--idr-plots ${sep=':_' idrplots} \\n\t\t\t--idr-plots-pr ${sep=':_' idr_plotspr} \\n\t\t\t--ctl-samstat-qcs ${sep=':_' ctl_samstatqcs} \\n\t\t\t--ctl-nodup-samstat-qcs ${sep=':_' ctl_nodup_samstatqcs} \\n\t\t\t--ctl-dup-qcs ${sep=':_' ctl_dupqcs} \\n\t\t\t--ctl-lib-complexity-qcs ${sep=':_' ctl_lib_complexity_qcs} \\n\t\t\t${'--jsd-plot ' + jsdplot} \\n\t\t\t--jsd-qcs ${sep=':_' jsd_qcs} \\n\t\t\t${'--idr-plot-ppr ' + idr_plotppr} \\n\t\t\t--frip-qcs ${sep=':_' fripqcs} \\n\t\t\t--frip-qcs-pr1 ${sep=':_' frip_qcspr1} \\n\t\t\t--frip-qcs-pr2 ${sep=':_' frip_qcs_pr2} \\n\t\t\t${'--frip-qc-pooled ' + frip_qc_pooled} \\n\t\t\t${'--frip-qc-ppr1 ' + frip_qc_ppr1} \\n\t\t\t${'--frip-qc-ppr2 ' + frip_qcppr2} \\n\t\t\t--frip-idr-qcs ${sep=':_' frip_idrqcs} \\n\t\t\t--frip-idr-qcs-pr ${sep=':_' frip_idr_qcs_pr} \\n\t\t\t${'--frip-idr-qc-ppr ' + frip_idr_qcppr} \\n\t\t\t--frip-overlap-qcs ${sep=':_' frip_overlapqcs} \\n\t\t\t--frip-overlap-qcs-pr ${sep=':_' frip_overlap_qcs_pr} \\n\t\t\t${'--frip-overlap-qc-ppr ' + frip_overlap_qc_ppr} \\n\t\t\t${'--idr-reproducibility-qc ' + idr_reproducibility_qc} \\n\t\t\t${'--overlap-reproducibility-qc ' + overlap_reproducibilityqc} \\n\t\t\t--gc-plots ${sep=':_' gcplots} \\n\t\t\t--peak-region-size-qcs ${sep=':_' peak_region_sizeqcs} \\n\t\t\t--peak-region-size-plots ${sep=':_' peak_region_sizeplots} \\n\t\t\t--num-peak-qcs ${sep=':_' num_peak_qcs} \\n\t\t\t${'--idr-opt-peak-region-size-qc ' + idr_opt_peak_region_size_qc} \\n\t\t\t${'--idr-opt-peak-region-size-plot ' + idr_opt_peak_region_size_plot} \\n\t\t\t${'--idr-opt-num-peak-qc ' + idr_opt_num_peak_qc} \\n\t\t\t${'--overlap-opt-peak-region-size-qc ' + overlap_opt_peak_region_size_qc} \\n\t\t\t${'--overlap-opt-peak-region-size-plot ' + overlap_opt_peak_region_size_plot} \\n\t\t\t${'--overlap-opt-num-peak-qc ' + overlap_opt_num_peak_qc} \\n\t\t\t--out-qc-html qc.html \\n\t\t\t--out-qc-json qc.json \\n\t\t\t${'--qc-json-ref ' + qc_json_ref}\n\t}\n\toutput {\n\t\tFile report = glob('qc.html')[0]\n\t\tFile qc_json = glob('qc.json')[0]\n\t\tBoolean qc_json_ref_match = read_string('qc_json_ref_match.txt')=='True'\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\t\n}\n\n### workflow system tasks\ntask read_genome_tsv {\n\tFile genome_tsv\n\n\tString? null_s\n\tcommand <<<\n\t\t# create empty files for all entries\n\t\ttouch genome_name\n\t\ttouch ref_fa bowtie2_idx_tar bwa_idx_tar chrsz gensz blacklist blacklist2\n\t\ttouch custom_aligner_idx_tar\n\t\ttouch tss tss_enrich # for backward compatibility\n\t\ttouch dnase prom enh reg2map reg2map_bed roadmap_meta\n\t\ttouch mito_chr_name\n\t\ttouch regex_bfilt_peak_chr_name\n\n\t\tpython <<CODE\n\t\timport os\n\t\twith open('${genome_tsv}','r') as fp:\n\t\t\tfor line in fp:\n\t\t\t\tarr = line.strip('\n').split('\t')\n\t\t\t\tif arr:\n\t\t\t\t\tkey, val = arr\n\t\t\t\t\twith open(key,'w') as fp2:\n\t\t\t\t\t\tfp2.write(val)\n\t\tCODE\n\t>>>\n\toutput {\n\t\tString? genome_name = if size('genome_name')==0 then basename(genome_tsv) else read_string('genome_name')\n\t\tString? ref_fa = if size('ref_fa')==0 then null_s else read_string('ref_fa')\n\t\tString? bwa_idx_tar = if size('bwa_idx_tar')==0 then null_s else read_string('bwa_idx_tar')\n\t\tString? bowtie2_idx_tar = if size('bowtie2_idx_tar')==0 then null_s else read_string('bowtie2_idx_tar')\n\t\tString? custom_aligner_idx_tar = if size('custom_aligner_idx_tar')==0 then null_s else read_string('custom_aligner_idx_tar')\n\t\tString? chrsz = if size('chrsz')==0 then null_s else read_string('chrsz')\n\t\tString? gensz = if size('gensz')==0 then null_s else read_string('gensz')\n\t\tString? blacklist = if size('blacklist')==0 then null_s else read_string('blacklist')\n\t\tString? blacklist2 = if size('blacklist2')==0 then null_s else read_string('blacklist2')\n\t\tString? mito_chr_name = if size('mito_chr_name')==0 then null_s else read_string('mito_chr_name')\n\t\tString? regex_bfilt_peak_chr_name = if size('regex_bfilt_peak_chr_name')==0 then 'chr[\\dXY]+'\n\t\t\telse read_string('regex_bfilt_peak_chr_name')\n\t\t# optional data\n\t\tString? tss = if size('tss')!=0 then read_string('tss')\n\t\t\telse if size('tss_enrich')!=0 then read_string('tss_enrich') else null_s\n\t\tString? dnase = if size('dnase')==0 then null_s else read_string('dnase')\n\t\tString? prom = if size('prom')==0 then null_s else read_string('prom')\n\t\tString? enh = if size('enh')==0 then null_s else read_string('enh')\n\t\tString? reg2map = if size('reg2map')==0 then null_s else read_string('reg2map')\n\t\tString? reg2map_bed = if size('reg2map_bed')==0 then null_s else read_string('reg2map_bed')\n\t\tString? roadmap_meta = if size('roadmap_meta')==0 then null_s else read_string('roadmap_meta')\n\t}\n\truntime {\n\t\tmaxRetries : 0\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\t\t\n\t}\n}\n\ntask roundedmean {\n\tArray[Int] ints\n\tcommand <<<\n\t\tpython <<CODE\n\t\tarr = [${sep=',' ints}]\n\t\twith open('tmp.txt','w') as fp:\n\t\t\tif len(arr):\n\t\t\t sum = sum(arr)\n\t\t\t mean = sum(arr)/float(len(arr))\n\t\t\t fp.write('{}'.format(int(round(mean))))\n\t\t\telse:\n\t\t\t fp.write('0')\n\t\tCODE\n\t>>>\n\toutput {\n\t\tInt rounded_mean = read_int('tmp.txt')\n\t}\n\truntime {\n\t\tcpu : 1\n\t\tmemory : '4000 MB'\n\t\ttime : 1\n\t\tdisks : 'local-disk 50 HDD'\n\t}\t\n}\n\ntask raise_exception {\n\tString msg\n\tcommand {\n\t\techo -e \"\n Error: ${msg}\n\" >&2\n\t\texit 2\n\t}\n\toutput {\n\t\tString error_msg = '${msg}'\n\t}\n\truntime {\n\t\tmaxRetries : 0\n\t}\n}\n", "root": "", "options": "{\n \"backend\": \"slurm\",\n \"default_runtime_attributes\": {\n \"maxRetries\": 1,\n \"slurm_account\": \"aeurban\"\n }\n}", "inputs": "{\n \"chip.title\" : \"chipseq miseq nano\",\n \"chip.description\" : \"chip-seq test miseq nano ipsc\",\n\n \"chip.pipeline_type\" : \"tf\",\n \"chip.aligner\" : \"bowtie2\",\n \"chip.align_only\" : false,\n \"chip.true_rep_only\" : false,\n\n \"chip.genome_tsv\" : \"/reference/ENCODE/pipeline_genome_data/genome_tsv/v1/hg38_scg.tsv\",\n\n \"chip.paired_end\" : true,\n \"chip.ctl_paired_end\" : true,\n\n \"chip.fastqs_rep1_R1\" : [ \"/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-flag_subsample1_R1.fastq.gz\" ],\n \"chip.fastqs_rep1_R2\" : [ \"/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-flag_subsample1_R2.fastq.gz\" ],\n \"chip.ctl_fastqs_rep1_R1\" : [ \"/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-input_subsample1_R1.fastq.gz\" ],\n \"chip.ctl_fastqs_rep1_R2\" : [ \"/labs/dflev/siming/chipseq/raw_fastq/test_miseqnano/0425-3-cetch1-input_subsample1_R2.fastq.gz\" ],\n\n \"chip.crop_length\" : 0,\n\n \"chip.mapq_thresh\" : 30,\n \"chip.dup_marker\" : \"picard\",\n \"chip.no_dup_removal\" : false,\n\n \"chip.subsample_reads\" : 0,\n \"chip.ctl_subsample_reads\" : 0,\n \"chip.xcor_subsample_reads\" : 15000000,\n\n \"chip.xcor_trim_bp\" : 50,\n \"chip.use_filt_pe_ta_for_xcor\" : false,\n\n \"chip.always_use_pooled_ctl\" : false,\n \"chip.ctl_depth_ratio\" : 1.2,\n\n \"chip.peak_caller\" : null,\n \"chip.cap_num_peak_macs2\" : 500000,\n \"chip.pval_thresh\" : 0.01,\n \"chip.fdr_thresh\" : 0.01,\n \"chip.idr_thresh\" : 0.05,\n \"chip.cap_num_peak_spp\" : 300000,\n\n \"chip.enable_jsd\" : true,\n \"chip.enable_gc_bias\" : true,\n \"chip.enable_count_signal_track\" : false,\n\n \"chip.filter_chrs\" : [],\n\n \"chip.align_cpu\" : 4,\n \"chip.align_mem_mb\" : 20000,\n \"chip.align_time_hr\" : 144,\n \"chip.align_disks\" : \"local-disk 400 HDD\",\n\n \"chip.filter_cpu\" : 2,\n \"chip.filter_mem_mb\" : 20000,\n \"chip.filter_time_hr\" : 144,\n \"chip.filter_disks\" : \"local-disk 400 HDD\",\n\n \"chip.bam2ta_cpu\" : 2,\n \"chip.bam2ta_mem_mb\" : 10000,\n \"chip.bam2ta_time_hr\" : 144,\n \"chip.bam2ta_disks\" : \"local-disk 100 HDD\",\n\n \"chip.spr_mem_mb\" : 16000,\n\n \"chip.jsd_cpu\" : 2,\n \"chip.jsd_mem_mb\" : 12000,\n \"chip.jsd_time_hr\" : 144,\n \"chip.jsd_disks\" : \"local-disk 200 HDD\",\n\n \"chip.xcor_cpu\" : 2,\n \"chip.xcor_mem_mb\" : 16000,\n \"chip.xcor_time_hr\" : 144,\n \"chip.xcor_disks\" : \"local-disk 100 HDD\",\n\n \"chip.call_peak_cpu\" : 2,\n \"chip.call_peak_mem_mb\" : 16000,\n \"chip.call_peak_time_hr\" : 144,\n \"chip.call_peak_disks\" : \"local-disk 200 HDD\",\n\n \"chip.macs2_signal_track_mem_mb\" : 16000,\n \"chip.macs2_signal_track_time_hr\" : 144,\n \"chip.macs2_signal_track_disks\" : \"local-disk 400 HDD\",\n\n \"chip.gc_bias_picard_java_heap\" : \"10G\"\n}\n", "workflowUrl": "/labs/dflev/siming/software/chip-seq-pipeline2/chip.wdl", "labels": "{\n \"caper-backend\": \"slurm\",\n \"caper-str-label\": \"chipseq_nano3_test1\",\n \"caper-user\": \"simingz\"\n}" }, "calls": { "chip.overlap_pr": [ { "retryableFailure": true, "executionStatus": "RetryableFailure", "stdout": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/execution/stdout", "backendStatus": "Done", "commandLine": "\ntouch null \npython3 $(which encode_task_overlap.py) \\n\t/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/inputs/1836791510/0425-3-cetch1-flag_subsample1_R1.nodup.pr1_x_ctl_for_rep1.300K.regionPeak.gz /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/inputs/-1827717161/0425-3-cetch1-flag_subsample1_R1.nodup.pr2_x_ctl_for_rep1.300K.regionPeak.gz /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/inputs/1290460198/0425-3-cetch1-flag_subsample1_R1.nodup_x_ctl_for_rep1.300K.regionPeak.gz \\n\t--prefix rep1-pr1_vs_rep1-pr2 \\n\t--peak-type regionPeak \\n\t--fraglen 200 \\n\t--chrsz /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/inputs/-723689832/hg38.chrom.sizes \\n\t--blacklist /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/inputs/-723689832/hg38.blacklist.bed.gz \\n\t--nonamecheck \\n\t--regex-bfilt-peak-chr-name 'chr[\dXY]+' \\n\t--ta /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/inputs/-1113689129/0425-3-cetch1-flag_subsample1_R1.nodup.tagAlign.gz", "shardIndex": 0, "runtimeAttributes": { "slurm_account": "aeurban", "failOnStderr": "false", "continueOnReturnCode": "0", "maxRetries": "1", "cpu": "1", "time": "1", "memory": "3.90625 GB" }, "callCaching": { "allowResultReuse": false, "hit": false, "result": "Cache Miss", "hashes": { "output count": "1679091C5A880FAF6FB5E6087EB1B2DC", "runtime attribute": { "failOnStderr": "68934A3E9455FA72420237EB05902327", "docker": "N/A", "continueOnReturnCode": "CFCD208495D565EF66E7DFF9F98764DA" }, "output expression": { "File overlap_peak": "5FE0ABD18E429C34B9371C7F78153E57", "File frip_qc": "FFBDC0CF158D56A15AA0F87CB842D5C8", "File bfilt_overlap_peak_hammock": "AAEDB2388B1A8885F60BDF99860308F2", "File bfilt_overlap_peak_bb": "D2388B1E435478D195314A0DE2E9CEEB", "File bfilt_overlap_peak_hammock_tbi": "F7AD80D930D3266E42DF62760BB4CA48", "File bfilt_overlap_peak": "504E92495AEB2A7383BCE85ACF634E4A" }, "input count": "D3D9446802A44259755D38E6D163E820", "backend name": "8C338F9EC0DF1BB3621803F2983039DA", "command template": "B64FD519BA0800AF98DB11528D527648", "input": { "File peak2": "7e16b4ac12c4afe624c8ece83ff994a4", "File blacklist": "4a58b07c400e8641dc3df79ad6fcf12c", "Int fraglen": "3644A684F98EA8FE223C713B77189A77", "File chrsz": "9b996e0d2eabf4f49ef31793c11d2f45", "String regex_bfilt_peak_chr_name": "C742E46BA2D35AFA2B5C0BBCB7D31CE7", "File peak_pooled": "44a9d024f599eaf7003cafd63a1a8574", "File ta": "2db8b57a38e09aa03ad502953b1adc50", "String prefix": "D3B9D7856D560D6435DB69A1320959A6", "String peak_type": "CCC3DB10222D85D8F7B6CE07DD9D8E2F", "File peak1": "b4ada56382dc83c1a00602df92c79ba5" } }, "effectiveCallCachingMode": "ReadAndWriteCache" }, "inputs": { "blacklist": "/reference/ENCODE/pipeline_genome_data/hg38/hg38.blacklist.bed.gz", "peak_type": "regionPeak", "ta": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-bam2ta/shard-0/execution/glob-199637d3015dccbe277f621a18be9eb4/0425-3-cetch1-flag_subsample1_R1.nodup.tagAlign.gz", "prefix": "rep1-pr1_vs_rep1-pr2", "peak2": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-call_peak_pr2/shard-0/execution/glob-3c17d82667de5da0b56a7c17d88175d2/0425-3-cetch1-flag_subsample1_R1.nodup.pr2_x_ctl_for_rep1.300K.regionPeak.gz", "fraglen": 200, "chrsz": "/reference/ENCODE/pipeline_genome_data/hg38/hg38.chrom.sizes", "peak1": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-call_peak_pr1/shard-0/execution/glob-3c17d82667de5da0b56a7c17d88175d2/0425-3-cetch1-flag_subsample1_R1.nodup.pr1_x_ctl_for_rep1.300K.regionPeak.gz", "regex_bfilt_peak_chr_name": "chr[\dXY]+", "peak_pooled": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-call_peak/shard-0/execution/glob-3c17d82667de5da0b56a7c17d88175d2/0425-3-cetch1-flag_subsample1_R1.nodup_x_ctl_for_rep1.300K.regionPeak.gz" }, "returnCode": 1, "failures": [ { "causedBy": [], "message": "Job chip.overlap_pr:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details." } ], "jobId": "14894969", "backend": "slurm", "end": "2020-04-07T21:30:05.053Z", "stderr": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/execution/stderr", "callRoot": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0", "attempt": 1, "executionEvents": [ { "startTime": "2020-04-07T21:30:05.052Z", "description": "UpdatingJobStore", "endTime": "2020-04-07T21:30:05.052Z" }, { "startTime": "2020-04-07T21:29:43.448Z", "description": "Pending", "endTime": "2020-04-07T21:29:43.449Z" }, { "startTime": "2020-04-07T21:29:44.874Z", "endTime": "2020-04-07T21:29:44.875Z", "description": "WaitingForValueStore" }, { "startTime": "2020-04-07T21:29:43.449Z", "endTime": "2020-04-07T21:29:44.874Z", "description": "RequestingExecutionToken" }, { "description": "RunningJob", "startTime": "2020-04-07T21:29:44.944Z", "endTime": "2020-04-07T21:30:05.052Z" }, { "endTime": "2020-04-07T21:29:44.944Z", "startTime": "2020-04-07T21:29:44.885Z", "description": "CallCacheReading" }, { "startTime": "2020-04-07T21:29:44.875Z", "description": "PreparingJob", "endTime": "2020-04-07T21:29:44.885Z" } ], "start": "2020-04-07T21:29:43.448Z" }, { "retryableFailure": false, "executionStatus": "Failed", "stdout": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/execution/stdout", "backendStatus": "Done", "commandLine": "\ntouch null \npython3 $(which encode_task_overlap.py) \\n\t/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/inputs/1836791510/0425-3-cetch1-flag_subsample1_R1.nodup.pr1_x_ctl_for_rep1.300K.regionPeak.gz /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/inputs/-1827717161/0425-3-cetch1-flag_subsample1_R1.nodup.pr2_x_ctl_for_rep1.300K.regionPeak.gz /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/inputs/1290460198/0425-3-cetch1-flag_subsample1_R1.nodup_x_ctl_for_rep1.300K.regionPeak.gz \\n\t--prefix rep1-pr1_vs_rep1-pr2 \\n\t--peak-type regionPeak \\n\t--fraglen 200 \\n\t--chrsz /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/inputs/-723689832/hg38.chrom.sizes \\n\t--blacklist /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/inputs/-723689832/hg38.blacklist.bed.gz \\n\t--nonamecheck \\n\t--regex-bfilt-peak-chr-name 'chr[\dXY]+' \\n\t--ta /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/inputs/-1113689129/0425-3-cetch1-flag_subsample1_R1.nodup.tagAlign.gz", "shardIndex": 0, "runtimeAttributes": { "slurm_account": "aeurban", "failOnStderr": "false", "continueOnReturnCode": "0", "maxRetries": "1", "cpu": "1", "time": "1", "memory": "3.90625 GB" }, "callCaching": { "allowResultReuse": false, "effectiveCallCachingMode": "WriteCache", "hashes": { "output count": "1679091C5A880FAF6FB5E6087EB1B2DC", "runtime attribute": { "docker": "N/A", "failOnStderr": "68934A3E9455FA72420237EB05902327", "continueOnReturnCode": "CFCD208495D565EF66E7DFF9F98764DA" }, "output expression": { "File overlap_peak": "5FE0ABD18E429C34B9371C7F78153E57", "File frip_qc": "FFBDC0CF158D56A15AA0F87CB842D5C8", "File bfilt_overlap_peak_hammock": "AAEDB2388B1A8885F60BDF99860308F2", "File bfilt_overlap_peak_bb": "D2388B1E435478D195314A0DE2E9CEEB", "File bfilt_overlap_peak_hammock_tbi": "F7AD80D930D3266E42DF62760BB4CA48", "File bfilt_overlap_peak": "504E92495AEB2A7383BCE85ACF634E4A" }, "input count": "D3D9446802A44259755D38E6D163E820", "backend name": "8C338F9EC0DF1BB3621803F2983039DA", "command template": "B64FD519BA0800AF98DB11528D527648", "input": { "File peak2": "7e16b4ac12c4afe624c8ece83ff994a4", "File blacklist": "4a58b07c400e8641dc3df79ad6fcf12c", "Int fraglen": "3644A684F98EA8FE223C713B77189A77", "File chrsz": "9b996e0d2eabf4f49ef31793c11d2f45", "String regex_bfilt_peak_chr_name": "C742E46BA2D35AFA2B5C0BBCB7D31CE7", "File peak_pooled": "44a9d024f599eaf7003cafd63a1a8574", "File ta": "2db8b57a38e09aa03ad502953b1adc50", "String prefix": "D3B9D7856D560D6435DB69A1320959A6", "String peak_type": "CCC3DB10222D85D8F7B6CE07DD9D8E2F", "File peak1": "b4ada56382dc83c1a00602df92c79ba5" } } }, "inputs": { "blacklist": "/reference/ENCODE/pipeline_genome_data/hg38/hg38.blacklist.bed.gz", "peak_type": "regionPeak", "ta": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-bam2ta/shard-0/execution/glob-199637d3015dccbe277f621a18be9eb4/0425-3-cetch1-flag_subsample1_R1.nodup.tagAlign.gz", "prefix": "rep1-pr1_vs_rep1-pr2", "peak2": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-call_peak_pr2/shard-0/execution/glob-3c17d82667de5da0b56a7c17d88175d2/0425-3-cetch1-flag_subsample1_R1.nodup.pr2_x_ctl_for_rep1.300K.regionPeak.gz", "fraglen": 200, "chrsz": "/reference/ENCODE/pipeline_genome_data/hg38/hg38.chrom.sizes", "peak1": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-call_peak_pr1/shard-0/execution/glob-3c17d82667de5da0b56a7c17d88175d2/0425-3-cetch1-flag_subsample1_R1.nodup.pr1_x_ctl_for_rep1.300K.regionPeak.gz", "regex_bfilt_peak_chr_name": "chr[\dXY]+", "peak_pooled": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-call_peak/shard-0/execution/glob-3c17d82667de5da0b56a7c17d88175d2/0425-3-cetch1-flag_subsample1_R1.nodup_x_ctl_for_rep1.300K.regionPeak.gz" }, "returnCode": 1, "failures": [ { "causedBy": [], "message": "Job chip.overlap_pr:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details." } ], "jobId": "14894971", "backend": "slurm", "end": "2020-04-07T21:30:37.271Z", "stderr": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2/execution/stderr", "callRoot": "/labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-overlap_pr/shard-0/attempt-2", "attempt": 2, "executionEvents": [ { "startTime": "2020-04-07T21:30:06.893Z", "description": "RunningJob", "endTime": "2020-04-07T21:30:37.271Z" }, { "startTime": "2020-04-07T21:30:06.864Z", "description": "WaitingForValueStore", "endTime": "2020-04-07T21:30:06.876Z" }, { "startTime": "2020-04-07T21:30:37.271Z", "description": "UpdatingJobStore", "endTime": "2020-04-07T21:30:37.271Z" }, { "description": "RequestingExecutionToken", "endTime": "2020-04-07T21:30:06.864Z", "startTime": "2020-04-07T21:30:05.758Z" }, { "startTime": "2020-04-07T21:30:05.758Z", "endTime": "2020-04-07T21:30:05.758Z", "description": "Pending" }, { "startTime": "2020-04-07T21:30:06.876Z", "description": "PreparingJob", "endTime": "2020-04-07T21:30:06.893Z" } ], "start": "2020-04-07T21:30:05.757Z" } ],

leepc12 commented 4 years ago

Please upload a file instead of pasting the whole contents. Please upload Caper's log or find troubleshooting result at the bottom of the log.

simingzhang commented 4 years ago

By Caper log, do you mean the result of caper troubleshoot? I get an error with caper troubleshoot metadata.json: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /api/workflows/v1/query?additionalQueryResultFields=labels (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b19e058a7f0>: Failed to establish a new connection: [Errno 111] Connection refused')) Help: cannot connect to server. Check if server is dead or still spinning up.

The metadata file is attached.

metadata.txt

leepc12 commented 4 years ago

No, caper troubleshoot only works with caper server. I meant the shell log (STDOUT) of your caper run command.

simingzhang commented 4 years ago

Slurm output file attached:

slurm-14894815.txt

leepc12 commented 4 years ago
java.nio.file.NoSuchFileException: /oak/stanford/scg/lab_dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/cromwell-workflow-logs/workflow.8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06.log

Found this in the log. Please don't run anything on /oak. Even though your output directory defined in ~/.caper/default.conf is not on /oak.

Run it on /scratch, $HOME or $PI_HOME instead.

Please upload this file. /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-idr_pr/shard-0/attempt-2/execution/stdout

simingzhang commented 4 years ago

Sorry, what is $PI_HOME? is that /labs/dflev? is that different from /oak/stanford/scg/lab_dflev? And how do I avoid running on oak? Currently, I'm moving to the output directory and running the sbatch from there. The output directory I used was: /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5

Here's the stdout file: stdout.txt

leepc12 commented 4 years ago

I thought that you are working on Sherlock. Even on SCG, running anything on OAK is not good. Check OAK's disk.

$ df -h /oak

Check ANY_DIRECTORY is on the same disk as /oak's.

$ df -h ANY_DIRECTORY

Use recommended scratch directory on SCG.

leepc12 commented 4 years ago

Please run this.

$ ls -l /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-idr_pr/shard-0/attempt-2/inputs/-723689832/hg38.blacklist.bed.gz

$ ls -l cd /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-idr_pr/shard-0/attempt-2/execution/
simingzhang commented 4 years ago

Result from first line: /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-idr_pr/shard-0/attempt-2/inputs/-723689832/hg38.blacklist.bed.gz -> /reference/ENCODE/pipeline_genome_data/hg38/hg38.blacklist.bed.gz

Result from second line (removed cd because it wasn't running): drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-156b027b56a0fc9230201b265d9f4bf4 -rw-rw-r-- 1 simingz scg_lab_dflev 0 Apr 7 14:31 glob-156b027b56a0fc9230201b265d9f4bf4.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-37a6259cc0c1dae299a7866489dff0bd -rw-rw-r-- 1 simingz scg_lab_dflev 5 Apr 7 14:31 glob-37a6259cc0c1dae299a7866489dff0bd.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-3c17d82667de5da0b56a7c17d88175d2 -rw-rw-r-- 1 simingz scg_lab_dflev 43 Apr 7 14:31 glob-3c17d82667de5da0b56a7c17d88175d2.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-6d5ecccaf6ecff88ff547bad03a90af6 -rw-rw-r-- 1 simingz scg_lab_dflev 0 Apr 7 14:31 glob-6d5ecccaf6ecff88ff547bad03a90af6.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-748df5c4b6fa48c407f98a93d42e64c6 -rw-rw-r-- 1 simingz scg_lab_dflev 33 Apr 7 14:31 glob-748df5c4b6fa48c407f98a93d42e64c6.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-8da83e7748d9e54f3e082eb4aa171757 -rw-rw-r-- 1 simingz scg_lab_dflev 57 Apr 7 14:31 glob-8da83e7748d9e54f3e082eb4aa171757.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-b34256f7f8497ceb77fad19f39663af0 -rw-rw-r-- 1 simingz scg_lab_dflev 56 Apr 7 14:31 glob-b34256f7f8497ceb77fad19f39663af0.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-c71d4cca8627aab27dc3503b7db7a39d -rw-rw-r-- 1 simingz scg_lab_dflev 0 Apr 7 14:31 glob-c71d4cca8627aab27dc3503b7db7a39d.list drwxrwsr-x 2 simingz scg_lab_dflev 4096 Apr 7 14:31 glob-dab68893df43168f2b301266b4498136 -rw-rw-r-- 1 simingz scg_lab_dflev 49 Apr 7 14:31 glob-dab68893df43168f2b301266b4498136.list -rw-rw-r-- 1 simingz scg_lab_dflev 901 Apr 7 14:30 hg38.blacklist.bed.tmp2 -rw-rw-r-- 2 simingz scg_lab_dflev 0 Apr 7 14:30 null -rw-rw-r-- 1 simingz scg_lab_dflev 2 Apr 7 14:30 rc -rw-rw-r-- 2 simingz scg_lab_dflev 20 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz -rw-rw-r-- 2 simingz scg_lab_dflev 197 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.log -rw-rw-r-- 2 simingz scg_lab_dflev 350 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.gz -rw-rw-r-- 1 simingz scg_lab_dflev 1101 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.tmp1 -rw-rw-r-- 2 simingz scg_lab_dflev 1381 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.unthresholded-peaks.txt.gz -rw-rw-r-- 2 simingz scg_lab_dflev 64019 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.unthresholded-peaks.txt.png -rw-rw-r-- 1 simingz scg_lab_dflev 26350 Apr 7 14:30 script -rw-rw-r-- 1 simingz scg_lab_dflev 1593 Apr 7 14:30 script.submit -rw-rw-r-- 1 simingz scg_lab_dflev 1859 Apr 7 14:31 stderr -rw-rw-r-- 1 simingz scg_lab_dflev 0 Apr 7 14:30 stderr.submit -rw-rw-r-- 1 simingz scg_lab_dflev 6028 Apr 7 14:30 stdout -rw-rw-r-- 1 simingz scg_lab_dflev 29 Apr 7 14:30 stdout.submit

leepc12 commented 4 years ago
-rw-rw-r-- 2 simingz scg_lab_dflev 20 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz
-rw-rw-r-- 2 simingz scg_lab_dflev 197 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.log
-rw-rw-r-- 2 simingz scg_lab_dflev 350 Apr 7 14:30 rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.gz

IDR peaks are not empty but after blacklist filtering it's empty. Something went wrong in the blacklist-filtering command line.

Try this.

$ cd /labs/dflev/siming/chipseq/encode_analysis_pipeline/chipseq_nano3_test5/chip/8b1e7e4e-e0a8-4328-9e2d-1b776b0b1b06/call-idr_pr/shard-0/attempt-2/execution/

$ bedtools --version

$ source activate encode-chip-seq-pipeline
$ bedtools --version

$ bedtools intersect -nonamecheck -v -a rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.tmp1 -b hg38.blacklist.bed.tmp2 | awk 'BEGIN{OFS="\t"} {if ($5>1000) $5=1000; print $0}' | grep -P 'chr[\dXY]+\b' | gzip -nc > rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz

See what kind of error happens. Also check bedtools version.

simingzhang commented 4 years ago

I have no bedtools module loaded outside conda.

I ran: conda activate encode-chip-seq-pipeline bedtools --version

Result: bedtools v2.29.0

Running: source activate encode-chip-seq-pipeline activate: No such file or directory

(either with conda activate or without conda activate leads to same error):

Running: bedtools intersect -nonamecheck -v -a rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.tmp1 -b hg38.blacklist.bed.tmp2 | awk 'BEGIN{OFS="\t"} {if ($5>1000) $5=1000; print $0}' | grep -P 'chr[\dXY]+\b' | gzip -nc > rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz

No error but the output file seems empty (attached)

rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz

leepc12 commented 4 years ago

Can I see first few lines of rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.gz? You can run this on the execution directory.

$ zcat -f rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz | head

Do your sample's chromosome names start with chr (e.g. chr1)?

simingzhang commented 4 years ago

I don't see any output after the command zcat -f rep1-pr1_vs_rep1-pr2.idr0.05.bfilt.regionPeak.gz | head. It looks empty.

The upstream file rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.tmp1 also looks really short, only has some chrM lines.

leepc12 commented 4 years ago

Oh... sorry. Try this one (without .bfilt.).

$ zcat -f rep1-pr1_vs_rep1-pr2.idr0.05.regionPeak.gz | head -n30
simingzhang commented 4 years ago

chrM 10030 10272 . 1000 . 164.85611 -1.00000 2.38917 142 chrM 16098 16338 . 1000 . 135.03608 -1.00000 2.38917 109 chrM 14010 14406 . 1000 . 127.83353 -1.00000 2.38917 207 chrM 5293 5469 . 1000 . 124.09686 -1.00000 2.38917 135 chrM 15452 15716 . 1000 . 121.85084 -1.00000 2.38917 107 chrM 12300 12477 . 1000 . 120.89155 -1.00000 2.38917 113 chrM 4943 5097 . 1000 . 116.57318 -1.00000 2.38917 49 chrM 10814 11017 . 1000 . 89.55153 -1.00000 2.38917 83 chrM 11951 12053 . 1000 . 79.15492 -1.00000 2.38917 55 chrM 8524 8780 . 1000 . 75.25766 -1.00000 2.38917 128 chrM 2084 2340 . 1000 . 70.60579 -1.00000 2.38917 128 chrM 329 585 . 1000 . 61.20675 -1.00000 2.38917 128 chrM 6416 6672 . 1000 . 51.26232 -1.00000 2.38917 128 chrM 4247 4333 . 1000 . 47.64323 -1.00000 2.38917 60 chrM 14512 14695 . 1000 . 46.91356 -1.00000 2.38917 110 chrM 13724 13857 . 1000 . 44.64217 -1.00000 2.38917 58 chrM 13039 13295 . 1000 . 37.65616 -1.00000 2.38917 128 chrM 3247 3503 . 1000 . 32.19942 -1.00000 2.38917 128 chrM 10957 11131 . 1000 . 28.63342 -1.00000 2.38917 150 chrM 3451 3707 . 1000 . 8.37660 -1.00000 2.38917 128

leepc12 commented 4 years ago

You don't have any reads but mitochondrial ones. Pipeline filters out all mitochondrial reads in .bfilt. peaks after blacklist-filtering, which resulted in an empty .bfilt. peak file.

If you want to keep all reads including mito ones, define this in your input JSON.

{
    "chip.regex_bfilt_peak_chr_name": "chr[\\dMXY]+"
}
leepc12 commented 4 years ago

Closing due to long inactivity.