ENCODE-DCC / chip-seq-pipeline2

ENCODE ChIP-seq pipeline
MIT License

caper.caper_workflow_opts|INFO| Conda environment name not found in WDL metadata. wdl=/net/waterston/vol2/home/gevirl/chip-seq-pipeline2-2.1.2/chip.wdl #257

Open louisgevirtzman opened 2 years ago

louisgevirtzman commented 2 years ago

Describe the bug

When I start a run with this:

caper run /net/waterston/vol2/home/gevirl/chip-seq-pipeline2-2.1.2/chip.wdl -i /net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/6/pipeline.json --conda

I get the following error:

2022-01-11 09:57:17,891|caper.cli|INFO| Cromwell stdout: /net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/cromwell.out.1
2022-01-11 09:57:17,904|caper.caper_base|INFO| Creating a timestamped temporary directory. /net/waterston/vol9/capertmp/chip/20220111_095717_900159
2022-01-11 09:57:17,904|caper.caper_runner|INFO| Localizing files on work_dir. /net/waterston/vol9/capertmp/chip/20220111_095717_900159
2022-01-11 09:57:19,247|caper.caper_workflow_opts|INFO| Conda environment name not found in WDL metadata. wdl=/net/waterston/vol2/home/gevirl/chip-seq-pipeline2-2.1.2/chip.wdl
2022-01-11 09:57:19,254|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
Traceback (most recent call last):
  File "/nfs/waterston/miniconda3/envs/py39/bin/caper", line 13, in <module>
    main()
  File "/nfs/waterston/miniconda3/envs/py39/lib/python3.9/site-packages/caper/cli.py", line 705, in main
    return runner(parsed_args, nonblocking_server=nonblocking_server)
  File "/nfs/waterston/miniconda3/envs/py39/lib/python3.9/site-packages/caper/cli.py", line 249, in runner
    subcmd_run(c, args)
  File "/nfs/waterston/miniconda3/envs/py39/lib/python3.9/site-packages/caper/cli.py", line 379, in subcmd_run
    thread = caper_runner.run(
  File "/nfs/waterston/miniconda3/envs/py39/lib/python3.9/site-packages/caper/caper_runner.py", line 462, in run
    self._cromwell.validate(wdl=wdl, inputs=inputs, imports=imports)
  File "/nfs/waterston/miniconda3/envs/py39/lib/python3.9/site-packages/caper/cromwell.py", line 154, in validate
    raise WomtoolValidationFailed(
caper.cromwell.WomtoolValidationFailed: RC=1
STDERR=WARNING: Unexpected input provided: chip.align_mem_mb (expected inputs: [chip.fastqs_rep3_R2, chip.align_ctl.trim_bp, chip.filter_disk_factor, chip.gensz, chip.trimmomatic_phred_score_format, chip.peaks_pr1, chip.ctl_nodup_bams, chip.ctl_depth_limit, chip.use_filt_pe_ta_for_xcor, chip.xcor_subsample_reads, chip.call_peak_time_hr, chip.fastqs_rep1_R1, chip.paired_ends, chip.align_R1.multimapping, chip.align.multimapping, chip.gc_bias_picard_java_heap, chip.fdr_thresh, chip.align_trimmomatic_java_heap, chip.align_bwa_mem_factor, chip.fastqs_rep9_R1, chip.ctl_depth_ratio, chip.filter_cpu, chip.xcor_exclusion_range_max, chip.pval_thresh, chip.fastqs_rep6_R2, chip.ctl_fastqs_rep4_R1, chip.fastqs_rep5_R2, chip.peak_pooled, chip.read_genome_tsv.null_s, chip.description, chip.ctl_paired_ends, chip.fastqs_rep4_R1, chip.macs2_signal_track_mem_factor, chip.fastqs_rep5_R1, chip.mapq_thresh, chip.ctl_fastqs_rep6_R1, chip.filter_R1.ref_fa, chip.macs2_signal_track_time_hr, chip.xcor_disk_factor, chip.ctl_fastqs_rep1_R2, chip.fastqs_rep7_R2, chip.filter_chrs, chip.ref_fa, chip.fastqs_rep6_R1, chip.ctl_fastqs_rep5_R2, chip.enable_jsd, chip.dup_marker, chip.call_peak_spp_disk_factor, chip.pool_ta.col, chip.docker, chip.use_bwa_mem_for_pe, chip.ctl_fastqs_rep2_R2, chip.fastqs_rep8_R2, chip.macs2_signal_track_disk_factor, chip.filter_time_hr, chip.peaks, chip.filter_no_dedup.ref_fa, chip.xcor_cpu, chip.call_peak_macs2_mem_factor, chip.peak_ppr1, chip.align_bowtie2_disk_factor, chip.call_peak_cpu, chip.enable_gc_bias, chip.ctl_fastqs_rep3_R1, chip.conda_macs2, chip.nodup_bams, chip.ctl_fastqs_rep6_R2, chip.ctl_fastqs_rep1_R1, chip.use_bowtie2_local_mode, chip.fastqs_rep10_R2, chip.ctl_paired_end, chip.pool_blacklist.prefix, chip.true_rep_only, chip.ctl_subsample_reads, chip.ctl_fastqs_rep8_R2, chip.align_R1.trimmomatic_java_heap, chip.subsample_ctl_mem_factor, chip.ctl_fastqs_rep7_R1, chip.spr_mem_factor, chip.ctl_fastqs_rep5_R1, chip.bam2ta_time_hr, chip.fastqs_rep2_R1, chip.pool_ta_pr1.col, chip.ctl_bams, chip.subsample_reads, chip.align_bowtie2_mem_factor, chip.aligner, chip.blacklist, chip.title, chip.bowtie2_idx_tar, chip.ctl_fastqs_rep2_R1, chip.singularity, chip.align.trim_bp, chip.align_only, chip.align_time_hr, chip.exp_ctl_depth_ratio_limit, chip.bam2ta_cpu, chip.ctl_fastqs_rep9_R1, chip.enable_count_signal_track, chip.call_peak_spp_mem_factor, chip.no_dup_removal, chip.paired_end, chip.chrsz, chip.jsd_mem_factor, chip.ctl_fastqs_rep10_R2, chip.qc_report.qc_json_ref, chip.xcor_trim_bp, chip.bwa_idx_tar, chip.conda, chip.fastqs_rep4_R2, chip.peak_caller, chip.peak_ppr2, chip.fastqs_rep2_R2, chip.ctl_fastqs_rep7_R2, chip.fastqs_rep10_R1, chip.ctl_fastqs_rep3_R2, chip.jsd_disk_factor, chip.fastqs_rep8_R1, chip.align_ctl.multimapping, chip.call_peak_macs2_disk_factor, chip.fraglen, chip.jsd_time_hr, chip.crop_length, chip.conda_spp, chip.genome_name, chip.fastqs_rep7_R1, chip.mito_chr_name, chip.cap_num_peak, chip.always_use_pooled_ctl, chip.ctl_fastqs_rep9_R2, chip.ctl_tas, chip.blacklist2, chip.align_cpu, chip.bwa_mem_read_len_limit, chip.custom_aligner_idx_tar, chip.tas, chip.pseudoreplication_random_seed, chip.fastqs_rep1_R2, chip.fastqs_rep3_R1, chip.filter_picard_java_heap, chip.filter_mem_factor, chip.regex_bfilt_peak_chr_name, chip.spr_disk_factor, chip.crop_length_tol, chip.genome_tsv, chip.pool_ta_pr2.col, chip.bams, chip.xcor_mem_factor, chip.ctl_fastqs_rep10_R1, chip.ctl_fastqs_rep4_R2, chip.fastqs_rep9_R2, chip.pipeline_type, chip.peaks_pr2, chip.align_bwa_disk_factor, chip.jsd_cpu, chip.bam2ta_disk_factor, chip.subsample_ctl_disk_factor, chip.custom_align_py, chip.redact_nodup_bam, chip.xcor_time_hr, chip.bam2ta_mem_factor, chip.ctl_fastqs_rep8_R1, chip.pool_ta_ctl.col, chip.xcor_exclusion_range_min, chip.idr_thresh])
WARNING: Unexpected input provided: chip.cap_num_peak_spp (expected inputs: same list as above)
WARNING: Unexpected input provided: chip.call_peak_mem_mb (expected inputs: same list as above)

OS/Platform

Caper configuration file

backend=sge

# Parallel environment is required; ask your administrator to create one.
# If your cluster doesn't support PE then edit 'sge-resource-param'
# to fit your cluster's configuration.
sge-pe=serial

# This parameter is NOT for 'caper submit' BUT for 'caper run' and 'caper server' only.
# This resource parameter string will be passed to sbatch, qsub, bsub, ...
# You can customize it according to your cluster's configuration.

# Note that Cromwell's implicit type conversion (String to Integer)
# seems to be buggy for WomLong type memory variables (memory_mb and memory_gb),
# so be careful about using the + operator between WomLong and other types (String, even Int).
# For example, ${"--mem=" + memory_mb} will not work since memory_mb is WomLong.
# Use ${if defined(memory_mb) then "--mem=" else ""}${memory_mb}${if defined(memory_mb) then "mb " else " "}
# See https://github.com/broadinstitute/cromwell/issues/4659 for details.

# Cromwell's built-in variables (attributes defined in a WDL task's runtime);
# use them within ${} notation.
# - cpu: number of cores for a job (default = 1)
# - memory_mb, memory_gb: total memory for a job in MB, GB
#   * these are converted from the 'memory' string attribute (including size unit)
#     defined in the WDL task's runtime
# - time: time limit for a job in hours
# - gpu: specified GPU name or number of GPUs (declared as String)

# Parallel environment of SGE:
# find one with $ qconf -spl, or ask your admin to add one if none exists.
# If your cluster works without PE then edit the sge-resource-param below.
sge-pe=serial

sge-resource-param=${if cpu > 1 then "-pe " + sge_pe + " " else ""} ${if cpu > 1 then cpu else ""} ${true="-l h_vmem=$(expr " false="" defined(memory_mb)}${memory_mb}${true=" / " false="" defined(memory_mb)}${if defined(memory_mb) then cpu else ""}${true=")m" false="" defined(memory_mb)} ${true="-l s_vmem=$(expr " false="" defined(memory_mb)}${memory_mb}${true=" / " false="" defined(memory_mb)}${if defined(memory_mb) then cpu else ""}${true=")m" false="" defined(memory_mb)} ${"-l h_rt=" + time + ":00:00"} ${"-l s_rt=" + time + ":00:00"} ${"-l gpu=" + gpu}

# If needed, uncomment and define any extra SGE qsub parameters here.
# YOU CANNOT USE WDL SYNTAX AND CROMWELL BUILT-IN VARIABLES HERE.
#sge-extra-param=

# Hashing strategy for call-caching (3 choices).
# This parameter is for the local (local/slurm/sge/pbs/lsf) backend only.
# It is important for call-caching, which means re-using outputs from
# previous/failed workflows. The cache will miss if a different strategy is used.
# file: use md5sum hash (slow); this was the default for all old versions of Caper<1.0.
# path: use path.
# path+modtime: use path and modification time; this is the new default for Caper>=1.0.
local-hash-strat=path+modtime

# Metadata DB for call-caching (reusing previous outputs):
# Cromwell supports restarting workflows based on a metadata DB.
# DB is in-memory by default.
#db=in-memory

# If you use 'caper server' then you can use one unified '--file-db'
# for all submitted workflows. In that case, uncomment the following two lines
# and define file-db as an absolute path to store metadata of all workflows.
#db=file
#file-db=

# If you use 'caper run' and want to use call-caching:
# make sure to define a different 'caper run ... --db file --file-db DB_PATH'
# for each pipeline run.
# But if you want to restart, define the same '--db file --file-db DB_PATH';
# Caper will then collect/re-use previous outputs without running the same task again.
# Previous outputs will simply be hard/soft-linked.

# Local directory for localized files and Cromwell's intermediate files.
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper stores all localized data files
# (e.g. input FASTQs defined as URLs in the input JSON) in this directory.
local-loc-dir=/net/waterston/vol9/capertmp

cromwell=/net/waterston/vol2/home/gevirl/.caper/cromwell_jar/cromwell-65.jar
womtool=/net/waterston/vol2/home/gevirl/.caper/womtool_jar/womtool-65.jar

Input JSON file

{"chip.title":"arid-1_RW12194_L4larva_1_6","chip.description":"gevirl","chip.always_use_pooled_ctl":false,"chip.true_rep_only":false,"chip.enable_count_signal_track":true,"chip.aligner":"bwa","chip.use_bwa_mem_for_pe":true,"chip.align_only":false,"chip.genome_tsv":"/net/waterston/vol9/WS245chr/WS245chr.tsv","chip.peak_caller":"spp","chip.pipeline_type":"tf","chip.cap_num_peak_spp":300000,"chip.idr_thresh":0.01,"chip.call_peak_mem_mb":16000,"chip.align_mem_mb":20000,"chip.filter_picard_java_heap":"4G","chip.fastqs_rep1_R1":["/net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/ARDIP3_240_337_S33_L001_R1_001.fastq.gz"],"chip.fastqs_rep1_R2":["/net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/ARDIP3_240_337_S33_L001_R2_001.fastq.gz"],"chip.fastqs_rep2_R1":["/net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/ARDIP4_228_349_S34_L001_R1_001.fastq.gz"],"chip.fastqs_rep2_R2":["/net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/ARDIP4_228_349_S34_L001_R2_001.fastq.gz"],"chip.paired_ends":[true,true],"chip.ctl_fastqs_rep1_R1":["/net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/ARDinp3_263_314_S39_L001_R1_001.fastq.gz"],"chip.ctl_fastqs_rep1_R2":["/net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/ARDinp3_263_314_S39_L001_R2_001.fastq.gz"],"chip.ctl_paired_ends":[true]}

Troubleshooting result

If you ran caper run without a Caper server, then Caper automatically runs a troubleshooter for failed workflows. Find the troubleshooting result at the bottom of Caper's screen log.

If you ran caper submit with a running Caper server, then first find your workflow ID (1st column) with caper list and run caper debug [WORKFLOW_ID].

leepc12 commented 2 years ago
STDERR=WARNING: Unexpected input provided: chip.align_mem_mb

I think you are using an outdated Conda environment and input JSON. The parameter chip.align_mem_mb has been deprecated (the pipeline automatically determines each job's memory from input file sizes) and no longer exists.

Please get the latest pipeline + Caper and reinstall the pipeline's Conda environment (DO NOT ACTIVATE THE PIPELINE'S CONDA ENV BEFORE RUNNING A PIPELINE).

$ pip install caper --upgrade
# and then git pull (or git clone from scratch) the pipeline git directory to update it
$ scripts/uninstall_conda_env.sh
$ scripts/install_conda_env.sh

Remove all *_mem_mb parameters from your input JSON and try again.
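A minimal sketch of that cleanup, assuming jq is available on your system (editing pipeline.json by hand works just as well):

# Strip the deprecated memory parameters flagged by Womtool above.
# Note: chip.cap_num_peak_spp was also flagged as unexpected; per the
# expected-inputs list, the current WDL accepts chip.cap_num_peak instead.
$ jq 'del(."chip.align_mem_mb", ."chip.call_peak_mem_mb", ."chip.cap_num_peak_spp")' \
    pipeline.json > pipeline.fixed.json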

louisgevirtzman commented 2 years ago

I have updated Conda and removed all *_mem_mb parameters from the input JSON. The alignment tasks are now failing because of an inadequate memory request, and I cannot figure out how to request more memory for this task. Do I edit something in the WDL file or in the Caper config file? Please provide a simple example.

leepc12 commented 2 years ago

The pipeline automatically scales memory according to the size of each task's inputs. Please upload the cromwell.out* files for debugging.

louisgevirtzman commented 2 years ago

cromwell.out.10.gz

The alignment tasks die; SGE kills them. Here is a typical qacct report for one of these jobs.

I don't know what code 100 means ("failed 100 : assumedly after job").

(base) [gevirl@grid-head2 execution]$ qacct -j 282137901

qname           sage-login.q
hostname        sage012.grid.gs.washington.edu
group           waterstonlab
owner           gevirl
project         sage
department      waterstonlab
jobname         cromwell_cd5ab92a_align_R1
jobnumber       282137901
taskid          undefined
pe_taskid       NONE
account         sge
priority        0
cwd             /net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/chip/cd5ab92a-246c-45fd-b6f9-64f5b09114b8/call-align_R1/shard-0
submit_host     sage013.grid.gs.washington.edu
submit_cmd      qsub -V -terse -S /bin/bash -N cromwell_cd5ab92a_align_R1 -wd /net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/chip/cd5ab92a-246c-45fd-b6f9-64f5b09114b8/call-align_R1/shard-0 -o /net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/chip/cd5ab92a-246c-45fd-b6f9-64f5b09114b8/call-align_R1/shard-0/execution/stdout -e /net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/chip/cd5ab92a-246c-45fd-b6f9-64f5b09114b8/call-align_R1/shard-0/execution/stderr -pe serial 6 -l h_vmem=936m -l s_vmem=936m -l h_rt=48:00:00 -l s_rt=48:00:00 -P sage /net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/chip/cd5ab92a-246c-45fd-b6f9-64f5b09114b8/call-align_R1/shard-0/execution/script.caper
qsub_time       01/20/2022 11:32:51.622
start_time      01/20/2022 11:33:21.751
end_time        01/20/2022 11:44:29.509
granted_pe      serial
slots           6
failed          100 : assumedly after job
deleted_by      NONE
exit_status     152
ru_wallclock    667.758
ru_utime        0.611
ru_stime        0.527
ru_maxrss       13712
ru_ixrss        0
ru_ismrss       0
ru_idrss        0
ru_isrss        0
ru_minflt       75212
ru_majflt       24
ru_nswap        0
ru_inblock      42496
ru_oublock      832
ru_msgsnd       0
ru_msgrcv       0
ru_nsignals     0
ru_nvcsw        1918
ru_nivcsw       279
wallclock       668.176
cpu             1684.140
mem             709.450
io              49.326
iow             2.700
ioops           3671821
maxvmem         5.394G
maxrss          1.545G
maxpss          1.536G
arid            undefined
jc_name         NONE
bound_cores     sage012.grid.gs.washington.edu=0,2, sage012.grid.gs.washington.edu=0,3, sage012.grid.gs.washington.edu=1,0, sage012.grid.gs.washington.edu=1,1, sage012.grid.gs.washington.edu=1,2, sage012.grid.gs.washington.edu=1,3

louisgevirtzman commented 2 years ago

exit_status 152 in the SGE report in the previous comment may mean a memory limit was reached. I would like to experiment by increasing the requested memory for the alignment tasks. Please guide me on how to do that.
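For what it's worth, exit statuses above 128 conventionally encode 128 + signal number, and SGE typically sends SIGXCPU when a soft limit such as s_vmem is exceeded, so 152 is consistent with the memory-limit theory. A quick check in bash (illustrative only):

$ echo $((152 - 128))
24
$ kill -l 24
XCPU

This also fits the qacct report above: maxvmem 5.394G is essentially at the requested ceiling of h_vmem=936m per slot times 6 slots (about 5616m).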

leepc12 commented 2 years ago

Let's separate the job script itself from Caper. Please run this script on your login node; it is the exact command that Caper submitted. Let me know what kind of errors you get for the job cromwell_cd5ab92a_align, and run qacct on it. Also please check whether you can make this command work by modifying the resource parameters.

qsub -V -terse -S /bin/bash -N cromwell_cd5ab92a_align -wd /net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/chip/cd5ab92a-246c-45fd-b6f9-64f5b09114b8/call-align/shard-1 -o /net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/chip/cd5ab92a-246c-45fd-b6f9-64f5b09114b8/call-align/shard-1/execution/stdout -e /net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/chip/cd5ab92a-246c-45fd-b6f9-64f5b09114b8/call-align/shard-1/execution/stderr \
    \
   -pe serial  6 -l h_vmem=$(expr 6064 / 6)m -l s_vmem=$(expr 6064 / 6)m -l h_rt=48:00:00 -l s_rt=48:00:00   \
   -P sage \
   /net/waterston/vol9/ChipSeqPipeline/arid-1_RW12194_L4larva_1/chip/cd5ab92a-246c-45fd-b6f9-64f5b09114b8/call-align/shard-1/execution/script.caper

Resource parameters:

   -pe serial  6 -l h_vmem=$(expr 6064 / 6)m -l s_vmem=$(expr 6064 / 6)m -l h_rt=48:00:00 -l s_rt=48:00:00   \
   -P sage \
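These parameters come from the sge-resource-param template in the Caper conf quoted above. One detail worth noting (my understanding of SGE, not something the pipeline documents here): h_vmem/s_vmem are per-slot limits, which is why the total memory is divided by the slot count. A quick check of what the expression evaluates to:

$ echo "$(expr 6064 / 6)m"    # per-slot memory for memory_mb=6064, cpu=6
1010m

So the job is granted roughly 1010m per slot, about 6064m in total across 6 slots; increasing the total memory raises the per-slot ceiling that SGE enforces.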
louisgevirtzman commented 2 years ago

I ran the original job script and got the same error. The script and the qacct results are in the attached file original.txt.

I increased the memory and the job ran to completion without error. The modified script and the qacct results are in the attached file moreMemory.txt.

Please advise how I should increase the memory specification for particular tasks.

moreMemory.txt original.txt

leepc12 commented 2 years ago

You can increase the memory factor, which is a multiplier on the total memory for each task. Find the default value of this parameter, chip.align_mem_factor, in the pipeline's input JSON documentation and try doubling it; see the sketch below.
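A sketch of how to set that in the input JSON, assuming the documented default for chip.align_mem_factor is 0.15 (please check the pipeline's input JSON documentation for the actual default and double whatever it says):

# Add a doubled align memory factor to the input JSON (0.3 = 2 x assumed 0.15 default).
$ jq '. + {"chip.align_mem_factor": 0.3}' pipeline.json > pipeline.more_mem.json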