ENCODE-DCC / chip-seq-pipeline2

ENCODE ChIP-seq pipeline
MIT License

call-xcor fails before peak calling #261

Closed wx2022 closed 2 years ago

wx2022 commented 2 years ago

Describe the bug

The pipeline fails during cross-correlation analysis (the chip.xcor task), before peak calling. See the errors below:

==== NAME=chip.xcor, STATUS=RetryableFailure, PARENT=
SHARD_IDX=1, RC=1, JOB_ID=402477
START=2022-02-04T03:58:57.092Z, END=2022-02-04T03:59:35.114Z
STDOUT=/work/wx74/chip-seq-pipeline2-results/OP_1/test/chip/15cee7a3-e687-47fb-aaf2-8e022481ee9a/call-xcor/shard-1/execution/stdout
STDERR=/work/wx74/chip-seq-pipeline2-results/OP_1/test/chip/15cee7a3-e687-47fb-aaf2-8e022481ee9a/call-xcor/shard-1/execution/stderr
STDERR_CONTENTS=
Traceback (most recent call last):
  File "/work/wx74/.conda/envs/encode-chip-seq-pipeline-spp/bin/encode_task_xcor.py", line 156, in <module>
    main()
  File "/work/wx74/.conda/envs/encode-chip-seq-pipeline-spp/bin/encode_task_xcor.py", line 144, in main
    args.chip_seq_type, args.exclusion_range_min, args.exclusion_range_max)
  File "/work/wx74/.conda/envs/encode-chip-seq-pipeline-spp/bin/encode_task_xcor.py", line 105, in xcor
    run_shell_cmd(cmd1)
  File "/work/wx74/.conda/envs/encode-chip-seq-pipeline-spp/bin/encode_lib_common.py", line 359, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=193447, PGID=193447, RC=1, DURATION_SEC=20.2
STDERR=Loading required package: caTools
Error in pdf(file = iparams$output.plot.file, width = 5, height = 5) :
  invalid 'file' argument 'H3K4me1%OP9_TGCGCAAT_1.trim_50bp.srt.filt.no_chrM.15M.cc.plot.pdf'
Execution halted

OS/Platform

Caper configuration file

Paste contents of ~/.caper/default.conf.

backend=slurm
slurm-partition=common
slurm-resource-param=-n 1 --ntasks-per-node=1 --cpus-per-task=${cpu} ${if defined(memory_mb) then "--mem=" else ""}${memory_mb}${if defined(memory_mb) then "M" else ""} ${if defined(time) then "--time=" else ""}${time*60} ${if defined(gpu) then "--gres=gpu:" else ""}${gpu} 

local-hash-strat=path+modtime
local-loc-dir=/work/wx74/chip-seq-pipeline2-results/OP_1/test2
cromwell=/hpc/home/wx74/.caper/cromwell_jar/cromwell-65.jar
womtool=/hpc/home/wx74/.caper/womtool_jar/womtool-65.jar

Input JSON file

Paste contents of your input JSON file.


{
    "chip.title" : "H3K4me1_1_test",
    "chip.description" : "H3K4me1_1",

    "chip.pipeline_type" : "histone",
    "chip.peak_caller" : "macs2",
    "chip.align_only" : false,
    "chip.true_rep_only" : false,

    "chip.genome_tsv" : "/chip-seq-pipeline2/genome/hg19.tsv",
    "chip.paired_end" : true,
    "chip.ctl_paired_end" : true,

    "chip.fastqs_rep1_R1" : [ "/originalSamples/fastq/OP25/H3K4me1%OP25_ATACTTGG_1.fq.gz"],
    "chip.fastqs_rep1_R2" : [ "/originalSamples/fastq/OP25/H3K4me1%OP25_ATACTTGG_2.fq.gz"],
    "chip.fastqs_rep2_R1" : [ "/originalSamples/fastq/OP9/H3K4me1%OP9_TGCGCAAT_1.fq.gz"],
    "chip.fastqs_rep2_R2" : [ "/originalSamples/fastq/OP9/H3K4me1%OP9_TGCGCAAT_2.fq.gz"],

    "chip.ctl_fastqs_rep1_R1" : [ "/originalSamples/fastq/OP25/H3%OP25_ATACTTGG_1.fq.gz"],
    "chip.ctl_fastqs_rep1_R2" : [ "/originalSamples/fastq/OP25/H3%OP25_ATACTTGG_2.fq.gz"],
    "chip.ctl_fastqs_rep2_R1" : [ "/originalSamples/fastq/OP9/H3%OP9_TGCGCAAT_1.fq.gz"],
    "chip.ctl_fastqs_rep2_R2" : [ "/originalSamples/fastq/OP9/H3%OP9_TGCGCAAT_2.fq.gz"],

    "chip.crop_length" : 0,

    "chip.mapq_thresh" : 30,
    "chip.dup_marker" : "picard",
    "chip.no_dup_removal" : false,

    "chip.subsample_reads" : 0,
    "chip.ctl_subsample_reads" : 0,
    "chip.xcor_subsample_reads" : 15000000,

    "chip.xcor_trim_bp" : 50,
    "chip.use_filt_pe_ta_for_xcor" : false,

    "chip.always_use_pooled_ctl" : true,
    "chip.ctl_depth_ratio" : 1.2,

    "chip.cap_num_peak" : 500000,
    "chip.pval_thresh" : 0.01,
    "chip.fdr_thresh" : 0.01,
    "chip.idr_thresh" : 0.05,

    "chip.enable_gc_bias" : true,
    "chip.enable_count_signal_track" : false,

    "chip.filter_chrs" : [],

    "chip.align_cpu" : 6,
    "chip.align_bowtie2_mem_factor" : 0.15,
    "chip.align_bwa_mem_factor" : 1.0,
    "chip.align_time_hr" : 48,
    "chip.align_bowtie2_disk_factor" : 8.0,
    "chip.align_bwa_disk_factor" : 8.0,

    "chip.filter_cpu" : 4,
    "chip.filter_mem_factor" : 0.4,
    "chip.filter_time_hr" : 24,
    "chip.filter_disk_factor" : 8.0,

    "chip.bam2ta_cpu" : 2,
    "chip.bam2ta_mem_factor" : 0.35,
    "chip.bam2ta_time_hr" : 6,
    "chip.bam2ta_disk_factor" : 4.0,

    "chip.spr_mem_factor" : 13.5,
    "chip.spr_disk_factor" : 18.0,

    "chip.enable_jsd": false,
    "chip.jsd_cpu" : 4,
    "chip.jsd_mem_factor" : 0.1,
    "chip.jsd_time_hr" : 6,
    "chip.jsd_disk_factor" : 2.0,

    "chip.xcor_cpu" : 2,
    "chip.xcor_mem_factor" : 1.0,
:

Troubleshooting result

If you ran caper run without a Caper server, Caper automatically runs a troubleshooter for failed workflows. Find the troubleshooting result at the bottom of Caper's screen log.

If you ran caper submit with a running Caper server, first find your workflow ID (1st column) with caper list, then run caper debug [WORKFLOW_ID].

Paste troubleshooting result.

* Found failures JSON object.
[
    {
        "causedBy": [
            {
                "causedBy": [],
                "message": "Job chip.xcor:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            }
        ],
        "message": "Workflow failed"
    }
]
* Recursively finding failures in calls (tasks)...

==== NAME=chip.xcor, STATUS=RetryableFailure, PARENT=
SHARD_IDX=1, RC=1, JOB_ID=402477
START=2022-02-04T03:58:57.092Z, END=2022-02-04T03:59:35.114Z
STDOUT=/work/wx74/chip-seq-pipeline2-results/OP_1/test/chip/15cee7a3-e687-47fb-aaf2-8e022481ee9a/call-xcor/shard-1/execution/stdout
STDERR=/work/wx74/chip-seq-pipeline2-results/OP_1/test/chip/15cee7a3-e687-47fb-aaf2-8e022481ee9a/call-xcor/shard-1/execution/stderr
STDERR_CONTENTS=
Traceback (most recent call last):
  File "/work/wx74/.conda/envs/encode-chip-seq-pipeline-spp/bin/encode_task_xcor.py", line 156, in <module>
    main()
  File "/work/wx74/.conda/envs/encode-chip-seq-pipeline-spp/bin/encode_task_xcor.py", line 144, in main
    args.chip_seq_type, args.exclusion_range_min, args.exclusion_range_max)
  File "/work/wx74/.conda/envs/encode-chip-seq-pipeline-spp/bin/encode_task_xcor.py", line 105, in xcor
    run_shell_cmd(cmd1)
  File "/work/wx74/.conda/envs/encode-chip-seq-pipeline-spp/bin/encode_lib_common.py", line 359, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=193447, PGID=193447, RC=1, DURATION_SEC=20.2
STDERR=Loading required package: caTools
Error in pdf(file = iparams$output.plot.file, width = 5, height = 5) : 
  invalid 'file' argument 'H3K4me1%OP9_TGCGCAAT_1.trim_50bp.srt.filt.no_chrM.15M.cc.plot.pdf'
Execution halted

leepc12 commented 2 years ago

Can you rename your FASTQs to remove the % from their file names and start over? I think the % in the file name is the culprit.
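For context, R's pdf() device appears to interpret % in its file argument as the start of a C-style integer format (as in names like "Rplot%03d.pdf"), which would explain why a literal % inherited from the FASTQ name is rejected. The renaming step could be scripted roughly as below. This is only a sketch: the helper names sanitize_name and rename_fastqs are hypothetical (not part of the pipeline), and replacing % with _ is an assumption; any character that is safe on your filesystem would do.

```python
from pathlib import Path


def sanitize_name(name: str, bad: str = "%", repl: str = "_") -> str:
    """Return `name` with every occurrence of `bad` replaced by `repl`."""
    return name.replace(bad, repl)


def rename_fastqs(directory: str, dry_run: bool = True) -> list:
    """Rename FASTQ files under `directory` whose names contain '%'.

    Returns a list of (old_path, new_path) string pairs.
    With dry_run=True the pairs are only reported, nothing is renamed.
    """
    changes = []
    # sorted() materializes the listing before any rename happens.
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        if not path.name.endswith((".fq.gz", ".fastq.gz")):
            continue
        new_name = sanitize_name(path.name)
        if new_name == path.name:
            continue  # nothing to fix in this file name
        target = path.with_name(new_name)
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        if not dry_run:
            path.rename(target)
        changes.append((str(path), str(target)))
    return changes
```

Running rename_fastqs("/originalSamples/fastq") first with the default dry_run=True shows what would change; the paths in the input JSON then need the same substitution before restarting the pipeline.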

wx2022 commented 2 years ago

Thanks Jin! When I tried to rename the FASTQs and start over, I hit the Python version conflict error again, so I had to uninstall and reinstall the pipeline's conda envs. However, the pipeline's second conda env (encode-chip-seq-pipeline-macs2) wasn't installed correctly, and it gave me the following errors:

CondaVerificationError: The package for certifi located at /work/wx74/.conda/pkgs/certifi-2021.10.8-py37h06a4308_2 appears to be corrupted. The path 'lib/python3.7/site-packages/certifi-2021.10.8-py3.7.egg-info/not-zip-safe' specified in the package manifest cannot be found.

CondaVerificationError: The package for certifi located at /work/wx74/.conda/pkgs/certifi-2021.10.8-py37h06a4308_2 appears to be corrupted. The path 'lib/python3.7/site-packages/certifi/__init__.py' specified in the package manifest cannot be found.

CondaVerificationError: The package for certifi located at /work/wx74/.conda/pkgs/certifi-2021.10.8-py37h06a4308_2 appears to be corrupted. The path 'lib/python3.7/site-packages/certifi/__main__.py' specified in the package manifest cannot be found.

CondaVerificationError: The package for certifi located at /work/wx74/.conda/pkgs/certifi-2021.10.8-py37h06a4308_2 appears to be corrupted. The path 'lib/python3.7/site-packages/certifi/cacert.pem' specified in the package manifest cannot be found.

CondaVerificationError: The package for certifi located at /work/wx74/.conda/pkgs/certifi-2021.10.8-py37h06a4308_2 appears to be corrupted. The path 'lib/python3.7/site-packages/certifi/core.py' specified in the package manifest cannot be found.

ClobberError: This transaction has incompatible packages due to a shared path. packages: defaults/linux-64::libwebp-base-1.2.0-h27cfd23_0, defaults/linux-64::libwebp-1.2.0-h89dd481_0 path: 'bin/webpinfo' ...

When I ran "bash scripts/uninstall_conda_env.sh" in the chip-seq-pipeline2 folder, only encode-chip-seq-pipeline was removed, not encode-chip-seq-pipeline-macs2. Do you know how to remove encode-chip-seq-pipeline-macs2 completely? Thanks!

wx2022 commented 2 years ago

After removing the corrupted package and then uninstalling and re-installing the pipeline's envs, I managed to fix the Python version conflicts. After renaming my FASTQs to remove the %, the pipeline ran successfully! Thanks so much again for your help, Jin! I appreciate it a lot!
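To catch this kind of problem before launching a run, the input JSON can be scanned for values containing %. The snippet below is a minimal sketch, not something the pipeline provides: find_suspect_paths is a hypothetical helper, and treating % as the only problematic character is an assumption based solely on this thread.

```python
import json


def find_suspect_paths(inputs: dict, bad_chars: str = "%") -> list:
    """Return (key, value) pairs whose string values contain any of bad_chars."""
    hits = []

    def walk(key, value):
        # Input JSON values may be strings, lists of strings, or nested lists.
        if isinstance(value, str):
            if any(c in value for c in bad_chars):
                hits.append((key, value))
        elif isinstance(value, list):
            for item in value:
                walk(key, item)

    for key, value in inputs.items():
        walk(key, value)
    return hits


# A three-key excerpt of the input JSON from this issue.
example = json.loads("""
{
    "chip.title": "H3K4me1_1_test",
    "chip.genome_tsv": "/chip-seq-pipeline2/genome/hg19.tsv",
    "chip.fastqs_rep2_R1": ["/originalSamples/fastq/OP9/H3K4me1%OP9_TGCGCAAT_1.fq.gz"]
}
""")

for key, value in find_suspect_paths(example):
    print(f"{key}: {value}")
```

For the excerpt above this flags only chip.fastqs_rep2_R1; non-string values such as booleans and numbers are ignored.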

leepc12 commented 2 years ago

That sounds great. I am closing this issue.