ENCODE-DCC / chip-seq-pipeline2

ENCODE ChIP-seq pipeline
MIT License

bedClip: command not found; chip.call_peak fails when running the test example #250

Closed ckfromCN closed 2 years ago

ckfromCN commented 2 years ago

Describe the bug

I installed this workflow by following the steps in the documentation. When I ran the test example, the task execution failed:

caper run chip.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json --conda

The log suggests that the failure comes from bedClip when call_peak is executed.

There are two related errors. The first is that there is no bedClip command in the environment:

/bin/bash: line 1: bedClip: command not found

The second is that bedClip does not accept the -truncate option:

STDERR=-truncate is not a valid option
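
The two failure modes described above can be told apart with a quick check inside the activated Conda environment. The following is a minimal sketch, not part of the pipeline: check_bedclip is a hypothetical helper name, and it assumes that running bedClip with no arguments prints a usage text that lists the supported options (as the usage output quoted later in this thread does).

```shell
# check_bedclip is a hypothetical helper, not part of the pipeline.
# Prints "missing", "no-truncate", or "ok" for the given binary (default: bedClip).
check_bedclip() {
    bin="${1:-bedClip}"
    # Case 1: the command cannot be found (the RC=127 "command not found" error).
    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "missing"
        return 1
    fi
    # Run with no arguments to get the usage text. The tool may exit non-zero
    # here, so ignore the status and capture both stdout and stderr.
    usage=$("$bin" 2>&1 || true)
    # Case 2: the binary exists but its usage text does not list -truncate.
    case "$usage" in
        *-truncate*) echo "ok" ;;
        *) echo "no-truncate"; return 2 ;;
    esac
}
```

After activating encode-chip-seq-pipeline-spp, calling check_bedclip with no argument inspects whichever bedClip is first on PATH. A result of "no-truncate" would point to an installed build that does not offer the option the pipeline passes.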

OS/Platform

Caper configuration file

Paste contents of ~/.caper/default.conf.

backend=local
local-hash-strat=path+modtime
local-loc-dir=

cromwell=/home/kchen/.caper/cromwell_jar/cromwell-65.jar
womtool=/home/kchen/.caper/womtool_jar/womtool-65.jar

Input JSON file

{
    "chip.pipeline_type" : "tf",
    "chip.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38_chr19_chrM.tsv",
    "chip.fastqs_rep1_R1" : ["https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/rep1.subsampled.25.fastq.gz"
    ],
    "chip.fastqs_rep2_R1" : ["https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/rep2.subsampled.20.fastq.gz"
    ],
    "chip.ctl_fastqs_rep1_R1" : ["https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/ctl1.subsampled.25.fastq.gz"
    ],
    "chip.ctl_fastqs_rep2_R1" : ["https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/ctl2.subsampled.25.fastq.gz"
    ],
    "chip.paired_end" : false,
    "chip.title" : "ENCSR000DYI (subsampled 1/25, chr19_chrM only)",
    "chip.description" : "CEBPB ChIP-seq on human A549 produced by the Snyder lab"
}

Troubleshooting result

Part of cromwell.out file information

Job chip.call_peak_ppr1:NA:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /home/kchen/project/others_project/otherLab/xudan/myanalysis/encode/example/chip/bc868ec3-792e-45c0-8067-ce8def4e67f5/call-call_peak_ppr1/attempt-2/execution/stderr.
 [First 3000 bytes]:Traceback (most recent call last):
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_task_spp.py", line 133, in <module>
    main()
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_task_spp.py", line 119, in main
    args.ctl_subsample, args.ctl_paired_end, args.nth, args.out_dir)
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_task_spp.py", line 103, in spp
    bed_clip(rpeak_tmp2, chrsz, rpeak)
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_lib_genomic.py", line 710, in bed_clip
    run_shell_cmd(cmd)
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_lib_common.py", line 359, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=14811, PGID=14811, RC=127, DURATION_SEC=0.0
STDERR=/bin/bash: line 1: bedClip: command not found
STDOUT=

Job chip.call_peak:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /home/kchen/project/others_project/otherLab/xudan/myanalysis/encode/example/chip/bc868ec3-792e-45c0-8067-ce8def4e67f5/call-call_peak/shard-1/attempt-2/execution/stderr.
 [First 3000 bytes]:Traceback (most recent call last):
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_task_spp.py", line 133, in <module>
    main()
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_task_spp.py", line 119, in main
    args.ctl_subsample, args.ctl_paired_end, args.nth, args.out_dir)
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_task_spp.py", line 103, in spp
    bed_clip(rpeak_tmp2, chrsz, rpeak)
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_lib_genomic.py", line 710, in bed_clip
    run_shell_cmd(cmd)
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_lib_common.py", line 359, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=14687, PGID=14687, RC=127, DURATION_SEC=0.0
STDERR=/bin/bash: line 1: bedClip: command not found
STDOUT=

$ cat /home/kchen/xudan/myanalysis/encode/example/chip/95f9dcf2-e713-4966-86c9-39a74705c2e1/call-call_peak/shard-1/execution/stderr
Traceback (most recent call last):
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_task_spp.py", line 133, in <module>
    main()
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_task_spp.py", line 119, in main
    args.ctl_subsample, args.ctl_paired_end, args.nth, args.out_dir)
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_task_spp.py", line 103, in spp
    bed_clip(rpeak_tmp2, chrsz, rpeak)
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_lib_genomic.py", line 710, in bed_clip
    run_shell_cmd(cmd)
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_lib_common.py", line 359, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=6347, PGID=6347, RC=255, DURATION_SEC=0.0
STDERR=-truncate is not a valid option
STDOUT=

cromwell.out.txt

leepc12 commented 2 years ago

I will fix this in the next release. Until then, please manually install bedClip in the pipeline's Conda environment.

$ source activate encode-chip-seq-pipeline-spp
$ which bedClip

# install bedClip inside the environment
$ conda install ucsc-bedclip ucsc-bedtobigbed -c bioconda

# check if it's correctly installed
$ which bedClip

ckfromCN commented 2 years ago

Thank you for looking into my problem. After installing bedClip in encode-chip-seq-pipeline-spp, the task still fails, and it now reports that bedClip does not recognize the -truncate option.

How can I solve this problem?

Job chip.call_peak_pr2:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /home/kchen/project/others_project/otherLab/xudan/myanalysis/encode/example/chip/582c407d-f362-4d17-b7f5-e7e18f778a14/call-call_peak_pr2/shard-0/attempt-2/execution/stderr.
 [First 3000 bytes]:Traceback (most recent call last):
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_task_spp.py", line 133, in <module>
    main()
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_task_spp.py", line 119, in main
    args.ctl_subsample, args.ctl_paired_end, args.nth, args.out_dir)
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_task_spp.py", line 103, in spp
    bed_clip(rpeak_tmp2, chrsz, rpeak)
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_lib_genomic.py", line 710, in bed_clip
    run_shell_cmd(cmd)
  File "/home/kchen/anaconda2/envs/encode-chip-seq-pipeline-spp/bin/encode_lib_common.py", line 359, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=21822, PGID=21822, RC=255, DURATION_SEC=0.0
STDERR=-truncate is not a valid option
STDOUT=

leepc12 commented 2 years ago

What is your bedClip version? Here is the result on my end.

$ source activate encode-chip-seq-pipeline-spp

(encode-chip-seq-pipeline-spp) $ conda list bedclip
# packages in environment at /users/leepc12/miniconda3/envs/encode-chip-seq-pipeline-spp:
#
# Name                    Version                   Build  Channel
ucsc-bedclip              366                  h5eb252a_0    bioconda

(encode-chip-seq-pipeline-spp) $ which bedClip
/users/leepc12/miniconda3/envs/encode-chip-seq-pipeline-spp/bin/bedClip

(encode-chip-seq-pipeline-spp) $ bedClip
bedClip - Remove lines from bed file that refer to off-chromosome locations.
usage:
   bedClip [options] input.bed chrom.sizes output.bed
chrom.sizes is a two-column file/URL: <chromosome name> <size in bases>
If the assembly <db> is hosted by UCSC, chrom.sizes can be a URL like
  http://hgdownload.cse.ucsc.edu/goldenPath/<db>/bigZips/<db>.chrom.sizes
or you may use the script fetchChromSizes to download the chrom.sizes file.
If not hosted by UCSC, a chrom.sizes file can be generated by running
twoBitInfo on the assembly .2bit file.
options:
   -truncate  - truncate items that span ends of chrom instead of the
                default of dropping the items
   -verbose=2 - set to get list of lines clipped and why

ckfromCN commented 2 years ago

I changed the bedClip version to 366 and it works now. Thank you.
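
For readers who hit the same error, the fix reported here amounts to pinning the ucsc-bedclip package to the version that worked in this thread. The sketch below prints the pinned install command so it can be reviewed before running. The environment name and version 366 come from this thread, pin_bedclip_cmd is a hypothetical helper, and whether other versions also accept -truncate is not established here.

```shell
# Values taken from this thread; adjust if your environment is named differently.
ENV_NAME="encode-chip-seq-pipeline-spp"
BEDCLIP_VERSION="366"

# pin_bedclip_cmd is a hypothetical helper: it only prints the conda command
# that installs a pinned ucsc-bedclip version from the bioconda channel.
pin_bedclip_cmd() {
    printf 'conda install -y -n %s -c bioconda ucsc-bedclip=%s\n' "$1" "$2"
}

# Print the command for review instead of running it directly.
pin_bedclip_cmd "$ENV_NAME" "$BEDCLIP_VERSION"
```

Once the printed command has been run, conda list bedclip inside the environment should show the pinned version, as in the listing quoted earlier in the thread.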