ENCODE-DCC / chip-seq-pipeline2

ENCODE ChIP-seq pipeline
MIT License

Failure at xcor task #156

Closed Fnyasimi closed 4 years ago

Fnyasimi commented 4 years ago

I am running the pipeline in parallel and I am getting a failure on encode_task_xcor.py with the error below. What might be the problem?

Traceback (most recent call last):
  File "/home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_xcor.py", line 156, in <module>
    main()
  File "/home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_xcor.py", line 144, in main
    args.chip_seq_type, args.exclusion_range_min, args.exclusion_range_max)
  File "/home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_task_xcor.py", line 105, in xcor
    run_shell_cmd(cmd1)
  File "/mnt/lustre/users/ckibet/miniconda3/envs/encode-chip-seq-pipeline/bin/encode_lib_common.py", line 319, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=149955, PGID=149955, RC=1
STDERR=/home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/lib/R/bin/R: line 238: /home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/lib/R/etc/ldpaths: No such file or directory
STDOUT=
leepc12 commented 4 years ago

Can you check whether the file /home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/lib/R/etc/ldpaths exists? If it does not, try reinstalling the pipeline's Conda env (scripts/uninstall_conda_env.sh).

This is my result:

(encode-chip-seq-pipeline) leepc12@kadru:~$ ls -l /users/leepc12/miniconda3/envs/encode-chip-seq-pipeline/lib/R/etc/
total 24
-rw-rw-r-- 1 leepc12 users  209 Dec  8  2018 javaconf
-rw-r--r-- 1 leepc12 users 1446 May  7 08:45 ldpaths
-rw-r--r-- 1 leepc12 users 7676 May  7 08:45 Makeconf
-rw-rw-r-- 1 leepc12 users 1524 Apr 22 15:36 Renviron
-rw-rw-r-- 3 leepc12 users 1095 Dec  8  2018 repositories
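The same check can be scripted. This is a minimal sketch, assuming the pipeline's Conda env is activated so that `CONDA_PREFIX` points at it; the helper name `r_ldpaths` is made up for illustration and is not part of the pipeline.

```python
import os
import sys
from pathlib import Path


def r_ldpaths(env_prefix):
    """Return the expected location of R's ldpaths file inside a Conda env."""
    return Path(env_prefix) / "lib" / "R" / "etc" / "ldpaths"


# CONDA_PREFIX is set by `conda activate`; fall back to the running interpreter's prefix.
prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
path = r_ldpaths(prefix)
print(path, "exists" if path.is_file() else "is missing")
```

If the file is reported missing, the R installation inside the env is probably incomplete, which matches the "No such file or directory" message in the traceback.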
Fnyasimi commented 4 years ago

Thanks, I installed the env again and it worked.

I have a few issues with some JSON files, which result in the error below:

Traceback (most recent call last):
  File "/home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/bin/caper", line 13, in <module>
    main()
  File "/home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/lib/python3.6/site-packages/caper/cli.py", line 54, in main
    c.run()
  File "/home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/lib/python3.6/site-packages/caper/caper.py", line 195, in run
    input_file = self.__create_input_json_file(tmp_dir)
  File "/home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/lib/python3.6/site-packages/caper/caper.py", line 683, in __create_input_json_file
    make_md5_file=True)
  File "/home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/lib/python3.6/site-packages/autouri/autouri.py", line 525, in localize
    src_contents = src_uri.read()
  File "/home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/lib/python3.6/site-packages/autouri/abspath.py", line 86, in read
    return fp.read()
  File "/home/ckibet/lustre/miniconda3/envs/encode-chip-seq-pipeline/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 529: ordinal not in range(128)

The contents of the JSON file that resulted in the above error are:

{
    "chip.title" : "A cryptic Tudor domain links BRWD2/PHIP to COMPASS-mediated histone H3K4 methylation",

    "chip.description" : "Histone H3 Lys4 (H3K4) methylation is a chromatin feature enriched at gene cis-regulatory sequences such as promoters and enhancers. Here we identify an evolutionarily conserved factor, BRWD2/PHIP, which colocalizes with histone H3K4 methylation genome-wide in human cells, mouse embryonic stem cells, and Drosophila. Biochemical analysis of BRWD2 demonstrated an association with the Cullin-4–RING ubiquitin E3 ligase-4 (CRL4) complex, nucleosomes, and chromatin remodelers. BRWD2/PHIP binds directly to H3K4 methylation through a previously unidentified chromatin-binding module related to Royal Family Tudor domains, which we named the CryptoTudor domain. Using CRISPR–Cas9 genetic knockouts, we demonstrate that COMPASS H3K4 methyltransferase family members differentially regulate BRWD2/PHIP chromatin occupancy. Finally, we demonstrate that depletion of the single Drosophila homolog dBRWD3 results in altered gene expression and aberrant patterns of histone H3 Lys27 acetylation at enhancers and promoters, suggesting a cross-talk between these chromatin modifications and transcription through the BRWD protein family.",

    "chip.pipeline_type" : "tf",
    "chip.aligner" : "bowtie2",
    "chip.align_only" : false,

    "chip.genome_tsv" : "/home/ckibet/lustre/MARS_update/Data/genome/dm6/dm6.tsv",

    "chip.fastqs_rep1_R1" : [ "/home/ckibet/lustre/MARS_update/Data/Reads/SRR5850384_1.fastq.gz", "/home/ckibet/lustre/MARS_update/Data/Reads/SRR5850385_1.fastq.gz", "/home/ckibet/lustre/MARS_update/Data/Reads/SRR5850386_1.fastq.gz", "/home/ckibet/lustre/MARS_update/Data/Reads/SRR5850387_1.fastq.gz" ],

    "chip.paired_end" : false,
    "chip.always_use_pooled_ctl" : false,

    "chip.ctl_fastqs_rep1_R1" : [ "/home/ckibet/lustre/MARS_update/Data/Reads/SRR5850464_1.fastq.gz", "/home/ckibet/lustre/MARS_update/Data/Reads/SRR5850465_1.fastq.gz", "/home/ckibet/lustre/MARS_update/Data/Reads/SRR5850466_1.fastq.gz", "/home/ckibet/lustre/MARS_update/Data/Reads/SRR5850467_1.fastq.gz" ],

    "chip.ctl_paired_end" : false,

    "chip.align_mem_mb" : 20000,
    "chip.call_peak_mem_mb" : 16000,
    "chip.align_cpu" : 8,
    "chip.call_peak_cpu" : 4

}

Kindly help with this; I have a few JSON files giving the same error at different character positions.

leepc12 commented 4 years ago

What is your command line (caper run ...)?

Fnyasimi commented 4 years ago

caper run chip-seq-pipeline2/chip.wdl -i Jsonfiles/GSE101646_dBRWD2.json --out-dir Results/GSE101646_dBRWD2

leepc12 commented 4 years ago

Is your input JSON ASCII-encoded? Are there any illegal characters in it?

Fnyasimi commented 4 years ago

I created the JSON file by parsing data from a GEO experiment SOFT file. I am not sure whether the input was ASCII; I tried to find illegal characters but couldn't spot any.
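If the JSON is generated from Python (an assumption, since the parsing script is not shown), one possible way to keep the file pure ASCII without losing text is to let the `json` module escape non-ASCII characters. The function name `to_ascii_json` is illustrative only.

```python
import json


def to_ascii_json(inputs):
    """Serialize to JSON text in which non-ASCII characters become \\uXXXX escapes."""
    return json.dumps(inputs, indent=4, ensure_ascii=True)


# Hypothetical record with an en dash (U+2013) in the title.
record = {"chip.title": "CRISPR\u2013Cas9 knockouts"}
text = to_ascii_json(record)
print(text)

# The escaped text is plain ASCII, yet it parses back to the original string.
text.encode("ascii")
print(json.loads(text) == record)
```

Whether the pipeline accepts the escaped form has not been confirmed in this thread, so treat it as something to try rather than a known fix.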

leepc12 commented 4 years ago

Can you try without the title and description?

Fnyasimi commented 4 years ago

Without those two fields it runs without any issue. I think some characters in the description are interpreted as non-ASCII, but we can't tell which ones by looking at the description or title.
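Characters like these are hard to spot by eye, but a short script can list them. Byte 0xe2 is the first byte of the UTF-8 encoding of several typographic characters, including the en dash (U+2013), which appears in the description above in "Cullin-4–RING" and "CRISPR–Cas9". The sketch below is not part of the pipeline; `find_non_ascii` is a hypothetical helper.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, character, Unicode name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Sample text taken from the description field; \u2013 is the en dash.
sample = '{"chip.description": "Cullin-4\u2013RING and CRISPR\u2013Cas9"}'
for line_no, col, ch, name in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: {ch!r} ({name})")
```

To scan a real input file, read it with `open(path, encoding="utf-8").read()` and pass the result to `find_non_ascii`. Replacing each reported character with an ASCII equivalent, such as a plain hyphen, should avoid the decode error.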

Fnyasimi commented 4 years ago

@leepc12 What causes this error, and how do I get past it?

Error: Cannot use automatic control subsampling ("chip.ctl_depth_limit">0 and "chip.exp_ctl_depth_limit">0) for multiple controls with mixed endedness (e.g. SE ctl-rep1 and PE ctl-rep2). Automatic control subsampling is enabled by default. Disable automatic control subsampling by explicitly defining the above two parameters as 0 in your input JSON file. You can still use manual control subsamping ("chip.ctl_subsample_reads">0) since it is done for individual control's TAG-ALIGN output according to each control's endedness.

leepc12 commented 4 years ago

https://github.com/ENCODE-DCC/chip-seq-pipeline2/blob/e2a698d0dcc3d7b16ac8b9dc86d2f9097a35f0b8/chip.wdl#L396

This error occurs when

( ctl_depth_limit > 0 || exp_ctl_depth_ratio_limit > 0 ) && num_ctl > 1 && length(ctl_paired_ends) > 1 

Please attach your input JSON.
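For readers less familiar with WDL, the condition quoted above can be transcribed into Python. This is only a restatement of that one expression, not the pipeline's code; the function name and the positive limit values in the example are placeholders.

```python
def subsampling_check_fails(ctl_depth_limit, exp_ctl_depth_ratio_limit,
                            num_ctl, ctl_paired_ends):
    """Mirror of the quoted condition: True means the pipeline raises the error."""
    return ((ctl_depth_limit > 0 or exp_ctl_depth_ratio_limit > 0)
            and num_ctl > 1
            and len(ctl_paired_ends) > 1)


# Placeholder limits: any positive value leaves automatic subsampling enabled.
print(subsampling_check_fails(1, 1, num_ctl=8, ctl_paired_ends=[False] * 7 + [True]))

# With both limits set to 0 the first clause is false, so the check passes.
print(subsampling_check_fails(0, 0, num_ctl=8, ctl_paired_ends=[False] * 7 + [True]))
```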

Fnyasimi commented 4 years ago

{ "chip.title" : " male X-chromosome during early Drosophila development",

"chip.description" : "Msl",

"chip.pipeline_type" : "tf",
"chip.aligner" : "bowtie2",
"chip.align_only" : false,

"chip.genome_tsv" : "/home/ckibet/lustre/MARS_update/Data/genome/dm6/dm6.tsv",

"chip.fastqs_rep1_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624462_1.fastq.gz" ],
"chip.fastqs_rep2_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624477_1.fastq.gz" ],
"chip.fastqs_rep3_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624478_1.fastq.gz" ],
"chip.fastqs_rep4_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624493_1.fastq.gz" ],
"chip.fastqs_rep5_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624494_1.fastq.gz" ],
"chip.fastqs_rep6_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624509_1.fastq.gz" ],
"chip.fastqs_rep7_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624510_1.fastq.gz" ],
"chip.fastqs_rep8_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624519_1.fastq.gz" ],
"chip.fastqs_rep8_R2" : [ "/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624519_2.fastq.gz" ],

"chip.paired_ends" : [ false, false, false, false, false, false, false, true ],
"chip.always_use_pooled_ctl" : false,

"chip.ctl_fastqs_rep1_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624460_1.fastq.gz" ],
"chip.ctl_fastqs_rep2_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624473_1.fastq.gz" ],
"chip.ctl_fastqs_rep3_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624474_1.fastq.gz" ],
"chip.ctl_fastqs_rep4_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624489_1.fastq.gz" ],
"chip.ctl_fastqs_rep5_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624490_1.fastq.gz" ],
"chip.ctl_fastqs_rep6_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624505_1.fastq.gz" ],
"chip.ctl_fastqs_rep7_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624506_1.fastq.gz" ],
"chip.ctl_fastqs_rep8_R1" : ["/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624517_1.fastq.gz" ],
"chip.ctl_fastqs_rep8_R2" : [ "/home/ckibet/lustre/MARS_update/Data/Reads/SRR9624517_2.fastq.gz" ],

"chip.ctl_paired_ends" : [ false, false, false, false, false, false, false, true ]

}

leepc12 commented 4 years ago

We recently added control subsampling, and it does not allow controls with mixed endedness (SE and PE). There are two ways to work around this error.

1) Set "chip.ctl_depth_limit": 0 and "chip.exp_ctl_depth_ratio_limit": 0 in your input JSON. Control subsampling will be disabled.

2) Remove the control rep8 entries from your input JSON and try again.
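Option 1 can be applied to an existing input JSON with a few lines of Python. This is a sketch only: the two parameter names come from the comment above, while the helper name and the tiny example dictionary are made up for illustration.

```python
import json


def disable_ctl_subsampling(inputs):
    """Return a copy of the input dict with automatic control subsampling turned off."""
    updated = dict(inputs)
    updated["chip.ctl_depth_limit"] = 0
    updated["chip.exp_ctl_depth_ratio_limit"] = 0
    return updated


# Hypothetical, heavily shortened input; a real file would be read with json.load().
example = {"chip.pipeline_type": "tf", "chip.ctl_paired_ends": [False, True]}
print(json.dumps(disable_ctl_subsampling(example), indent=4))
```

The function copies the dictionary rather than editing it in place, so the original inputs stay available for comparison.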

Fnyasimi commented 4 years ago

Thank you for the info.

Fnyasimi commented 4 years ago

Hi @leepc12, is there a generic image or PDF showing each step of the workflow and how data is processed by this pipeline? Also, is there a proper DOI for citing the pipeline, or is a link to the repo enough? I would appreciate your help with this, especially a high-resolution image of the workflow.