ENCODE-DCC / chip-seq-pipeline2

ENCODE ChIP-seq pipeline
MIT License

BUG: dim(X) must have a positive length #203

Open YogiOnBioinformatics opened 3 years ago

YogiOnBioinformatics commented 3 years ago

Describe the bug

The xcor task fails with the following error:

Error in apply(ac, 2, function(x) sum(x * avw)) :
  dim(X) must have a positive length

OS/Platform

Input JSON file

{
    "chip.title": "P2L7S6_H3K4me3_ChIP",
    "chip.description": "",
    "chip.pipeline_type": "histone",
    "chip.paired_end": false,
    "chip.ctl_paired_end": false,
    "chip.genome_tsv": "/some_path/hg38/hg38.tsv",
    "chip.fastqs_rep1_R1": [
        "/absolute_path/201007Fra_D20-3991_NA_1.fastq.gz"
    ],
    "chip.ctl_fastqs_rep1_R1": [
        "/absolute_path/201007Fra_D20-3986_NA_1.fastq.gz"
    ]
}

Troubleshooting result


Traceback (most recent call last):
  File "/software/chip-seq-pipeline/src/encode_task_xcor.py", line 156, in <module>
    main()
  File "/software/chip-seq-pipeline/src/encode_task_xcor.py", line 144, in main
    args.chip_seq_type, args.exclusion_range_min, args.exclusion_range_max)
  File "/software/chip-seq-pipeline/src/encode_task_xcor.py", line 105, in xcor
    run_shell_cmd(cmd1)
  File "/software/chip-seq-pipeline/src/encode_lib_common.py", line 319, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=1926106, PGID=1926106, RC=1
STDERR=Loading required package: caTools
Error in apply(ac, 2, function(x) sum(x * avw)) :
  dim(X) must have a positive length
Calls: get.binding.characteristics -> lapply -> FUN -> apply
Execution halted
STDOUT=################
ChIP data: 201007Fra_D20-3991_NA_1.trim_50bp.filt.no_chrM.15M.tagAlign.gz
Control data: NA
strandshift(min): -500
strandshift(step): 5
strandshift(max) 1500
user-defined peak shift NA
exclusion(min): -500
exclusion(max): 100
num parallel nodes: 2
FDR threshold: 0.01
NumPeaks Threshold: NA
Output Directory: .
narrowPeak output file name: NA
regionPeak output file name: NA
Rdata filename: NA
plot pdf filename: 201007Fra_D20-3991_NA_1.trim_50bp.filt.no_chrM.15M.cc.plot.pdf
result filename: 201007Fra_D20-3991_NA_1.trim_50bp.filt.no_chrM.15M.cc.qc
Overwrite files?: TRUE

Decompressing ChIP file
Reading ChIP tagAlign/BAM file 201007Fra_D20-3991_NA_1.trim_50bp.filt.no_chrM.15M.tagAlign.gz
opened /pool/data/cromwell-aals/cromwell-executions/chip/b744d0a2-c764-40a4-a952-fc43d2d0a1ee/call-xcor/shard-0/tmp.2813919d/RtmpwmIuJ2/201007Fra_D20-3991_NA_1.trim_50bp.filt.no_chrM.15M.tagAlign1d63dc4f08563d
done. read 16 fragments
ChIP data read length 40
[1] TRUE
Calculating peak characteristics
ln: failed to access '*.cc.plot.pdf': No such file or directory
ln: failed to access '*.cc.plot.png': No such file or directory
ln: failed to access '*.cc.qc': No such file or directory
ln: failed to access '*.cc.fraglen.txt': No such file or directory

YogiOnBioinformatics commented 3 years ago

@leepc12 @akundaje hope you both are well! Just wanted to follow up on this!

YogiOnBioinformatics commented 3 years ago

Super sorry to bother again @akundaje @leepc12 😄 Just wanted to see if you got around to this.

leepc12 commented 3 years ago

Sorry about the late response. Can you try with the latest pipeline?

YogiOnBioinformatics commented 3 years ago

I didn't end up using this sample. We had other replicates.

If I run into this problem again, I'll try that.

YogiOnBioinformatics commented 3 years ago

@leepc12 @akundaje

Just wanted to bring this back up. I cannot switch to a new pipeline since we have a large number of samples analyzed with this pipeline version.

It seems that the bug is in run_spp.R, which runs as part of encode_task_xcor.py.

Is there any way to manually calculate the fraglen AND disable the xcor task so that it won't keep failing? Even if I manually calculate fraglen, it seems xcor would still run and hence fail again.

If I CANNOT disable xcor, how can I still manually calculate the fraglen?

leepc12 commented 3 years ago

Define chip.fraglen as an array (for each replicate) and disable xcor with chip.enable_xcor.

{
    "chip.fraglen" : [100, 120],
    "chip.enable_xcor" : false
}
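
For anyone applying the two keys above to an existing input JSON, here is a minimal Python sketch. It assumes a pipeline version that actually supports `chip.enable_xcor` (the thread below questions whether v1.3.6 does). The helper name `with_manual_fraglen` and the fraglen value in the example are hypothetical and only for illustration.

```python
import json


def with_manual_fraglen(config, fraglens):
    """Return a copy of an input JSON dict with xcor disabled and fraglen set.

    `fraglens` needs one positive integer per replicate (hypothetical helper,
    not part of the pipeline).
    """
    if not fraglens:
        raise ValueError("fraglen needs one value per replicate")
    for f in fraglens:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(f, bool) or not isinstance(f, int) or f <= 0:
            raise ValueError("fraglen values must be positive integers")
    patched = dict(config)  # shallow copy; the original dict is left untouched
    patched["chip.fraglen"] = list(fraglens)
    patched["chip.enable_xcor"] = False
    return patched


# Example with a single replicate; 100 is a placeholder, not a measured value.
example = {"chip.title": "P2L7S6_H3K4me3_ChIP", "chip.pipeline_type": "histone"}
patched = with_manual_fraglen(example, [100])
print(json.dumps(patched, indent=4))
```

The array needs one entry per replicate, so a single-replicate run like the one in this issue would get a one-element list.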

YogiOnBioinformatics commented 3 years ago

2 questions.

  1. As far as I remember, this pipeline has no enable_xcor option, at least not in v1.3.6, which is the version I am using. Am I missing something?
  2. How would I calculate fraglen easily? Would macs2 predictd work?

leepc12 commented 3 years ago

Define chip.fraglen first in your input JSON. You may need to modify chip.wdl.

Please delete these lines https://github.com/ENCODE-DCC/chip-seq-pipeline2/blob/v1.3.6/chip.wdl#L582-L596 https://github.com/ENCODE-DCC/chip-seq-pipeline2/blob/v1.3.6/chip.wdl#L1146-L1147

Please modify the following line https://github.com/ENCODE-DCC/chip-seq-pipeline2/blob/v1.3.6/chip.wdl#L601

to else 0

Please let me know if this works.

YogiOnBioinformatics commented 3 years ago

@leepc12 I figured out the issue I was having. I did not end up implementing this for the following reason.

The FASTQ data I was analyzing was EXTREMELY small (a few KB in file size) due to a mistake from a collaborator.

If I run into this issue with a normal FASTQ file in the future, I will implement this and see if it works.

Thanks so much for your help! Collaborator mistakes are the worst.
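
The log above already hinted at this ("done. read 16 fragments"). A quick read count before launching the pipeline would catch such inputs early. Below is a minimal Python sketch, assuming standard four-line FASTQ records in a gzipped file. The function names and the `min_reads` threshold are hypothetical and not part of the pipeline.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def looks_too_small(path, min_reads=1000):
    """Flag inputs with fewer reads than `min_reads`.

    The default threshold is an arbitrary placeholder; pick a value that
    suits your own experiments.
    """
    return count_fastq_reads(path) < min_reads
```

Calling `looks_too_small` on each FASTQ listed in the input JSON before submitting the workflow would flag a file of only a few KB like this one.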

leepc12 commented 3 years ago

Yep, thanks. Please let me know whether the above implementation works, and feel free to close the issue if it does.