ENCODE-DCC / chip-seq-pipeline2

ENCODE ChIP-seq pipeline
MIT License

Pipeline fails during gc_bias_calc #168

Closed mhayes8520 closed 4 years ago

mhayes8520 commented 4 years ago

Describe the bug

During the gc_bias calculation, the pipeline fails because matplotlib cannot be imported while the Python script that calculates GC bias is running. This is confusing, because matplotlib is definitely installed in the encode-chip-seq-pipeline Python environment.

Attached are the slurm.out and stderr files: stderr_200701.txt, slurm-3313658.out.txt

In a second test, the pipeline fails at a different point, with stderr saying:

FileNotFoundError: [Errno 2] File Pho4_ChIP_NoPi_R1.nodup.gc.txt does not exist: 'Pho4_ChIP_NoPi_R1.nodup.gc.txt'

Attached are the second slurm.out, stderr, and stdout files.

slurm-3393083_2.out.txt stdout_2.txt stderr_2.txt

It seems like the pipeline is somehow unstable, since it's reporting different errors?

OS/Platform

Caper configuration file

backend=slurm
slurm-partition=akundaje

# DO NOT use /tmp here
# You can use $OAK or $SCRATCH storages here.
# Caper stores all important temp files and cached big data files here
# If not defined, Caper will make .caper_tmp/ on your local output directory
# which is defined by out-dir, --out-dir or $CWD
# Use a local absolute path here
tmp-dir=/scratch/users/mihayes

# IMPORTANT warning for Stanford Sherlock cluster
# ====================================================================
# DO NOT install any codes/executables
# (java, conda, python, caper, pipeline's WDL, pipeline's Conda env, ...) on $SCRATCH or $OAK.
# You will see Segmentation Fault errors.
# Install all executables on $HOME or $PI_HOME instead.
# It's STILL OKAY to read input data from and write outputs to $SCRATCH or $OAK.
# ====================================================================

cromwell=/home/users/mihayes/.caper/cromwell_jar/cromwell-47.jar
womtool=/home/users/mihayes/.caper/womtool_jar/womtool-47.jar

Input JSON file

{
"chip.title" : "Pho4_ChIP_NoPi",
"chip.description" : "Zhou and O'Shea 2011 Pho4 ChIP-seq",

"chip.pipeline_type" : "tf",
"chip.aligner" : "bowtie2",
"chip.align_only" : false,
"chip.true_rep_only" : false,

"chip.genome_tsv" : "/oak/stanford/groups/pfordyce/data-workspace/repeats/genomes/saccer3/encode_chip_pipeline_genome/saccer3_flab.tsv",

"chip.paired_end" : false,
"chip.ctl_paired_end" : false,

"chip.fastqs_rep1_R1" : [ "/oak/stanford/groups/pfordyce/data-workspace/repeats/data/raw/Zhou_OShea_2011/Pho4_ChIP_NoPi_R1.fastq.gz" ],
"chip.fastqs_rep2_R1" : [ "/oak/stanford/groups/pfordyce/data-workspace/repeats/data/raw/Zhou_OShea_2011/Pho4_ChIP_NoPi_R2.fastq.gz" ],

"chip.ctl_fastqs_rep1_R1" : [ "/oak/stanford/groups/pfordyce/data-workspace/repeats/data/raw/Zhou_OShea_2011/Pho4_Input_NoPi.fastq.gz" ],
"chip.ctl_fastqs_rep2_R1" : [ "/oak/stanford/groups/pfordyce/data-workspace/repeats/data/raw/Zhou_OShea_2011/Pho2_Input_NoPi.fastq.gz" ],
"chip.ctl_fastqs_rep3_R1" : [ "/oak/stanford/groups/pfordyce/data-workspace/repeats/data/raw/Zhou_OShea_2011/Cbf1_Input_NoPi.fastq.gz" ],

"chip.always_use_pooled_ctl" : true,
"chip.ctl_depth_ratio" : 1.2,

"chip.mapq_thresh" : 30,
"chip.dup_marker" : "picard",
"chip.no_dup_removal" : false,

"chip.peak_caller" : "spp",
"chip.cap_num_peak_macs2" : 500000,
"chip.pval_thresh" : 0.01,
"chip.fdr_thresh" : 0.01,
"chip.idr_thresh" : 0.05,
"chip.cap_num_peak_spp" : 300000,

"chip.enable_jsd" : true,
"chip.enable_gc_bias" : true,
"chip.enable_count_signal_track" : true,

"chip.xcor_trim_bp" : 10

}

Error log

Caper automatically runs a troubleshooter for failed workflows. If it doesn't, get the WORKFLOW_ID of your failed workflow with caper list, or directly use a metadata.json file in Caper's output directory.

$ caper debug [WORKFLOW_ID_OR_METADATA_JSON_FILE]
mhayes8520 commented 4 years ago

Upon repeating the sbatch submission, the second error repeated. Seems like this error may have something to do with the second error I mentioned:

Exception in thread "main" htsjdk.samtools.SAMException: Exception counting mismatches for read SRR217305.1252214.1 36b aligned to chrIV:1017817-1017852.
at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:490)
at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:466)
at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:504)
at picard.analysis.GcBiasMetricsCollector.addRead(GcBiasMetricsCollector.java:389)
at picard.analysis.GcBiasMetricsCollector.access$600(GcBiasMetricsCollector.java:48)
at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.addReadToGcData(GcBiasMetricsCollector.java:221)
at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.acceptRecord(GcBiasMetricsCollector.java:155)
at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.acceptRecord(GcBiasMetricsCollector.java:100)
at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(MultiLevelCollector.java:192)
at picard.metrics.MultiLevelCollector.acceptRecord(MultiLevelCollector.java:315)
at picard.analysis.CollectGcBiasMetrics.acceptRead(CollectGcBiasMetrics.java:172)
at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:145)
at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:84)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1017849
at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:482)
... 15 more
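One possible reading of this trace, not confirmed anywhere in this thread, is that the reference FASTA handed to Picard has a shorter chrIV than the one the reads were aligned to, so the read's end position runs past the reference bases. A minimal sketch for checking that is to compare the contig lengths in the BAM header (`@SQ` lines with `SN:`/`LN:` tags) against the reference's `.fai` index (name in column 1, length in column 2). The function names and the example lengths below are made up for illustration.

```python
def parse_fai(text):
    """Return {contig: length} from the text of a FASTA .fai index."""
    lengths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        lengths[fields[0]] = int(fields[1])  # column 1 = name, column 2 = length
    return lengths


def parse_sam_header(text):
    """Return {contig: length} from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def find_mismatches(bam_lengths, ref_lengths):
    """List (contig, bam_length, ref_length) where the two sources disagree.

    ref_length is None when the contig is missing from the reference index.
    """
    problems = []
    for name, bam_len in bam_lengths.items():
        ref_len = ref_lengths.get(name)
        if ref_len != bam_len:
            problems.append((name, bam_len, ref_len))
    return problems


# Made-up example values, not taken from the attached logs.
example_header = "@HD\tVN:1.6\n@SQ\tSN:chrIV\tLN:1500000\n"
example_fai = "chrIV\t1000000\t7\t60\t61\n"
print(find_mismatches(parse_sam_header(example_header), parse_fai(example_fai)))
```

The header text could come from `samtools view -H` on the nodup BAM and the index from `samtools faidx` on the reference FASTA listed in the genome TSV. An empty result would rule this explanation out.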

leepc12 commented 4 years ago
ModuleNotFoundError: No module named 'matplotlib'

Install matplotlib in the pipeline's Conda environment.

source activate encode-chip-seq-pipeline
conda install matplotlib
leepc12 commented 4 years ago

Not sure why it's missing in your env. It's in my env though.

(encode-chip-seq-pipeline) leepc12@kadru:/users/leepc12/code/chip-seq-pipeline2$ python
Python 3.6.6 |Anaconda, Inc.| (default, Oct  9 2018, 12:34:16)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib
>>> matplotlib.__file__
'/users/leepc12/miniconda3/envs/encode-chip-seq-pipeline/lib/python3.6/site-packages/matplotlib/__init__.py'
>>>
mhayes8520 commented 4 years ago

Right, it's in my environment too, but for some reason it couldn't be loaded. [Attached screenshot: Screen Shot 2020-07-01 at 1 03 38 PM]
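When a package is installed but fails to import inside a job, it can help to confirm which interpreter the job actually runs and what the import error says. The sketch below is only a diagnostic aid under that assumption; `check_import` is a hypothetical helper name, not part of the pipeline.

```python
import importlib
import sys


def check_import(name):
    """Try to import a module and report the interpreter and outcome."""
    report = {"interpreter": sys.executable, "module": name}
    try:
        module = importlib.import_module(name)
        report["ok"] = True
        report["location"] = getattr(module, "__file__", None)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        report["ok"] = False
        report["error"] = repr(exc)
    return report


print(check_import("matplotlib"))
```

Running this both in an interactive shell with the environment activated and from inside the submitted job would show whether the two use the same `interpreter` path. A mismatch would suggest the job is not picking up the Conda environment.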

mhayes8520 commented 4 years ago

However, that problem hasn't recurred, for whatever reason, but the second problem I mentioned has happened repeatedly.

leepc12 commented 4 years ago

Can you disable the gc_bias analysis and see what happens with the other downstream analyses? Set "chip.enable_gc_bias" : false in the input JSON.
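For anyone scripting this change, a minimal sketch of flipping the flag in an input JSON is shown here. `disable_gc_bias` is a hypothetical helper name, and the example input is a trimmed stand-in rather than the full file from this issue.

```python
import json


def disable_gc_bias(input_json_text):
    """Return the input JSON text with chip.enable_gc_bias set to false."""
    config = json.loads(input_json_text)
    config["chip.enable_gc_bias"] = False
    return json.dumps(config, indent=4)


# Trimmed stand-in for the real input JSON.
example = '{"chip.aligner": "bowtie2", "chip.enable_gc_bias": true}'
print(disable_gc_bias(example))
```

All other keys pass through unchanged, so the rest of the run configuration stays as submitted.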

mhayes8520 commented 4 years ago

I did that, and the pipeline seems to be running fine for now. I'll let you know if any other issues come up. It would be great to get the GC bias test running as well, though.

leepc12 commented 4 years ago

Closing due to long inactivity.