Closed mhayes8520 closed 4 years ago
Upon repeating the sbatch submission, the second error repeated. Seems like this error may have something to do with the second error I mentioned:
Exception in thread "main" htsjdk.samtools.SAMException: Exception counting mismatches for read SRR217305.1252214.1 36b aligned to chrIV:1017817-1017852.
    at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:490)
    at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:466)
    at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:504)
    at picard.analysis.GcBiasMetricsCollector.addRead(GcBiasMetricsCollector.java:389)
    at picard.analysis.GcBiasMetricsCollector.access$600(GcBiasMetricsCollector.java:48)
    at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.addReadToGcData(GcBiasMetricsCollector.java:221)
    at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.acceptRecord(GcBiasMetricsCollector.java:155)
    at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.acceptRecord(GcBiasMetricsCollector.java:100)
    at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(MultiLevelCollector.java:192)
    at picard.metrics.MultiLevelCollector.acceptRecord(MultiLevelCollector.java:315)
    at picard.analysis.CollectGcBiasMetrics.acceptRead(CollectGcBiasMetrics.java:172)
    at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:145)
    at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:84)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1017849
    at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:482)
    ... 15 more
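One possible reading of the trace above, which nobody in this thread confirms, is that the read's alignment runs past the end of the chrIV sequence in the reference FASTA given to Picard, as would happen if the BAM and the FASTA came from different genome builds. Below is a minimal sketch of that sanity check in plain Python. The function name and the reference length are hypothetical and chosen only for illustration; real lengths would come from the FASTA index.

```python
def reads_past_reference_end(alignments, reference_lengths):
    """Return alignments whose end coordinate exceeds the reference length.

    alignments: iterable of (read_name, contig, start, end), 1-based inclusive.
    reference_lengths: dict mapping contig name to its length in the FASTA.
    """
    offending = []
    for read_name, contig, start, end in alignments:
        length = reference_lengths.get(contig)
        # A contig that is absent from the reference is reported as well.
        if length is None or end > length:
            offending.append((read_name, contig, start, end, length))
    return offending


# The read named in the trace; the contig length below is a made-up value
# (an assumption for illustration, not taken from any real FASTA).
alignments = [("SRR217305.1252214.1", "chrIV", 1017817, 1017852)]
hypothetical_lengths = {"chrIV": 1017849}
print(reads_past_reference_end(alignments, hypothetical_lengths))
```

If a check like this flags reads when run against the real contig lengths, comparing the BAM header with the FASTA index would be a reasonable next step.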
ModuleNotFoundError: No module named 'matplotlib'
Install matplotlib in pipeline's conda environment.
source activate encode-chip-seq-pipeline
conda install matplotlib
Not sure why it's missing in your env. It's in my env though.
(encode-chip-seq-pipeline) leepc12@kadru:/users/leepc12/code/chip-seq-pipeline2$ python
Python 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib
>>> matplotlib.__file__
'/users/leepc12/miniconda3/envs/encode-chip-seq-pipeline/lib/python3.6/site-packages/matplotlib/__init__.py'
>>>
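The interactive check above only covers the shell it was run in. A job submitted through sbatch may pick up a different interpreter, so it can help to print which Python is running and where it would load matplotlib from inside the failing job itself. A small sketch using only the standard library; the helper name `locate_module` is made up for this example.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Which interpreter is actually running, and where matplotlib resolves to.
print("interpreter:", sys.executable)
print("matplotlib:", locate_module("matplotlib"))
```

If the interpreter path printed inside the job is not under the encode-chip-seq-pipeline environment, the job is not using that environment.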
Right, it's in my environment too, but for some reason it wasn't able to load.
However, that problem (for whatever reason) hasn't recurred, but the second problem I mentioned has happened repeatedly.
Can you disable that gc_bias analysis and see what happens for other downstream analyses?
chip.enable_gc_bias: false
I did that, and the pipeline seems to be running fine for now - I'll let you know if there are other issues that come up. It would be great to get the gc bias test running as well though.
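For anyone applying the same workaround, the setting goes in the input JSON as the key `chip.enable_gc_bias`. A minimal sketch that adds it to an existing input JSON document; the function name is hypothetical and the example input is cut down to one key.

```python
import json


def disable_gc_bias(input_json_text):
    """Return the input JSON text with chip.enable_gc_bias set to false."""
    params = json.loads(input_json_text)
    params["chip.enable_gc_bias"] = False
    return json.dumps(params, indent=4)


# Cut-down input JSON used only for illustration.
example = '{"chip.title": "Pho4_ChIP_NoPi"}'
print(disable_gc_bias(example))
```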
Closing due to long inactivity.
Describe the bug
During gc_bias calculation the pipeline fails because matplotlib can't be imported while running the Python script that calculates GC bias. However, matplotlib is definitely installed in the encode-chip-seq-pipeline Python environment, which is confusing.
Attached are the slurm.out and stderr files: stderr_200701.txt slurm-3313658.out.txt
In a second test, the pipeline fails at a different point with the stderr saying: FileNotFoundError: [Errno 2] File Pho4_ChIP_NoPi_R1.nodup.gc.txt does not exist: 'Pho4_ChIP_NoPi_R1.nodup.gc.txt' Attached are the second slurm.out, stderr, and stdout files.
slurm-3393083_2.out.txt stdout_2.txt stderr_2.txt
It seems like the pipeline is somehow unstable, since it's reporting different errors?
OS/Platform
Caper configuration file
backend=slurm
slurm-partition=akundaje

# DO NOT use /tmp here
# You can use $OAK or $SCRATCH storages here.
# Caper stores all important temp files and cached big data files here
# If not defined, Caper will make .caper_tmp/ on your local output directory
# which is defined by out-dir, --out-dir or $CWD
# Use a local absolute path here
tmp-dir=/scratch/users/mihayes

# IMPORTANT warning for Stanford Sherlock cluster
# ====================================================================
# DO NOT install any codes/executables
# (java, conda, python, caper, pipeline's WDL, pipeline's Conda env, ...) on $SCRATCH or $OAK.
# You will see Segmentation Fault errors.
# Install all executables on $HOME or $PI_HOME instead.
# It's STILL OKAY to read input data from and write outputs to $SCRATCH or $OAK.
# ====================================================================

cromwell=/home/users/mihayes/.caper/cromwell_jar/cromwell-47.jar
womtool=/home/users/mihayes/.caper/womtool_jar/womtool-47.jar
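The notes in this configuration boil down to two rules: tmp-dir must not be /tmp, and executables must not live on $SCRATCH or $OAK. A rough sketch of checking a config against those rules follows. The function names are made up, and the `/scratch` and `/oak` prefixes are an assumption about where those variables point on the cluster.

```python
def parse_conf(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def check_conf(conf, data_roots=("/scratch", "/oak")):
    """Return warnings for settings that contradict the notes in the config.

    data_roots is an assumed stand-in for where $SCRATCH and $OAK resolve.
    """
    warnings = []
    tmp_dir = conf.get("tmp-dir", "")
    if tmp_dir == "/tmp" or tmp_dir.startswith("/tmp/"):
        warnings.append("tmp-dir points at /tmp")
    for key in ("cromwell", "womtool"):
        path = conf.get(key, "")
        if path.startswith(data_roots):
            warnings.append(f"{key} is installed on a data filesystem: {path}")
    return warnings


sample = """
backend=slurm
# DO NOT use /tmp here
tmp-dir=/scratch/users/mihayes
cromwell=/home/users/mihayes/.caper/cromwell_jar/cromwell-47.jar
womtool=/home/users/mihayes/.caper/womtool_jar/womtool-47.jar
"""
print(check_conf(parse_conf(sample)))
```

With the values posted in this issue the list of warnings is empty, so the configuration itself follows both rules.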
Input JSON file
{
    "chip.title" : "Pho4_ChIP_NoPi",
    "chip.description" : "Zhou and O'Shea 2011 Pho4 ChIP-seq",
}
Error log
Caper automatically runs a troubleshooter for failed workflows. If it doesn't, then get a WORKFLOW_ID of your failed workflow with caper list or directly use a metadata.json file on Caper's output directory.
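When only the metadata.json is at hand, the failed tasks can also be pulled out by hand. The sketch below assumes the usual Cromwell metadata layout: a top-level `calls` object mapping each call name to a list of attempts, each carrying `executionStatus`, `shardIndex` and `stderr`. The sample document and the call names in it are hand-written for illustration, not taken from the attached logs.

```python
import json


def failed_calls(metadata):
    """Yield (call_name, shard_index, stderr_path) for each failed attempt."""
    for call_name, attempts in metadata.get("calls", {}).items():
        for attempt in attempts:
            if attempt.get("executionStatus") == "Failed":
                yield call_name, attempt.get("shardIndex"), attempt.get("stderr")


# A tiny hand-written stand-in for a real metadata.json (hypothetical values).
sample = json.loads("""
{"calls": {
    "chip.gc_bias": [{"executionStatus": "Failed", "shardIndex": 0,
                      "stderr": "/path/to/gc_bias/stderr"}],
    "chip.align":   [{"executionStatus": "Done", "shardIndex": 0,
                      "stderr": "/path/to/align/stderr"}]
}}
""")
for name, shard, stderr in failed_calls(sample):
    print(name, shard, stderr)
```

For a real run, `sample` would be replaced by `json.load(open(...))` on the metadata.json in the output directory. The stderr paths it reports are the files to attach to an issue.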