NationalGenomicsInfrastructure / piper

A genomics pipeline built on top of the GATK Queue framework

Running piper on M.Kaller_14_06 #21

Closed: vezzi closed this issue 9 years ago

vezzi commented 10 years ago

Last Friday I tried to run Piper on the 6 samples recently generated with V4 technology. All the data and analyses are in the following directory (it should be accessible to all members of group a2010002):

/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06

Qualimap fails every time. The error I find in the Piper log is:

ERROR 14:15:58,665 FunctionEdge - Error: /proj/a2009002/piper_resources/programs/qualimap_v1.0/qualimap  --java-mem-size=64G  bamqc  -bam /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/pipeline_output/01_raw_alignments/P1171_104.AC41A2ANXX.P1171_104.5.bam --paint-chromosome-limits  -outdir /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/pipeline_output/02_preliminary_alignment_qc/P1171_104.AC41A2ANXX.P1171_104.5.qc/ -nt 8 &> /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/pipeline_output/02_preliminary_alignment_qc/P1171_104.AC41A2ANXX.P1171_104.5.qc.log 
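The `FunctionEdge - Error:` line only echoes the failed command. Because the command ends with `&>`, Qualimap's own stdout and stderr were redirected to the `.qc.log` file named after it, so that file is where the actual cause should be. A minimal Python sketch for listing such files from a Piper/Queue log, assuming the single-line `ERROR <time> FunctionEdge - Error: <command>` format shown above; `failed_job_logs` is a hypothetical helper, not part of Piper:

```python
import re

# Assumption: Queue error lines look like the one above, i.e.
# "ERROR <timestamp> FunctionEdge - Error: <full command>" on one line.
ERROR_RE = re.compile(r"^ERROR\s+(\S+)\s+FunctionEdge - Error:\s*(.*)$")


def failed_job_logs(log_text):
    """Return (timestamp, redirect_target) for every failed job whose
    command sends its output to a file with '&>' (as Qualimap's does)."""
    results = []
    for line in log_text.splitlines():
        match = ERROR_RE.match(line)
        if not match:
            continue  # not a FunctionEdge error line
        timestamp, command = match.groups()
        if "&>" in command:
            # First token after '&>' is the per-job log holding the tool's output.
            target = command.split("&>", 1)[1].split()[0]
            results.append((timestamp, target))
    return results
```

Commands without a redirect (such as the HaplotypeCaller call further down) are skipped; for those, Queue prints the job's `.out` file contents directly in its log.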

HaplotypeCaller subsequently fails as well. This is the Piper log for the first HaplotypeCaller error:

ERROR 18:22:42,455 FunctionEdge - Error:  'java'  '-Xmx131072m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/.queue/tmp'  '-cp' '/proj/a2010002/software/piper_bin/Piper/current/lib/piper_2.10-v1.2.0-beta12.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/aopalliance-1.0.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/jcommander-1.7.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/scopt_2.10-3.2.0.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/guice-2.0.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/java-xmlbuilder-0.4.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/commons-codec-1.3.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/commons-httpclient-3.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/commons-io-2.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/commons-lang-2.5.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/commons-logging-1.1.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/junit-3.8.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/log4j-1.2.16.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/jets3t-0.8.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/bsh-2.0b4.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/scala-library-2.10.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/simple-xml-2.0.4.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/testng-5.14.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/stax-1.2.0.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/stax-api-1.0.1.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/GenomeAnalysisTK.jar:/proj/a2010002/software/piper_bin/Piper/current/lib/Queue.jar'  'org.broadinstitute.gatk.engine.CommandLineGATK'  '-T' 'HaplotypeCaller'  '-I' '/apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/pipeline_output/04_processed_alignments/P1171_104.clean.dedup.recal.bam'  '-L' '/apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/.queue/scatterGather/DNABestPracticeVariantCalling-97-sg/temp_10_of_23/scatter.intervals'  '-R' '/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta'  '-dcov' '250'  '-nct' '16'  '-variant_index_type' 'LINEAR'  '-variant_index_parameter' '128000'  '-o' '/apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/.queue/scatterGather/DNABestPracticeVariantCalling-97-sg/temp_10_of_23/P1171_104.clean.dedup.recal.bam.genomic.vcf'  '-D' '/proj/a2009002/piper_references/gatk_bundle/2.8/b37/dbsnp_138.b37.vcf'  '-ERC' 'GVCF'  '-stand_call_conf' '30.0'  '-stand_emit_conf' '10.0'  '-pairHMM' 'LOGLESS_CACHING'  '-pcrModel' 'CONSERVATIVE'
ERROR 18:22:45,601 FunctionEdge - Contents of /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/.queue/scatterGather/DNABestPracticeVariantCalling-97-sg/temp_10_of_23/P1171_104.clean.dedup.recal.bam.genomic.vcf.out:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/apus/v1/a2010002/software/piper_bin/Piper/Piper-v1.2.0-beta12/lib/GenomeAnalysisTK.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/apus/v1/a2010002/software/piper_bin/Piper/Piper-v1.2.0-beta12/lib/Queue.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
INFO  18:20:36,270 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  18:20:36,341 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.2-0-g799071b, Compiled 2014/07/21 11:22:24 
INFO  18:20:36,341 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  18:20:36,341 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  18:20:36,404 HelpFormatter - Program Args: -T HaplotypeCaller -I /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/pipeline_output/04_processed_alignments/P1171_104.clean.dedup.recal.bam -L /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/.queue/scatterGather/DNABestPracticeVariantCalling-97-sg/temp_10_of_23/scatter.intervals -R /proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta -dcov 250 -nct 16 -variant_index_type LINEAR -variant_index_parameter 128000 -o /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/M.Kaller_14_06/.queue/scatterGather/DNABestPracticeVariantCalling-97-sg/temp_10_of_23/P1171_104.clean.dedup.recal.bam.genomic.vcf -D /proj/a2009002/piper_references/gatk_bundle/2.8/b37/dbsnp_138.b37.vcf -ERC GVCF -stand_call_conf 30.0 -stand_emit_conf 10.0 -pairHMM LOGLESS_CACHING -pcrModel CONSERVATIVE 
INFO  18:20:36,417 HelpFormatter - Executing as vezzi@n35.uppmax.uu.se on Linux 2.6.32-431.20.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15. 
INFO  18:20:36,417 HelpFormatter - Date/Time: 2014/07/26 18:20:36 
INFO  18:20:36,417 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  18:20:36,417 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  18:20:38,852 GenomeAnalysisEngine - Strictness is SILENT 
INFO  18:20:40,735 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 
INFO  18:20:40,742 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  18:20:42,491 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 1.75 
INFO  18:20:43,764 HCMappingQualityFilter - Filtering out reads with MAPQ < 20 
INFO  18:20:46,336 IntervalUtils - Processing 134861075 bp from intervals 
INFO  18:20:46,348 MicroScheduler - Running the GATK in parallel mode with 16 total threads, 16 CPU thread(s) for each of 1 data thread(s), of 16 processors available on this machine 
INFO  18:20:46,534 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
INFO  18:20:46,804 GenomeAnalysisEngine - Done preparing for traversal 
INFO  18:20:46,804 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  18:20:46,804 ProgressMeter -                 |      processed |    time |         per 1M |           |   total | remaining 
INFO  18:20:46,805 ProgressMeter -        Location | active regions | elapsed | active regions | completed | runtime |   runtime 
INFO  18:20:46,805 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output 
INFO  18:20:46,805 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output 
INFO  18:20:46,907 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units 
INFO  18:20:46,908 PairHMM - Performance profiling for PairHMM is disabled because HaplotypeCaller is being run with multiple threads (-nct>1) option
Profiling is enabled only when running in single thread mode

INFO  18:21:16,808 ProgressMeter -     6:151719912              0.0    30.0 s           49.6 w        0.4%     2.2 h       2.2 h 
INFO  18:21:47,400 ProgressMeter -     6:152718291              0.0    60.0 s          100.2 w        1.1%    89.3 m      88.3 m 
INFO  18:22:17,841 ProgressMeter -     6:153664816              0.0    91.0 s          150.5 w        1.8%    83.2 m      81.7 m 
INFO  18:22:21,399 GATKRunReport - Uploaded run statistics report to AWS S3 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.IndexOutOfBoundsException: Index: 28, Size: 6
        at java.util.LinkedList.checkElementIndex(LinkedList.java:553)
        at java.util.LinkedList.get(LinkedList.java:474)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.DanglingChainMergingGraph.mergeDanglingTail(DanglingChainMergingGraph.java:272)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.DanglingChainMergingGraph.recoverDanglingTail(DanglingChainMergingGraph.java:184)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.DanglingChainMergingGraph.recoverDanglingTails(DanglingChainMergingGraph.java:131)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:202)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:114)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.LocalAssemblyEngine.runLocalAssembly(LocalAssemblyEngine.java:164)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.assembleReads(HaplotypeCaller.java:1022)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:882)
        at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:218)
        at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
        at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.2-0-g799071b):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Index: 28, Size: 6
##### ERROR ------------------------------------------------------------------------------------------ 
INFO  18:22:45,601 QGraph - Writing incremental jobs reports... 
INFO  18:22:45,607 QGraph - 203 Pend, 63 Run, 33 Fail, 411 Done 
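The final QGraph line summarizes the Queue job graph at the moment of failure (33 jobs already failed). To compare that snapshot against a later retest, a minimal Python sketch that turns such a line into counts, assuming only the `<n> Pend, <n> Run, <n> Fail, <n> Done` format shown above; `qgraph_status` is a hypothetical helper:

```python
import re

# Assumption: the summary uses exactly these four state labels, as in the log above.
STATUS_RE = re.compile(r"(\d+) (Pend|Run|Fail|Done)")


def qgraph_status(line):
    """Parse a QGraph job-status summary line into {state: count}."""
    return {state: int(count) for count, state in STATUS_RE.findall(line)}


status = qgraph_status("INFO  18:22:45,607 QGraph - 203 Pend, 63 Run, 33 Fail, 411 Done")
print(status)                # {'Pend': 203, 'Run': 63, 'Fail': 33, 'Done': 411}
print(sum(status.values()))  # 710
```

The timestamp is not picked up because none of its digit groups is followed by a space and one of the four labels.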
vezzi commented 10 years ago

Correction: the samples are 4, not 6.

johandahlberg commented 10 years ago

The above is related to a bug in the GATK version we are currently using. Issue #19 will solve this.
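Since the fix depends on which GATK build Piper invokes, one way to confirm that a retest really ran a different version is to read the version from the HelpFormatter banner in a job's `.out` file (the failing run above reports v3.2-0-g799071b). A minimal Python sketch, assuming the banner format shown in the log; `gatk_version` is a hypothetical helper, not part of Piper or GATK:

```python
import re

# Assumption: the banner matches the HelpFormatter line above, e.g.
# "... The Genome Analysis Toolkit (GATK) v3.2-0-g799071b, Compiled ..."
VERSION_RE = re.compile(r"The Genome Analysis Toolkit \(GATK\) v(\S+?),")


def gatk_version(log_text):
    """Return the GATK version string found in a job's output, or None."""
    match = VERSION_RE.search(log_text)
    return match.group(1) if match else None
```

The non-greedy `\S+?` stops at the first comma, so the "Compiled ..." part of the banner is not included in the result.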

vezzi commented 10 years ago

Good to know; I was expecting this. I will close this issue as a test run for issue #19.

johandahlberg commented 10 years ago

@vezzi This should be fixed in the latest release. You can test this again. :)

vezzi commented 10 years ago

First thing I will do tomorrow.

johandahlberg commented 9 years ago

@vezzi Have you tested this yet? If so can you close the issue?

vezzi commented 9 years ago

Tested!!!! Everything worked out as expected.