broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

CollectGcBiasMetrics Array Index Out Of Bounds Exception #6372

Closed knoblett closed 4 years ago

knoblett commented 4 years ago

User Report:

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/360055990891-CollectGcBiasMetrics-Array-Index-Out-Of-Bounds-Exception

Hello,

When running CollectGcBiasMetrics on a moderately sized SAM file (~500Mb), Picard throws an ArrayIndexOutOfBoundsException with the message "Exception counting mismatches for read ..."

SCAN_WINDOW_SIZE is set to 1000. With the default value of 100, the error message is slightly different, but the ArrayIndexOutOfBoundsException persists. I have also experimented with different window sizes; all values >1000 give the same error at the same read on chrX (details below).

The reference fasta file is taken from UCSC: https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz

Any feedback leading to resolving the issue is greatly appreciated.

a) Picard version:

2.21.6-SNAPSHOT

b) Command script:

java -jar picard.jar CollectGcBiasMetrics \
    I=sorted.sam \
    O=gc_bias_metrics.txt \
    CHART=gc_bias_metrics.pdf \
    S=summary_metrics.txt \
    R=hg19.fa \
    SCAN_WINDOW_SIZE=1000

c) Error log:

MINIMUM_GENOME_FRACTION=1.0E-5 IS_BISULFITE_SEQUENCED=false METRIC_ACCUMULATION_LEVEL=[ALL_READS] ALSO_IGNORE_DUPLICATES=false ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

[Tue Jan 07 16:48:19 PST 2020] Executing as akoch@hpc5-0-3.local on Linux 2.6.32-431.11.2.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_181-b13; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.6-SNAPSHOT

INFO 2020-01-07 16:51:24 SinglePassSamProgram Processed 1,000,000 records. Elapsed time: 00:00:33s. Time for last 1,000,000: 27s. Last read position: chr5:92,832,908

INFO 2020-01-07 16:51:53 SinglePassSamProgram Processed 2,000,000 records. Elapsed time: 00:01:01s. Time for last 1,000,000: 28s. Last read position: chr11:121,228,669

[Tue Jan 07 16:52:25 PST 2020] picard.analysis.CollectGcBiasMetrics done. Elapsed time: 4.10 minutes.

Runtime.totalMemory()=4236247040

To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp

Exception in thread "main" htsjdk.samtools.SAMException: Exception counting mismatches for read XXXXXXXX0434501/1 32b aligned to chrX:51305151-51305182.
    at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:490)
    at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:466)
    at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:504)
    at picard.analysis.GcBiasMetricsCollector.addRead(GcBiasMetricsCollector.java:389)
    at picard.analysis.GcBiasMetricsCollector.access$600(GcBiasMetricsCollector.java:48)
    at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.addReadToGcData(GcBiasMetricsCollector.java:221)
    at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.acceptRecord(GcBiasMetricsCollector.java:155)
    at picard.analysis.GcBiasMetricsCollector$PerUnitGcBiasMetricsCollector.acceptRecord(GcBiasMetricsCollector.java:100)
    at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(MultiLevelCollector.java:192)
    at picard.metrics.MultiLevelCollector.acceptRecord(MultiLevelCollector.java:315)
    at picard.analysis.CollectGcBiasMetrics.acceptRead(CollectGcBiasMetrics.java:172)
    at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:158)
    at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:94)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 51305150
    at htsjdk.samtools.util.SequenceUtil.countMismatches(SequenceUtil.java:482)
    ... 15 more
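
A note on reading the trace: the index in the ArrayIndexOutOfBoundsException (51305150) is the 0-based form of the read's 1-based alignment start (chrX:51305151). That would be consistent with the array of reference bases used for chrX being shorter than the position the read is aligned to, though the trace alone does not confirm the cause. One way to test that hypothesis is to compare the sequence lengths declared in the SAM header with those in the reference's FASTA index, and to list reads that end past the reference contig. The following is a minimal Python sketch under those assumptions, not Picard's own logic; the function names are hypothetical, and it assumes a plain-text SAM file and a .fai index such as the one produced by `samtools faidx`.

```python
import re

def sam_header_lengths(sam_lines):
    """Map sequence name -> length from the @SQ lines of a SAM header."""
    lengths = {}
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment record
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths

def fai_lengths(fai_lines):
    """Map sequence name -> length from a FASTA index (.fai): name<TAB>length<TAB>..."""
    lengths = {}
    for line in fai_lines:
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths

def reference_span(cigar):
    """Number of reference bases consumed by a CIGAR string (ops M, D, N, =, X)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MDN=X")

def reads_past_reference_end(sam_lines, ref_lengths):
    """Yield (read name, contig, 1-based end) for reads ending beyond the contig."""
    for line in sam_lines:
        if line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        name, contig, pos, cigar = f[0], f[2], int(f[3]), f[5]
        if contig == "*" or cigar == "*":
            continue  # unmapped read or no alignment information
        end = pos + reference_span(cigar) - 1
        # A contig missing from the reference is treated as length 0 and reported.
        if end > ref_lengths.get(contig, 0):
            yield name, contig, end
```

For example, assuming the files from the command above and an index named hg19.fa.fai, one could read both files into lists of lines (`open("sorted.sam").readlines()`, `open("hg19.fa.fai").readlines()`), compare `sam_header_lengths(...)` with `fai_lengths(...)` for differing entries, and then pass the SAM lines and the .fai lengths to `reads_past_reference_end`. Any contig whose lengths disagree, or any read reported by the last function, would point to a mismatch between the reference used for alignment and the one passed via R=.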

(created from Zendesk ticket #4315)

fspeters1 commented 4 years ago

Hi, I am running into a similar problem and was wondering if you have found the solution for this? Thanks!

cmnbroad commented 4 years ago

I moved this ticket over to the Picard repo since this is a Picard tool. Closing in favor of https://github.com/broadinstitute/picard/issues/1532.

cmnbroad commented 4 years ago

@fspeters1 I noticed that this ticket is in the wrong repository (it should be in Picard, since it's about a Picard tool), so you can post to or follow the new ticket there: https://github.com/broadinstitute/picard/issues/1532.