broadinstitute / picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
https://broadinstitute.github.io/picard/
MIT License

CollectVariantCallingMetrics java.lang.NullPointerException for contigs not in the reference #1732

Open GATKSupportTeam opened 3 years ago

GATKSupportTeam commented 3 years ago

We are troubleshooting an issue from the GATK forum with CollectVariantCallingMetrics. There seems to be a problem with the user's dbSNP file, but the stack trace does not give enough information to debug it easily.

This request was created from a contribution made by Azza Ahmed on October 05, 2021 09:36 UTC.

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/360074951711-No-results-of-CollectvariantCallingMetrics#community_comment_4407886398491

--

Hi!

I'm actually getting the same error (using GATK 4.2.1.0), only I'm running on a GVCF file with GVCF_INPUT set to true, as below. A lot of time is spent processing, but no output file is produced, and the tool fails with the same NullPointerException:

$ gatk --java-options "-Xms2000m" CollectVariantCallingMetrics --INPUT mysample.g.vcf --OUTPUT mysample.g.vcf.metrics.txt --DBSNP GCF_000001405.39_ucsc-genbank-chrs.gz --SEQUENCE_DICTIONARY Homo_sapiens_assembly38.dict --THREAD_COUNT 8 --GVCF_INPUT true >> mysample.merge_gvcfs_gatk.manual.log 2>&1

$ tail -n13 mysample.merge_gvcfs_gatk.manual.log

INFO 2021-10-04 12:59:14 CollectVariantCallingMetrics Read 1,091,000,000 variants. Elapsed time: 01:20:07s. Time for last 100,000: 0s. Last read position: chr22_KB663609v1_alt:51,502
[Mon Oct 04 12:59:14 CEST 2021] picard.vcf.CollectVariantCallingMetrics done. Elapsed time: 80.18 minutes.
Runtime.totalMemory()=4211081216
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.NullPointerException
	at picard.util.DbSnpBitSetUtil.loadVcf(DbSnpBitSetUtil.java:163)
	at picard.util.DbSnpBitSetUtil.createSnpAndIndelBitSets(DbSnpBitSetUtil.java:131)
	at picard.vcf.CollectVariantCallingMetrics.doWork(CollectVariantCallingMetrics.java:101)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
	at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
	at org.broadinstitute.hellbender.Main.main(Main.java:289)


The input file is valid, though: no errors or warnings came up when I checked it as suggested here:

$ gatk --java-options "-Xmx4G" ValidateVariants -V mysample.g.vcf -R Homo_sapiens_assembly38.fasta --validation-type-to-exclude ALLELES

Is it possible that the GVCF_INPUT option is not being applied? Any help is appreciated.

Thank you very much.

Best,

Azza

(created from Zendesk ticket #198527)
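ValidateVariants checked the input GVCF against the reference, but the stack trace points at DbSnpBitSetUtil.loadVcf, called from createSnpAndIndelBitSets, which by its name is the step that loads the DBSNP file rather than the input. One way to test the "contigs not in the reference" theory from the issue title is to list the CHROM values in the dbSNP VCF that have no matching SN: entry in the .dict file. The sketch below is a hypothetical standalone helper (ContigCheck is not part of Picard or GATK), assuming a plain or gzip/bgzip-compressed VCF and a SAM-style sequence dictionary. It streams every record, so on a full dbSNP file it will be slow; checking the ##contig header lines would be faster but only tells you what the header declares.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.GZIPInputStream;

// Hypothetical diagnostic helper, not a Picard/GATK tool.
public class ContigCheck {
    /** Collects the SN: values from the @SQ lines of a SAM-style .dict file. */
    static Set<String> dictContigs(BufferedReader reader) throws IOException {
        Set<String> names = new LinkedHashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.startsWith("@SQ")) continue;
            for (String field : line.split("\t")) {
                if (field.startsWith("SN:")) names.add(field.substring(3));
            }
        }
        return names;
    }

    /** Returns the sorted set of CHROM values on VCF data lines that are absent from the dictionary. */
    static Set<String> missingContigs(BufferedReader vcf, Set<String> dict) throws IOException {
        Set<String> missing = new TreeSet<>();
        String line;
        while ((line = vcf.readLine()) != null) {
            if (line.isEmpty() || line.startsWith("#")) continue; // skip header lines
            int tab = line.indexOf('\t');
            String chrom = tab < 0 ? line : line.substring(0, tab);
            if (!dict.contains(chrom)) missing.add(chrom);
        }
        return missing;
    }

    /** Opens a text file, decompressing it if the name ends in .gz (bgzip is multi-member gzip). */
    static BufferedReader open(String path) throws IOException {
        InputStream in = new FileInputStream(path);
        if (path.endsWith(".gz")) in = new GZIPInputStream(in);
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ContigCheck <reference.dict> <dbsnp.vcf[.gz]>");
            System.exit(2);
        }
        Set<String> dict;
        try (BufferedReader r = open(args[0])) {
            dict = dictContigs(r);
        }
        try (BufferedReader r = open(args[1])) {
            for (String contig : missingContigs(r, dict)) System.out.println(contig);
        }
    }
}
```

For example, `java ContigCheck.java Homo_sapiens_assembly38.dict GCF_000001405.39_ucsc-genbank-chrs.gz` (single-file source launch needs Java 11 or later) prints each dbSNP contig name that the dictionary does not contain, once; empty output would argue against the naming-mismatch theory.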

kockan commented 1 year ago

Closing this issue for now due to time passed since open date. Please feel free to reopen if still relevant.

lbergelson commented 1 year ago

@kockan This still seems like a bug to me.

kockan commented 1 year ago

@lbergelson OK, it looked like this was resolved in the relevant GATK forum post, but I'll reopen it in that case.

lbergelson commented 1 year ago

My feeling is that an NPE is always a bug. Maybe it should just be a different exception, but we might as well track it.
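To illustrate the "different exception" idea, here is a minimal sketch that is not Picard's actual DbSnpBitSetUtil code. It assumes, based only on the issue title, that the NPE comes from looking up a dbSNP contig in the sequence dictionary and getting null back. The hypothetical ContigBitSets class keeps one BitSet per contig, sized from a contig-length map standing in for the dictionary, and throws an IllegalArgumentException that names the missing contig instead of letting the null propagate.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Picard's DbSnpBitSetUtil.
public class ContigBitSets {
    private final Map<String, Integer> contigLengths; // stand-in for the sequence dictionary
    private final Map<String, BitSet> bitSets = new HashMap<>();

    public ContigBitSets(Map<String, Integer> contigLengths) {
        this.contigLengths = new HashMap<>(contigLengths);
    }

    /** Marks a 1-based position on a contig, failing with a descriptive error if the contig is unknown. */
    public void mark(String contig, int position) {
        BitSet bits = bitSets.get(contig);
        if (bits == null) {
            Integer length = contigLengths.get(contig);
            if (length == null) {
                // Without this check, "length + 1" below would auto-unbox null
                // and throw a bare NullPointerException with no contig name.
                throw new IllegalArgumentException("dbSNP contig '" + contig
                        + "' is not in the sequence dictionary; check that the dbSNP file"
                        + " and the reference use the same contig names");
            }
            bits = new BitSet(length + 1);
            bitSets.put(contig, bits);
        }
        bits.set(position);
    }

    /** Returns true if the position was marked; unknown contigs are simply unmarked. */
    public boolean isMarked(String contig, int position) {
        BitSet bits = bitSets.get(contig);
        return bits != null && bits.get(position);
    }
}
```

Failing fast with the contig name would point straight at a naming mismatch instead of a bare stack trace after a long run. Skipping unknown contigs with a warning is the other obvious choice; which behavior is preferable is a judgment call for the maintainers.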