broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Funcotator Exception: String index out of range #6651

Open GATKSupportTeam opened 4 years ago

GATKSupportTeam commented 4 years ago

This request was created from a contribution made by Mark Godek on May 28, 2020 12:43 UTC.

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/360067471451-Funcotator-cannot-complete-funcotaion-for-variant-due-to-alternate-allele

--

I'm attempting to annotate germline variants after VQSR with Funcotator using GATK 4.1.4.1.

GATK command is:

gatk Funcotator \
-R ${REFERENCE_GENOME} \
-V ${OUT}/germline.filtered.vcf.gz \
-O ${OUT}/annotated.germline.vcf \
--output-file-format VCF \
--data-sources-path /mnt/data/rbueno/analysis_files/MedGenome_FamilialMPMs/Annotation_data_sources/funcotator_dataSources.v1.6.20190124s \
--ref-version hg19

I get many warnings and it terminates with a String index out of range error. Any help is appreciated.

 

The tail end of the output follows:

07:33:14.569 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756762-69756762 due to alternate allele: *
07:33:14.575 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756763-69756763 due to alternate allele: *
07:33:14.575 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756763-69756763 due to alternate allele: *
07:33:14.580 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756764-69756764 due to alternate allele: *
07:33:14.580 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:69756764-69756764 due to alternate allele: *
07:33:16.681 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:70289137-70289137 due to alternate allele: *
07:33:16.681 WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr12:70289137-70289137 due to alternate allele: *
07:33:17.957 INFO VcfFuncotationFactory - dbSNP 9606_b150 cache hits/total: 521/453691
07:33:18.138 INFO Funcotator - Shutting down engine
[May 28, 2020 7:33:18 AM EDT] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 34.35 minutes.
Runtime.totalMemory()=3822059520
java.lang.StringIndexOutOfBoundsException: String index out of range: 545
at java.lang.String.substring(String.java:1963)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.initializeForInsertion(ProteinChangeInfo.java:256)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.<init>(ProteinChangeInfo.java:93)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.create(ProteinChangeInfo.java:371)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createSequenceComparison(GencodeFuncotationFactory.java:2003)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createCodingRegionFuncotationForProteinCodingFeature(GencodeFuncotationFactory.java:1193)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createExonFuncotation(GencodeFuncotationFactory.java:1044)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationOnSingleTranscript(GencodeFuncotationFactory.java:978)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:805)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:789)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.lambda$createGencodeFuncotationsByAllTranscripts$0(GencodeFuncotationFactory.java:474)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationsByAllTranscripts(GencodeFuncotationFactory.java:475)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnVariant(GencodeFuncotationFactory.java:530)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.determineFuncotations(DataSourceFuncotationFactory.java:233)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:201)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:172)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.lambda$createFuncotationMapForVariant$0(FuncotatorEngine.java:147)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.createFuncotationMapForVariant(FuncotatorEngine.java:157)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.enqueueAndHandleVariant(Funcotator.java:903)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.apply(Funcotator.java:857)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)


(created from Zendesk ticket #5792)
gz#5792

jonn-smith commented 4 years ago

The warnings the user is seeing are due to spanning deletion alleles which are currently not annotated with Funcotator. The bug here is what is causing the stack trace.

It's in the protein sequence prediction code and I suspect that it has to do with the position of the variant relative to the exon/transcript boundaries.

I have not been able to look at it yet, but thanks to the user posting the variants that are causing issues, it should be straight-forward to track down.
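The spanning deletion alleles mentioned above show up in a VCF as `*` in the ALT column. As a rough way to gauge how many records will trigger the warning, the sketch below counts records whose ALT list contains `*`. It assumes an uncompressed, tab-separated VCF, and the function name `count_spanning_deletion_records` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of GATK): count VCF records whose ALT
# column (field 5) contains the spanning-deletion allele "*".
# Assumes a plain-text, tab-separated VCF; header lines start with "#".
count_spanning_deletion_records() {
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        {
            n = split($5, alts, ",")       # ALT may list several alleles
            for (i = 1; i <= n; i++) {
                if (alts[i] == "*") { count++; break }
            }
        }
        END { print count + 0 }            # prints 0 when nothing matched
    ' "$1"
}
```

For a gzipped VCF, the file would first need to be decompressed (for example with `gunzip -c`) into a temporary file.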

twood1 commented 3 years ago

Was this issue ever resolved, or was the problem clearly identified? I am currently experiencing this error, and any help would be appreciated.

jonn-smith commented 3 years ago

@twood1 This is still an open issue, but I know where in the code it's happening and what is going on. I just haven't had time to debug it. For now a workaround is to remove the variant causing the failure from your file. You can find this by looking at the variants that Funcotator outputs - the variant after the final output entry will be the one causing this failure.
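The lookup described above (find the last entry Funcotator wrote, then take the next variant from the input) can be sketched in shell. This is only an illustration under stated assumptions: both files are uncompressed VCFs, the output preserves the input's record order and its CHROM, POS, REF and ALT columns, and those four columns identify a record uniquely. The function name `next_variant_after_last_output` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of GATK): print the input record that
# follows the last record present in a truncated Funcotator output VCF.
# Assumes plain-text VCFs in the same order, with unique
# CHROM/POS/REF/ALT keys.
next_variant_after_last_output() {
    input_vcf=$1
    partial_output_vcf=$2

    # Key (CHROM, POS, REF, ALT) of the last complete-looking record
    # in the partial output; empty if the output has no records.
    last_key=$(awk -F'\t' '
        !/^#/ && NF >= 5 { key = $1 FS $2 FS $4 FS $5 }
        END { print key }
    ' "$partial_output_vcf")

    # Print the first input record that comes right after that key.
    # If the output had no records, the first input record is printed.
    awk -F'\t' -v last="$last_key" '
        /^#/ { next }
        found || last == "" { print; exit }
        ($1 FS $2 FS $4 FS $5) == last { found = 1 }
    ' "$input_vcf"
}
```

A call might look like `next_variant_after_last_output input.vcf partial_output.vcf`, where both file names are placeholders. If the last output line was cut off mid-record, the key may not match anything and nothing is printed, so the tail of the output file is worth checking by eye as well.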

twood1 commented 3 years ago

@jonn-smith Thanks for the prompt response jonn - is the code for the surrounding issue(s) open source? If so, could you point me towards the file?

jonn-smith commented 3 years ago

@twood1 No prob. Yup - it's all open source, but this particular part of the code may be a bit tricky to debug (which is why I haven't gotten to it yet).

The issue is happening in org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo but the problem is upstream of that when I'm extracting the sequence information from the reference to create the protein change strings.

Feel free to take a look, but this is one of my top priorities for bugs to fix next.

twood1 commented 3 years ago

So the issue you are describing is essentially completely independent from input parameters/options, minus the reference fasta and the input VCF. Is that correct?

jonn-smith commented 3 years ago

Correct - though it also depends on the Gencode data source which is tied to the reference.

It really pulls the protein change info from the gencode transcript sequence, which is at the core of the issue.

xmzhuo commented 3 years ago

I have a similar issue.

java.lang.StringIndexOutOfBoundsException: String index out of range: -2
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createAndFilterGencodeFuncotationsByTranscript(GencodeFuncotationFactory.java:281)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnVariant(GencodeFuncotationFactory.java:338)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:138)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:113)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.lambda$enqueueAndHandleVariant$0(Funcotator.java:502)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.enqueueAndHandleVariant(Funcotator.java:504)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.apply(Funcotator.java:399)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:109)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:107)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:994)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

The annotation stops at chr11 34357581, and the output is truncated after this position.

chr11 34357581 . C CGGGACGTACAGCTCGACTCTGAAGACGCTGGAGGACTTGACCTTGGACTCCGGGT . PASS DP=208;ECNT=2;NLOD=8.8;N_ART_LOD=-1.486;POP_AF=2.5e-06;P_CONTAM=2.202e-10;P_GERMLINE=-51.18;TLOD=11.12 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:171,5:0.033:94,3:77,2:37:329,212:60:5:false:false:0|1:34357577_C_CCAT:0.02,0.02,0.028:0.0054,0.004127,0.99 0/0:29,0:0.014:13,0:16,0:0:340,0:0:0:false:false:0|1:34357577_C_CCAT:.:.

At first I thought it might be due to the length of the indel, but Funcotator seems to work fine on variants before that position, some of them with even longer insertions than the one at chr11 34357581, such as:

chr10 123715082 . A ATCACTGCTGCCACTCACTCGGGTCACCTGCTGCTCCACGTGGCCCAGAGCTTCTGT . PASS DP=196;ECNT=2;NLOD=7.6;N_ART_LOD=-1.425;POP_AF=2.5e-06;P_CONTAM=1.663e-10;P_GERMLINE=-47.68;TLOD=11.24 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:163,5:0.034:81,3:82,2:37:326,367:60:6:false:false:0|1:123715081_A_C:0.01,0.03,0.03:0.02,0.002271,0.978 0/0:25,0:0.016:13,0:12,0:0:292,0:0:0:false:false:0|1:123715081_A_C:.:.

chr11 707740 . C CGAAGGCCAGGAACCTGGCCTTCCCCTGGGGGCACGCAAACATGGAGGGCTGTGACACGCGACCCCCCTGGG . PASS DP=181;ECNT=1;NLOD=7.17;N_ART_LOD=-1.413;POP_AF=2.5e-06;P_CONTAM=1.459e-05;P_GERMLINE=-33.07;TLOD=5.79 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:SA_MAP_AF:SA_POST_PROB 0/1:115,3:0.115:50,1:65,2:37:314,321:60:5:false:false:0.02,0.02,0.025:0.004925,0.006118,0.989 0/0:23,0:0.046:12,0:11,0:0:288,0:0:0:false:false:.:.

Two weeks ago, I had another sample stop at chr7 with java.lang.StringIndexOutOfBoundsException: String index out of range: 1383. I guess these are related.

jonn-smith commented 3 years ago

@xmzhuo Interesting. Is this hg19 or hg38 data? I can add this to our tests.

For everyone else - thanks for your patience. I'm starting to work on this issue this week so we should have a fix relatively soon (1-2 weeks).

xmzhuo commented 3 years ago

Hg38


daisyyr commented 3 years ago

Hi, everyone~ Is this problem solved now? It seems that I've encountered similar problems. I'm using GATK 4.2 and hg38 data.

11:43:25.661 ERROR GencodeFuncotationFactory - Problem creating a GencodeFuncotation on transcript ENST00000441716.2 for variant: chr6:167976552-167976594(ACAGTGGGGGTCATTCCCCCTGCAGTGTGTTGGGAGGAGGAGG -> A): Variant overlaps transcript but is not completely contained within it. Funcotator cannot currently handle this case. Transcript: ENST00000441716.2 Variant: [VC Unknown @ chr6:167976552-167976594 Q. of type=INDEL alleles=[ACAGTGGGGGTCATTCCCCCTGCAGTGTGTTGGGAGGAGGAGG, A] attr={AS_FilterStatus=SITE, AS_SB_TABLE=[43, 26|2, 2], DP=94, ECNT=1, GERMQ=93, MBQ=[31, 20], MFRL=[288, 110], MMQ=[60, 60], MPOS=56, NALOD=1.37, NLOD=6.17, POPAF=4.6, ROQ=93, TLOD=10.97} GT=GT:AD:AF:DP:F1R2:F2R1:SB 0/1:46,4:0.07:50:14,3:10,0:28,18,2,2 0/0:23,0:0.041:23:8,0:5,0:15,8,0,0 filters=
11:43:25.661 WARN GencodeFuncotationFactory - Creating default GencodeFuncotation on transcript ENST00000441716.2 for problem variant: chr6:167976552-167976594(ACAGTGGGGGTCATTCCCCCTGCAGTGTGTTGGGAGGAGGAGG -> A)
11:44:04.904 INFO ProgressMeter - chr8:677091 4.5 3000 666.0
11:45:35.226 INFO ProgressMeter - chr11:62279639 6.0 4000 665.6
11:46:54.284 INFO ProgressMeter - chr15:19905537 7.3 5000 682.4
11:48:12.767 WARN FuncotatorUtils - createAminoAcidSequence given a coding sequence of length not divisible by 3. Dropping bases from the end: 2 (size=293, ref allele: G)
11:48:16.949 ERROR GencodeFuncotationFactory - Problem creating a GencodeFuncotation on transcript ENST00000379751.5 for variant: chr20:3786474-3786537(TGGGGCCCATCCCGGCGCGCCCCCCGCCCCGGGGCCCGGCGCCGCCGCCGCCGCCCCGGGGCGG -> T): Cannot yet handle indels starting outside an exon and ending within an exon.
11:48:16.949 WARN GencodeFuncotationFactory - Creating default GencodeFuncotation on transcript ENST00000379751.5 for problem variant: chr20:3786474-3786537(TGGGGCCCATCCCGGCGCGCCCCCCGCCCCGGGGCCCGGCGCCGCCGCCGCCGCCCCGGGGCGG -> T)
11:48:31.506 INFO ProgressMeter - chr21:18282114 8.9 6000 670.6
11:49:08.210 INFO ProgressMeter - chr21:18282114 9.6 6888 720.6
11:49:08.210 INFO ProgressMeter - Traversal complete. Processed 6888 total variants in 9.6 minutes.
11:49:08.210 INFO VcfFuncotationFactory - ClinVar_VCF 20180429_hg38 cache hits/total: 0/2
11:49:08.211 INFO VcfFuncotationFactory - dbSNP 9606_b151 cache hits/total: 0/4781
11:49:08.230 INFO Funcotator - Shutting down engine
[July 7, 2021 11:49:08 AM GMT] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 9.72 minutes.
Runtime.totalMemory()=4879548416
Tool returned: true

gbrandt6 commented 3 years ago

@daisyyr Thanks for posting your example here. This issue is still open, so it has not been fixed yet.

gbrandt6 commented 3 years ago

@xmzhuo @twood1 We have released a fix for a very similar bug in Funcotator (#6289). Could you test the newest GATK version, 4.2.3.0, and let us know if it also solves this bug?

jkobject commented 2 years ago

@gbrandt6 This is the same as #6289 and, as per my comment there, I still see the bug in GATK 4.2.6.1. It occurs rarely but breaks the pipelines.

jkobject commented 2 years ago

Just to help with associating issues, here is the list of issues that seem to be talking about the same problem: #6651, #7523, #6345, #4307, #6546, #3749, #4804, #6289. It seems to have existed since 2018.

jonn-smith commented 2 years ago

@jkobject This problem has to do with indels and predicted protein change sequences. I'm starting a refactor of how the predicted protein changes get created. When that's complete, this issue will be fixed.

In the meantime, can you post the stack trace and share the example workspace you mention in #6289?

jkobject commented 2 years ago

I can. This only happens on 10 of our 2000 samples (only in WES); none of our 600 WGS samples seem to have the same issue. It is always on some small contig (you can see here that the range is 544, and all cases are small ranges like this one).

Everything uses the default Mutect2 pipeline and parameters (e.g. gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta), except the interval file: gs://ccleparams/region_file_wgs.list. The GATK version is 4.2.6.1.

Here is the VCF file to annotate: gs://ccleparams/test/CDS-2jucw0.hg38-filtered.vcf.gz

Here is the stacktrace:

....
10:53:39.044 INFO VcfFuncotationFactory - ClinVar_VCF 20180429_hg38 cache hits/total: 0/2145
10:53:39.249 INFO VcfFuncotationFactory - dbSNP 9606_b151 cache hits/total: 0/1069225
10:53:39.520 INFO Funcotator - Shutting down engine
[July 12, 2022 10:53:39 AM GMT] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 115.46 minutes.
Runtime.totalMemory()=2050490368
java.lang.StringIndexOutOfBoundsException: String index out of range: 544
at java.lang.String.substring(String.java:1963)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.initializeForInsertion(ProteinChangeInfo.java:293)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.<init>(ProteinChangeInfo.java:101)
at org.broadinstitute.hellbender.tools.funcotator.ProteinChangeInfo.create(ProteinChangeInfo.java:399)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createSequenceComparison(GencodeFuncotationFactory.java:2054)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createCodingRegionFuncotationForProteinCodingFeature(GencodeFuncotationFactory.java:1235)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createExonFuncotation(GencodeFuncotationFactory.java:1083)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationOnSingleTranscript(GencodeFuncotationFactory.java:1020)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:847)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:831)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.lambda$createGencodeFuncotationsByAllTranscripts$0(GencodeFuncotationFactory.java:508)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationsByAllTranscripts(GencodeFuncotationFactory.java:509)
at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnVariant(GencodeFuncotationFactory.java:564)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.determineFuncotations(DataSourceFuncotationFactory.java:243)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:211)
at org.broadinstitute.hellbender.tools.funcotator.DataSourceFuncotationFactory.createFuncotations(DataSourceFuncotationFactory.java:182)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.lambda$createFuncotationMapForVariant$0(FuncotatorEngine.java:152)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
at org.broadinstitute.hellbender.tools.funcotator.FuncotatorEngine.createFuncotationMapForVariant(FuncotatorEngine.java:162)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.enqueueAndHandleVariant(Funcotator.java:924)
at org.broadinstitute.hellbender.tools.funcotator.Funcotator.apply(Funcotator.java:878)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3500m -jar /root/gatk.jar Funcotator --data-sources-path /cromwell_root/datasources_dir --ref-version hg38 --output-file-format VCF -R gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta -V gs://fc-secure-d2a2d895-a7af-4117-bdc7-652d7d268324/94e769a1-28e1-4bd7-b09f-9e47fb7d8352/omics_mutect2/14fe5685-740c-4e09-9d1a-8c8d14c0ae5b/call-mutect2/Mutect2/2de52f4f-eea0-4ec7-acc1-f47b1a2d1e6c/call-Filter/attempt-2/CDS-2jucw0.hg38-filtered.vcf.gz -O CDS-2jucw0.hg38-filtered.vcf.gz.annotated.vcf.gz -L /cromwell_root/ccleparams/region_file_wgs.list --annotation-default normal_barcode: --annotation-default tumor_barcode:NP5 --annotation-default Center:DEPMAP --annotation-default source:Unknown

jonn-smith commented 2 years ago

@jkobject OK, thanks!

jkobject commented 2 years ago

My quick fix was to reduce the intervals to the target regions of my WES (instead of using the full genome region) and pass them to Funcotator. Note: the GATK Mutect2 WDL does not pass the default intervals to Funcotator, only to Mutect2.

jkobject commented 2 years ago

After running it on all my samples, it actually only solved half of them... I will look into the try/catch fix.

fmarce753 commented 1 year ago

Hi everyone! I partially solved the problem "WARN GencodeFuncotationFactory - Cannot create complete funcotation for variant at chr:__: due to alternate allele: *". The origin of the problem is that we have complex datasets that contain more than one sample. Across the set of samples, more than one alternate allele is detected, including "*". The idea is to have one line for each variant because, apparently, Funcotator then reads it properly. I applied the following commands and it worked perfectly:

1) Normalize: bcftools norm -m - cohort.vcf > cohort_norm.vcf

2) Select SNPs (I haven't tried it for indels yet): gatk SelectVariants -R hg38.fa -V "cohort_norm.vcf" --select-type SNP -O "cohort_snp.vcf"

3) Remove the remaining "*" variants: awk -F'\t' '$5 != "*"' cohort_snp.vcf > filtered_cohort_snp.vcf

4) Apply Funcotator.

At the moment this works perfectly for me. If anyone has a better solution please upload it.
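The filtering in step 3 can be sketched as a small shell function. It assumes the intent is to drop records whose ALT column is exactly the spanning-deletion allele `*` (which is what remains after multi-allelic sites are split into one line per allele), and that the VCF is uncompressed and tab-separated. The function name `drop_star_alt_records` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a VCF to stdout without the records whose
# ALT column (field 5) is exactly "*". Header lines are always kept.
# Assumes a plain-text, tab-separated VCF with one ALT allele per line.
drop_star_alt_records() {
    awk -F'\t' '/^#/ || $5 != "*"' "$1"
}
```

Usage would be along the lines of `drop_star_alt_records cohort_snp.vcf > filtered_cohort_snp.vcf`, with the file names taken from the steps above.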

Regards