TheJacksonLaboratory / LIRICAL

LIkelihood Ratio Interpretation of Clinical AbnormaLities
https://thejacksonlaboratory.github.io/LIRICAL/stable

HPO + VCF : java.lang.ArrayIndexOutOfBoundsException: Index 9 out of bounds for length 7 #575

Closed tpYaki closed 1 year ago

tpYaki commented 2 years ago

Dear LIRICAL team, I tried running LIRICAL with HPO terms plus a VCF file, using both the YAML and the Phenopacket input formats. The VCF file is Pfeiffer.vcf from Exomiser. I tried both LIRICAL 1.3.4 and LIRICAL 1.3.2, but I repeatedly get the same error:

java.lang.ArrayIndexOutOfBoundsException: Index 9 out of bounds for length 7
    at org.monarchinitiative.phenol.annotations.assoc.GeneInfoParser.loadGeneIdToSymbolMap(GeneInfoParser.java:56)
    at org.monarchinitiative.phenol.annotations.assoc.Gene2DiseaseAssociationParser.parseMim2geneAndGeneInfo(Gene2DiseaseAssociationParser.java:95)
    at org.monarchinitiative.phenol.annotations.assoc.Gene2DiseaseAssociationParser.<init>(Gene2DiseaseAssociationParser.java:54)
    at org.monarchinitiative.phenol.annotations.assoc.HpoAssociationParser.ingestDisease2GeneAssociations(HpoAssociationParser.java:240)
    at org.monarchinitiative.phenol.annotations.assoc.HpoAssociationParser.<init>(HpoAssociationParser.java:89)
    at org.monarchinitiative.lirical.configuration.LiricalFactory.parseHpoAnnotations(LiricalFactory.java:357)
    at org.monarchinitiative.lirical.configuration.LiricalFactory.geneId2symbolMap(LiricalFactory.java:385)
    at org.monarchinitiative.lirical.configuration.LiricalFactory.getGene2GenotypeMap(LiricalFactory.java:481)
    at org.monarchinitiative.lirical.configuration.LiricalFactory.getGene2GenotypeMap(LiricalFactory.java:469)
    at org.monarchinitiative.lirical.cmd.PhenopacketCommand.runVcfAnalysis(PhenopacketCommand.java:102)
    at org.monarchinitiative.lirical.cmd.PhenopacketCommand.call(PhenopacketCommand.java:221)
    at org.monarchinitiative.lirical.cmd.PhenopacketCommand.call(PhenopacketCommand.java:33)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    at org.monarchinitiative.lirical.Lirical.main(Lirical.java:39)

These are my settings.

yaml:

analysis:
  genomeAssembly: hg19
  vcf: /Users/liyaqi/Downloads/LIRICAL-1.3.2/vcf/Pfeiffer.vcf
  datadir: /Users/liyaqi/Downloads/LIRICAL-1.3.2/data/
  exomiser: /Users/liyaqi/Downloads/exomiser-cli-12.1.0/data/1909_hg19/
hpoIds: ['HP:0001156', 'HP:0001363', 'HP:0011304', 'HP:0010055']
prefix: test
outdir: /Users/liyaqi/Downloads/LIRICAL-1.3.2/

My command for yaml: java -jar LIRICAL.jar yaml -y test.tml

Phenopacket:

{
  "subject": {
    "id": "example-1"
  },
  "phenotypicFeatures": [{
    "type": {
      "id": "HP:0000244",
      "label": "Turribrachycephaly"
    },
    "classOfOnset": {
      "id": "HP:0003577",
      "label": "Congenital onset"
    }
  }, {
    "type": {
      "id": "HP:0000238",
      "label": "Hydrocephalus"
    },
    "classOfOnset": {
      "id": "HP:0003577",
      "label": "Congenital onset"
    }
  }],
  "htsFiles": [{
    "uri": "file:/Users/liyaqi/Downloads/LIRICAL-1.3.2/vcf/Pfeiffer.vcf",
    "description": "test",
    "htsFormat": "VCF",
    "genomeAssembly": "GRCh19",
    "individualToSampleIdentifiers": {
      "patient1": "NA12345"
    }
  }],
  "metaData": {
    "createdBy": "Peter R.",
    "resources": [{
      "id": "hp",
      "name": "human phenotype ontology",
      "namespacePrefix": "HP",
      "url": "http://purl.obolibrary.org/obo/hp.owl",
      "version": "2018-03-08",
      "iriPrefix": "http://purl.obolibrary.org/obo/HP_"
    }]
  }
}

My command for Phenopacket: java -jar LIRICAL.jar phenopacket -p /Users/liyaqi/Downloads/LIRICAL-1.3.2/example.json -d /Users/liyaqi/Downloads/LIRICAL-1.3.2/data -e /Users/liyaqi/Downloads/exomiser-cli-12.1.0/data/1909_hg19

When I input only HPO terms, both the YAML and Phenopacket modes work well (v1.3.2 and v1.3.4). Help would be appreciated. Thank you in advance! Yaki

tpYaki commented 2 years ago

Files in 1909_hg19:
1909_hg19_clinvar_whitelist.tsv.gz
1909_hg19_clinvar_whitelist.tsv.gz.tbi
1909_hg19_genome.h2.db
1909_hg19_transcripts_ensembl.ser
1909_hg19_transcripts_refseq.ser
1909_hg19_transcripts_ucsc.ser
1909_hg19_variants.mv.db

Files in LIRICAL-1.3.2/data:
Homo_sapiens_gene_info.gz
hp.obo
mim2gene_medgen
phenotype.hpoa

tpYaki commented 2 years ago

1909_hg19.zip was downloaded from here: https://data.monarchinitiative.org/exomiser/data/index.html
There are only 7 files in the zip.

tpYaki commented 2 years ago

Hello, I finally solved this problem by reinstalling the LIRICAL data. Before the reinstall, my Homo_sapiens_gene_info.gz was only 505 KB.

The file sizes should look like this:

[image]

evatosco commented 2 years ago

@tpYaki thank you so much for sharing this issue! I had the same problem and would never have thought the cause was the size of that file. In my case, Homo_sapiens_gene_info.gz was almost 13 MB. I still do not know why, but I downloaded the data again, and now the file is 3.2 MB and LIRICAL produces the output correctly. Thanks again!

Edit: With openjdk 11.0.15, the following warning is shown, but LIRICAL still works: WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. In my experience (Ubuntu 20.04.4 LTS), the warning does not appear with openjdk version "1.8.0_152-release". I currently have Exomiser version 13.0.1 set in the pom.xml file and use the Exomiser 2109 database, and it works correctly!

ielis commented 1 year ago

Hi @tpYaki, sorry for the delayed response. It seems that an incomplete data download was the source of the issue. Please feel free to re-open if you have any other questions. All the best, Daniel
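
For anyone who hits the same exception, a quick sanity check of Homo_sapiens_gene_info.gz may save time before re-downloading everything. The following is only a minimal sketch and is not part of LIRICAL or phenol. The class name GeneInfoCheck is hypothetical, and the threshold of 10 tab-separated fields is an assumption inferred from "Index 9 out of bounds" in the stack trace above, not taken from the parser source. It reads the gzip file to the end and counts data lines that have too few fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of LIRICAL.
public class GeneInfoCheck {

    // Assumption: the parser reads field index 9, so each data line
    // needs at least 10 tab-separated fields.
    static final int MIN_FIELDS = 10;

    /** Counts non-empty, non-comment lines with fewer than minFields tab-separated fields. */
    public static long countShortLines(Stream<String> lines, int minFields) {
        return lines
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                // limit -1 keeps trailing empty fields so they are counted too
                .filter(line -> line.split("\t", -1).length < minFields)
                .count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java GeneInfoCheck <path/to/Homo_sapiens_gene_info.gz>");
            System.exit(1);
        }
        Path path = Paths.get(args[0]);
        System.out.println("Size on disk: " + Files.size(path) + " bytes");

        // A truncated gzip file typically fails while being read, with an
        // EOFException (possibly wrapped in an UncheckedIOException by lines()).
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(path)), StandardCharsets.UTF_8))) {
            long shortLines = countShortLines(reader.lines(), MIN_FIELDS);
            System.out.println("Lines with fewer than " + MIN_FIELDS + " fields: " + shortLines);
        }
    }
}
```

Compile with javac GeneInfoCheck.java and run with java GeneInfoCheck followed by the path to the file in the LIRICAL data directory. If the read fails or the count of short lines is above zero, the file is probably damaged or incomplete, and downloading the LIRICAL data again is the fix reported in this thread.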