exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

Fully-closed coordinates 1:249064688-249064690 out of contig bounds [1,248956422] #525

Closed by GACGAMA 11 months ago

GACGAMA commented 11 months ago

Hi! I'm trying to run a multisample (trio) VCF from WGS through Genomiser. The files were aligned using the hg38 reference from Cavatica (alias to broad-references/Homo_sapiens_assembly38.fasta on kfdrc-harmonization/kf_reference): https://cavatica.sbgenomics.com/u/kfdrc-harmonization/kf-references/files/60639014357c3a53540ca7a3/

When I run Genomiser, I get the following error:

2023-10-17 10:50:58.838  INFO 1454718 --- [           main] org.monarchinitiative.exomiser.cli.Main  : Starting Main using Java 19.0.1 on login03 with PID 1454718 (/scratch4/nsobrei2/programs/genomizer/exomiser-cli-13.2.0/exomiser-cli-13.2.0.jar started by ggama1 in /scratch4/nsobrei2/programs/genomizer/exomiser-cli-13.2.0)
2023-10-17 10:50:58.841  INFO 1454718 --- [           main] org.monarchinitiative.exomiser.cli.Main  : No active profile set, falling back to 1 default profile: "default"
2023-10-17 10:51:01.622  INFO 1454718 --- [           main] o.m.exomiser.cli.config.MainConfig       : Exomiser home: /scratch4/nsobrei2/programs/genomizer/exomiser-cli-13.2.0
2023-10-17 10:51:01.629  INFO 1454718 --- [           main] o.m.exomiser.cli.config.MainConfig       : Root data source directory set to: /scratch4/nsobrei2/programs/genomizer/exomiser-cli-13.2.0/data
2023-10-17 10:51:01.710  INFO 1454718 --- [           main] o.m.e.c.g.j.JannovarDataProtoSerialiser  : Deserialising Jannovar data from /scratch4/nsobrei2/programs/genomizer/exomiser-cli-13.2.0/data/2302_hg19/2302_hg19_transcripts_ensembl.ser
2023-10-17 10:51:06.030  INFO 1454718 --- [           main] o.m.e.c.g.j.JannovarDataProtoSerialiser  : Deserialisation took 4.32 sec.
2023-10-17 10:51:10.517  INFO 1454718 --- [           main] o.m.e.c.g.dao.VariantWhiteListLoader     : Loading variant whitelist from: /scratch4/nsobrei2/programs/genomizer/exomiser-cli-13.2.0/data/2302_hg19/2302_hg19_clinvar_whitelist.tsv.gz
2023-10-17 10:51:11.630  INFO 1454718 --- [           main] o.m.e.c.g.dao.VariantWhiteListLoader     : Loaded 180928 variants into whitelist
2023-10-17 10:51:11.832  INFO 1454718 --- [           main] o.m.e.a.genome.GenomeDataSourceLoader    : Opening CADD snv data from source: /scratch4/nsobrei2/references/CADD_v1-6_HG19/whole_genome_SNVs.tsv.gz
2023-10-17 10:51:12.126  INFO 1454718 --- [           main] o.m.e.a.genome.GenomeDataSourceLoader    : Opening CADD InDel data from source: /scratch4/nsobrei2/references/CADD_v1-6_HG19/InDels.tsv.gz
2023-10-17 10:51:12.415  INFO 1454718 --- [           main] o.m.e.a.genome.GenomeDataSourceLoader    : Opening REMM data from source: /scratch4/nsobrei2/references/REMM/v0-4/ReMM.v0.4.hg19.tsv.gz
2023-10-17 10:51:15.916  INFO 1454718 --- [           main] o.m.e.c.g.j.JannovarDataProtoSerialiser  : Deserialising Jannovar data from /scratch4/nsobrei2/programs/genomizer/exomiser-cli-13.2.0/data/2302_hg38/2302_hg38_transcripts_ensembl.ser
2023-10-17 10:51:17.222  INFO 1454718 --- [           main] o.m.e.c.g.j.JannovarDataProtoSerialiser  : Deserialisation took 1.305 sec.
2023-10-17 10:51:17.830  INFO 1454718 --- [           main] o.m.e.c.g.dao.VariantWhiteListLoader     : Loading variant whitelist from: /scratch4/nsobrei2/programs/genomizer/exomiser-cli-13.2.0/data/2302_hg38/2302_hg38_clinvar_whitelist.tsv.gz
2023-10-17 10:51:19.714  INFO 1454718 --- [           main] o.m.e.c.g.dao.VariantWhiteListLoader     : Loaded 180991 variants into whitelist
2023-10-17 10:51:19.814  INFO 1454718 --- [           main] o.m.e.a.genome.GenomeDataSourceLoader    : Opening CADD snv data from source: /scratch4/nsobrei2/references/CADD_vep_110/version_1_6/whole_genome_SNVs.tsv.gz
2023-10-17 10:51:19.911  INFO 1454718 --- [           main] o.m.e.a.genome.GenomeDataSourceLoader    : Opening CADD InDel data from source: /scratch4/nsobrei2/references/CADD_vep_110/version_1_6/gnomad.genomes.r3.0.indel.tsv.gz
2023-10-17 10:51:19.945  INFO 1454718 --- [           main] o.m.e.a.genome.GenomeDataSourceLoader    : Opening REMM data from source: /scratch4/nsobrei2/references/REMM/v0-4/ReMM.v0.4.hg38.tsv.gz
2023-10-17 10:51:20.524  INFO 1454718 --- [           main] g.GenomeAnalysisServiceAutoConfiguration : Configured hg19 genome analysis service
2023-10-17 10:51:20.524  INFO 1454718 --- [           main] g.GenomeAnalysisServiceAutoConfiguration : Configured hg38 genome analysis service
2023-10-17 10:51:25.309  INFO 1454718 --- [           main] o.m.exomiser.cli.config.MainConfig       : Default results directory set to: /scratch4/nsobrei2/programs/genomizer/exomiser-cli-13.2.0/results
2023-10-17 10:51:25.321  INFO 1454718 --- [           main] o.m.e.a.ExomiserConfigReporter           : exomiser.data-directory: /scratch4/nsobrei2/programs/genomizer/exomiser-cli-13.2.0/data
2023-10-17 10:51:25.321  INFO 1454718 --- [           main] o.m.e.a.ExomiserConfigReporter           : exomiser.hg19.data-version: 2302
2023-10-17 10:51:25.321  INFO 1454718 --- [           main] o.m.e.a.ExomiserConfigReporter           : exomiser.hg38.data-version: 2302
2023-10-17 10:51:25.321  INFO 1454718 --- [           main] o.m.e.a.ExomiserConfigReporter           : exomiser.phenotype.data-version: 2302
2023-10-17 10:51:25.819  INFO 1454718 --- [           main] org.monarchinitiative.exomiser.cli.Main  : Started Main in 28.307 seconds (JVM running for 29.296)
2023-10-17 10:51:27.913  INFO 1454718 --- [           main] o.m.e.cli.ExomiserCommandLineRunner      : Exomiser running...
2023-10-17 10:51:27.929  INFO 1454718 --- [           main] o.m.exomiser.core.Exomiser               : Running analysis using hg38 assembly with mode: PASS_ONLY
2023-10-17 10:51:27.931  INFO 1454718 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : Validating sample input data
2023-10-17 10:51:28.110  INFO 1454718 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : Running analysis for proband BH6074_1 (sample 1 in VCF) from samples: [BH6074_1, BH6074_2, BH6074_3]. Using coordinates for genome assembly hg38.
2023-10-17 10:51:28.711  INFO 1454718 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : Filtering variants with:
2023-10-17 10:51:28.711  INFO 1454718 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : FailedVariantFilter{}
2023-10-17 10:51:28.711  INFO 1454718 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : VariantEffectFilter{offTargetVariantTypes=[CODING_TRANSCRIPT_INTRON_VARIANT, FIVE_PRIME_UTR_EXON_VARIANT, THREE_PRIME_UTR_EXON_VARIANT, FIVE_PRIME_UTR_INTRON_VARIANT, THREE_PRIME_UTR_INTRON_VARIANT, NON_CODING_TRANSCRIPT_EXON_VARIANT, NON_CODING_TRANSCRIPT_INTRON_VARIANT, UPSTREAM_GENE_VARIANT, DOWNSTREAM_GENE_VARIANT, INTERGENIC_VARIANT, REGULATORY_REGION_VARIANT]}
2023-10-17 10:51:28.712  INFO 1454718 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : FrequencyFilter{maxFreq=2.0}
2023-10-17 10:51:28.712  INFO 1454718 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : Wrapping FrequencyFilter{maxFreq=2.0} with VariantDataProvider for sources [THOUSAND_GENOMES, TOPMED, UK10K, ESP_AFRICAN_AMERICAN, ESP_EUROPEAN_AMERICAN, ESP_ALL, EXAC_AFRICAN_INC_AFRICAN_AMERICAN, EXAC_AMERICAN, EXAC_EAST_ASIAN, EXAC_FINNISH, EXAC_NON_FINNISH_EUROPEAN, EXAC_OTHER, EXAC_SOUTH_ASIAN, GNOMAD_E_AFR, GNOMAD_E_AMR, GNOMAD_E_EAS, GNOMAD_E_FIN, GNOMAD_E_NFE, GNOMAD_E_OTH, GNOMAD_E_SAS, GNOMAD_G_AFR, GNOMAD_G_AMR, GNOMAD_G_EAS, GNOMAD_G_FIN, GNOMAD_G_NFE, GNOMAD_G_OTH, GNOMAD_G_SAS]
2023-10-17 10:51:28.713  INFO 1454718 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : PathogenicityFilter{keepNonPathogenic=true}
2023-10-17 10:51:28.713  INFO 1454718 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : Wrapping PathogenicityFilter{keepNonPathogenic=true} with VariantDataProvider for sources [REVEL, MVP]
2023-10-17 10:51:28.714  INFO 1454718 --- [           main] o.m.e.core.genome.VariantFactoryImpl     : Annotating variant records, trimming sequences and normalising positions...
2023-10-17 10:52:10.126  INFO 1454718 --- [           main] o.m.e.core.genome.VariantFactoryImpl     : Processed 1155618 variant records into 71736 single allele variants (including 0 structural variants)
2023-10-17 10:52:10.127  INFO 1454718 --- [           main] o.m.e.core.genome.VariantFactoryImpl     : Variant annotation finished in 0m 41s 412ms (41412 ms)
2023-10-17 10:52:10.129  INFO 1454718 --- [           main] ConditionEvaluationReportLoggingListener :

Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
2023-10-17 10:52:10.214 ERROR 1454718 --- [           main] o.s.boot.SpringApplication               : Application run failed

java.lang.IllegalStateException: Failed to execute CommandLineRunner
        at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:771)
        at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:752)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:314)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1303)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1292)
        at org.monarchinitiative.exomiser.cli.Main.main(Main.java:53)
Caused by: org.monarchinitiative.svart.CoordinatesOutOfBoundsException: Fully-closed coordinates 1:249064688-249064690 out of contig bounds [1,248956422]
        at org.monarchinitiative.svart.Coordinates.validateCoordinates(Coordinates.java:197)
        at org.monarchinitiative.svart.BaseGenomicRegion.<init>(BaseGenomicRegion.java:23)
        at org.monarchinitiative.svart.BaseVariant.<init>(BaseVariant.java:20)
        at org.monarchinitiative.svart.impl.DefaultVariant.<init>(DefaultVariant.java:8)
        at org.monarchinitiative.svart.impl.DefaultVariant.of(DefaultVariant.java:23)
        at org.monarchinitiative.svart.Variant.of(Variant.java:85)
        at org.monarchinitiative.svart.util.VcfConverter.convert(VcfConverter.java:60)
        at org.monarchinitiative.exomiser.core.genome.VariantContextConverter.convertToVariant(VariantContextConverter.java:106)
        at org.monarchinitiative.exomiser.core.genome.VariantFactoryImpl.buildVariantEvaluations(VariantFactoryImpl.java:161)
        at org.monarchinitiative.exomiser.core.genome.VariantFactoryImpl.lambda$buildAlleleVariantEvaluations$1(VariantFactoryImpl.java:110)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.ArrayList$SubList$2.forEachRemaining(ArrayList.java:1481)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
        at java.base/java.util.stream.ReferencePipeline$15$1.accept(ReferencePipeline.java:541)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1921)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
        at org.monarchinitiative.exomiser.core.analysis.AbstractAnalysisRunner.loadAndFilterVariants(AbstractAnalysisRunner.java:208)
        at org.monarchinitiative.exomiser.core.analysis.AbstractAnalysisRunner.run(AbstractAnalysisRunner.java:113)
        at org.monarchinitiative.exomiser.core.Exomiser.run(Exomiser.java:83)
        at org.monarchinitiative.exomiser.core.Exomiser.run(Exomiser.java:69)
        at org.monarchinitiative.exomiser.cli.ExomiserCommandLineRunner.runJob(ExomiserCommandLineRunner.java:79)
        at org.monarchinitiative.exomiser.cli.ExomiserCommandLineRunner.runJobs(ExomiserCommandLineRunner.java:62)
        at org.monarchinitiative.exomiser.cli.ExomiserCommandLineRunner.run(ExomiserCommandLineRunner.java:57)
        at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:768)
        ... 5 common frames omitted

I'm running Genomiser on an HPC with Oracle Java 19, using version 2302 of the Exomiser hg38 data.

GACGAMA commented 11 months ago

Solved: I was loading mixed VCF files (hg19 + hg38) in a single multisample VCF file.
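
For anyone hitting the same exception: the reported position 1:249064688 lies past the end of chromosome 1 in hg38 (248956422, the upper bound in the error message) but inside chromosome 1 in hg19 (249250621), which fits the mixed-assembly explanation. A minimal sketch of a pre-flight check that lists such records before running Exomiser is below. It is not part of Exomiser; the function names and the `HG38_CONTIG_LENGTHS` table are hypothetical, and only chromosome 1 is filled in. The remaining contig lengths would have to come from the reference's .fai or .dict file.

```python
import gzip
from typing import Dict, Iterable, Iterator, TextIO, Tuple

# Hypothetical lookup table. The chr1 length is the upper contig bound
# printed in the error message above; other contigs are left out here.
HG38_CONTIG_LENGTHS: Dict[str, int] = {"1": 248_956_422}


def normalise_contig(name: str) -> str:
    """Strip a leading 'chr' so that 'chr1' and '1' compare equal."""
    return name[3:] if name.lower().startswith("chr") else name


def out_of_bounds_records(
    lines: Iterable[str], contig_lengths: Dict[str, int]
) -> Iterator[Tuple[str, int, int]]:
    """Yield (contig, start, end) for VCF records that end past their contig.

    Coordinates are 1-based and fully closed, matching the error message:
    end = POS + len(REF) - 1. Contigs missing from the table are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a complete VCF record
        contig = normalise_contig(fields[0])
        start = int(fields[1])
        end = start + len(fields[3]) - 1
        limit = contig_lengths.get(contig)
        if limit is not None and end > limit:
            yield contig, start, end


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")
```

To use it, something like `with open_vcf("trio.vcf.gz") as vcf: print(list(out_of_bounds_records(vcf, HG38_CONTIG_LENGTHS)))` would list the offending records (the file name is a placeholder). An empty result does not prove every sample is on hg38, since most hg19 positions also fit inside the hg38 contigs. It only catches records like the one that triggered the exception here.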