exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

Exomiser freezes after a few seconds - help! #455

Closed nvadgama87 closed 1 year ago

nvadgama87 commented 1 year ago

Hi there,

I followed the installation instructions, but when I run Exomiser on the test data, it freezes after the 'Deserialisation took XX secs' step:

 A Tool to Annotate and Prioritize Exome Variants     v13.0.0

2022-10-04 19:56:33.376  INFO 964 --- [           main] org.monarchinitiative.exomiser.cli.Main  : Starting Main using Java 17.0.4 on sh03-08n12.int with PID 964 (/oak/stanford/groups/caseyg21/Nirmal/Exomiser/exomiser-cli-13.0.0/exomiser-cli-13.0.0.jar started by nvadgama in /oak/stanford/groups/caseyg21/Nirmal/Exomiser/exomiser-cli-13.0.0)
2022-10-04 19:56:33.378  INFO 964 --- [           main] org.monarchinitiative.exomiser.cli.Main  : No active profile set, falling back to default profiles: default
2022-10-04 19:56:33.877  INFO 964 --- [           main] o.m.exomiser.cli.config.MainConfig       : Exomiser home: /oak/stanford/groups/caseyg21/Nirmal/Exomiser/exomiser-cli-13.0.0
2022-10-04 19:56:33.883  INFO 964 --- [           main] o.m.exomiser.cli.config.MainConfig       : Data source directory defined in properties as: /oak/stanford/groups/caseyg21/Nirmal/Exomiser/exomiser-cli-13.0.0/data/exomiser-data
2022-10-04 19:56:33.884  INFO 964 --- [           main] o.m.exomiser.cli.config.MainConfig       : Root data source directory set to: /oak/stanford/groups/caseyg21/Nirmal/Exomiser/exomiser-cli-13.0.0/data/exomiser-data
2022-10-04 19:56:33.886  INFO 964 --- [           main] o.m.e.c.g.j.JannovarDataProtoSerialiser  : Deserialising Jannovar data from /oak/stanford/groups/caseyg21/Nirmal/Exomiser/exomiser-cli-13.0.0/data/exomiser-data/2109_hg19/2109_hg19_transcripts_ucsc.ser
2022-10-04 19:56:34.999  INFO 964 --- [           main] o.m.e.c.g.j.JannovarDataProtoSerialiser  : Deserialisation took 1.112 sec.

I ran it with the following job script:

#!/bin/bash -l
#SBATCH --mem=80G
#SBATCH --time=10:00:00
#SBATCH --output=%x.o%j

date
hostname

module load java/17

java -jar exomiser-cli-13.0.0.jar --analysis exomiser-cli-13.0.0/examples/test-analysis-multisample.yml

The application.properties file looks like:

#
# The Exomiser - A tool to annotate and prioritize genomic variants
#
# Copyright (c) 2016-2021 Queen Mary University of London.
# Copyright (c) 2012-2016 Charité Universitätsmedizin Berlin and Genome Research Ltd.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#

## exomiser root data directory ##
# root path where data is to be downloaded and worked on it is assumed that all the files required by exomiser listed
# in this properties file will be found in the data directory, unless specifically overridden here.
exomiser.data-directory=/Exomiser/exomiser-cli-13.0.0/data/exomiser-data

## optional data sources ##
# The location of these files need to be specified for each assembly in the sections below
# REMM can be downloaded from https://zenodo.org/record/4768448
# REMM is required for the genome preset.
remm.version=0.3.1.post1
# CADD can be downloaded from http://cadd.gs.washington.edu/download
# CADD is an optional datasource
cadd.version=1.6

### hg19 assembly ###
exomiser.hg19.data-version=2109
# transcript source will default to ensembl. Can define as ucsc/ensembl/refseq
exomiser.hg19.transcript-source=ucsc
# location of CADD/REMM Tabix files - you will need these for analysis of non-coding variants.
# You will require the tsv.gz and tsv.gz.tbi (tabix) file pairs.
# Un-comment and add the full path to the relevant tsv.gz files if you want to enable these.
exomiser.hg19.cadd-snv-path=$/Exomiser/exomiser-cli-13.0.0/data/exomiser-data/cadd/hg19/whole_genome_SNVs.tsv.gz
exomiser.hg19.cadd-in-del-path=/Exomiser/exomiser-cli-13.0.0/data/exomiser-data/cadd/hg19/InDels.tsv.gz
#exomiser.hg19.remm-path=${exomiser.data-directory}/remm/ReMM.v${remm.version}.hg19.tsv.gz
# local frequencies are required to be normalised in the same manner as the input VCF and frequency values must be percentages.
#exomiser.hg19.local-frequency-path=${exomiser.data-directory}/local/local_frequency_test_hg19.tsv.gz
exomiser.hg19.variant-white-list-path=/Exomiser/exomiser-cli-13.0.0/data/exomiser-data/2109_hg19/hg19_clinvar_whitelist.tsv.gz

### hg38 assembly ###
# To enable analysis of samples called against the hg38 assembly copy the hg19 above and just replace the hg19 with hg38
#exomiser.hg38.data-version=2109
#exomiser.hg38.cadd-snv-path=${exomiser.data-directory}/cadd/whole_genome_SNVs.tsv.gz
#exomiser.hg38.cadd-in-del-path=${exomiser.data-directory}/cadd/InDels.tsv.gz
#exomiser.hg38.remm-path=${exomiser.data-directory}/remm/ReMM.v${remm.version}.hg38.tsv.gz
#exomiser.hg38.local-frequency-path=${exomiser.data-directory}/local/local_frequency_test_hg38.tsv.gz
#exomiser.hg38.variant-white-list-path=${exomiser.hg38.data-version}_hg38_clinvar_whitelist.tsv.gz

### phenotypes ###
exomiser.phenotype.data-version=2109
exomiser.phenotype.data-directory=/Exomiser/exomiser-cli-13.0.0/data/exomiser-data/2109_phenotype
# String random walk data file
#exomiser.phenotype.random-walk-file-name=rw_string_10.mv
#exomiser.phenotype.random-walk-index-file-name=rw_string_9_05_id2index.gz

### caching ###
# If you're running exomiser in batch mode there might be some performance benefit if you enable caching. The 'simple'
# option will continue to store data in memory *without* limit - this means for really long-running batch jobs and/or
# whole genomes you may run out of memory.
# If this is likely choose the caffeine option and uncomment spring.cache.caffeine.spec and adjust the cache size
# to your requirements
#none/simple/caffeine
spring.cache.type=caffeine
spring.cache.caffeine.spec=maximumSize=1000000

### logging ###
logging.file.name=logs/exomiser.log

The yml file looks like:

## Exomiser Analysis Template for multi-sample VCF files
# These are all the possible options for running exomiser. Use this as a template for
# your own set-up.
analysis:
    # hg19 or hg38 - ensure that the application has been configured to run the specified assembly otherwise it will halt.
    genomeAssembly: hg19
    vcf: /Exomiser/exomiser-cli-13.0.0/examples/Pfeiffer-quartet.vcf.gz
    ped: /Exomiser/exomiser-cli-13.0.0/examples/Pfeiffer-quartet.ped
    proband: ISDBM322017
    hpoIds: ['HP:0001156', 'HP:0001363', 'HP:0011304', 'HP:0010055']
    # These are the default settings, with values representing the maximum minor allele frequency in percent (%) permitted for an
    # allele to be considered as a causative candidate under that mode of inheritance.
    # If you just want to analyse a sample under a single inheritance mode, delete/comment-out the others. For AUTOSOMAL_RECESSIVE
    # or X_RECESSIVE ensure *both* relevant HOM_ALT and COMP_HET modes are present.
    # In cases where you do not want any cut-offs applied an empty map should be used e.g. inheritanceModes: {}
    inheritanceModes: {
            AUTOSOMAL_DOMINANT: 0.1,
            AUTOSOMAL_RECESSIVE_HOM_ALT: 0.1,
            AUTOSOMAL_RECESSIVE_COMP_HET: 2.0,
            X_DOMINANT: 0.1,
            X_RECESSIVE_HOM_ALT: 0.1,
            X_RECESSIVE_COMP_HET: 2.0,
            MITOCHONDRIAL: 0.2
    }
    #FULL or PASS_ONLY
    analysisMode: PASS_ONLY
    #Possible frequencySources:
    #Thousand Genomes project - http://www.1000genomes.org/ (THOUSAND_GENOMES)
    #TOPMed - https://www.nhlbi.nih.gov/science/precision-medicine-activities (TOPMED)
    #UK10K - http://www.uk10k.org/ (UK10K)
    #ESP project - http://evs.gs.washington.edu/EVS/ (ESP_)
    #   ESP_AFRICAN_AMERICAN, ESP_EUROPEAN_AMERICAN, ESP_ALL,
    #ExAC project http://exac.broadinstitute.org/about (EXAC_)
    #   EXAC_AFRICAN_INC_AFRICAN_AMERICAN, EXAC_AMERICAN,
    #   EXAC_SOUTH_ASIAN, EXAC_EAST_ASIAN,
    #   EXAC_FINNISH, EXAC_NON_FINNISH_EUROPEAN,
    #   EXAC_OTHER
    #gnomAD - http://gnomad.broadinstitute.org/ (GNOMAD_E, GNOMAD_G)
    frequencySources: [
        THOUSAND_GENOMES,
        TOPMED,
        UK10K,

        ESP_AFRICAN_AMERICAN, ESP_EUROPEAN_AMERICAN, ESP_ALL,

        EXAC_AFRICAN_INC_AFRICAN_AMERICAN, EXAC_AMERICAN,
        EXAC_SOUTH_ASIAN, EXAC_EAST_ASIAN,
        EXAC_FINNISH, EXAC_NON_FINNISH_EUROPEAN,
        EXAC_OTHER,

        GNOMAD_E_AFR,
        GNOMAD_E_AMR,
#        GNOMAD_E_ASJ,
        GNOMAD_E_EAS,
        GNOMAD_E_FIN,
        GNOMAD_E_NFE,
        GNOMAD_E_OTH,
        GNOMAD_E_SAS,

        GNOMAD_G_AFR,
        GNOMAD_G_AMR,
#        GNOMAD_G_ASJ,
        GNOMAD_G_EAS,
        GNOMAD_G_FIN,
        GNOMAD_G_NFE,
        GNOMAD_G_OTH,
        GNOMAD_G_SAS
        ]
    #Possible pathogenicitySources: POLYPHEN, MUTATION_TASTER, SIFT, CADD, REMM
    #REMM is trained on non-coding regulatory regions
    #*WARNING* if you enable CADD or REMM ensure that you have downloaded and installed the CADD/REMM tabix files
    #and updated their location in the application.properties. Exomiser will not run without this.
    pathogenicitySources: [CADD]
    #this is the standard exomiser order.
    #all steps are optional
    steps: [
        #intervalFilter: {interval: 'chr10:123256200-123256300'},
        # or for multiple intervals:
        #intervalFilter: {intervals: ['chr10:123256200-123256300', 'chr10:123256290-123256350']},
        # or using a BED file - NOTE this should be 0-based, Exomiser otherwise uses 1-based coordinates in line with VCF
        #intervalFilter: {bed: /full/path/to/bed_file.bed},
        #genePanelFilter: {geneSymbols: ['FGFR1','FGFR2']},
        #failedVariantFilter: {},
        #qualityFilter: {minQuality: 50.0},
        variantEffectFilter: {
            remove: [
                FIVE_PRIME_UTR_EXON_VARIANT,
                FIVE_PRIME_UTR_INTRON_VARIANT,
                THREE_PRIME_UTR_EXON_VARIANT,
                THREE_PRIME_UTR_INTRON_VARIANT,
                NON_CODING_TRANSCRIPT_EXON_VARIANT,
                UPSTREAM_GENE_VARIANT,
                INTERGENIC_VARIANT,
                REGULATORY_REGION_VARIANT,
                CODING_TRANSCRIPT_INTRON_VARIANT,
                NON_CODING_TRANSCRIPT_INTRON_VARIANT,
                DOWNSTREAM_GENE_VARIANT
            ]
        },
        #knownVariantFilter: {}, #removes variants represented in the database
        frequencyFilter: {maxFrequency: 2.0},
        pathogenicityFilter: {keepNonPathogenic: true},
        #inheritanceFilter and omimPrioritiser should always run AFTER all other filters have completed
        #they will analyse genes according to the specified modeOfInheritance above- UNDEFINED will not be analysed.
        inheritanceFilter: {},
        #omimPrioritiser isn't mandatory.
        omimPrioritiser: {},
        #priorityScoreFilter: {minPriorityScore: 0.4},
        #Other prioritisers: Only combine omimPrioritiser with one of these.
        #Don't include any if you only want to filter the variants.
        hiPhivePrioritiser: {},
        # or run hiPhive in benchmarking mode:
        #hiPhivePrioritiser: {runParams: 'mouse'},
        #phivePrioritiser: {}
        #phenixPrioritiser: {}
        #exomeWalkerPrioritiser: {seedGeneIds: [11111, 22222, 33333]}
    ]
outputOptions:
    outputContributingVariantsOnly: false
    #numGenes options: 0 = all or specify a limit e.g. 500 for the first 500 results
    numGenes: 50
    #outputPrefix options: specify the path/filename without an extension and this will be added
    # according to the outputFormats option. If unspecified this will default to the following:
    # {exomiserDir}/results/input-vcf-name-exomiser-results.html
    # alternatively, specify a fully qualifed path only. e.g. /users/jules/exomes/analysis
    outputPrefix: results/Pfeiffer-quartet-hiphive-exome-PASS_ONLY
    #out-format options: HTML, JSON, TSV_GENE, TSV_VARIANT, VCF (default: HTML)
    outputFormats: [HTML, JSON, TSV_GENE, TSV_VARIANT, VCF]

I appreciate any help on this.

BW, n.

julesjacobsen commented 1 year ago

@nvadgama87 did you manage to resolve this? There looks to be an errant $ in this line of the application.properties file:

exomiser.hg19.cadd-snv-path=$/Exomiser/exomiser-cli-13.0.0/data/exomiser-data/cadd/hg19/whole_genome_SNVs.tsv.gz
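
A stray `$` like this is easy to miss by eye, so here is a minimal sketch of a check for it. It is not part of Exomiser: the function name `find_stray_dollars` is hypothetical, and the heuristic assumes that the only legitimate use of `$` in a property value is to open a `${...}` placeholder, as in the commented-out template lines. Whether this character actually caused the freeze was not confirmed in this thread.

```python
import re

# Assumption: a '$' that does not open a '${...}' placeholder is suspicious.
STRAY_DOLLAR = re.compile(r"\$(?!\{)")


def find_stray_dollars(properties_text):
    """Return (line_number, key) for each active property whose value
    contains a '$' that is not immediately followed by '{'."""
    hits = []
    for line_number, line in enumerate(properties_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, value = stripped.split("=", 1)
        if STRAY_DOLLAR.search(value):
            hits.append((line_number, key.strip()))
    return hits
```

To use it, read the contents of application.properties into a string and pass that string to `find_stray_dollars`. Each returned pair gives the line number and the property key to inspect. Commented-out lines are skipped, so only settings that are actually active get reported.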

julesjacobsen commented 1 year ago

No response - closing.