exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

Missing annotations for Gene SETD5 #165

Closed visze closed 7 years ago

visze commented 7 years ago

The gene SETD5 is associated with Mental retardation, autosomal dominant 23. If I use intellectual disability (HP:0001249), the phenotype score is 0 with both PhenIX and hiPHIVE. The strange thing is that Exomiser correctly reports that this gene belongs to Mental retardation, autosomal dominant 23. If I use exactly this disease and no HPO term, the same thing happens: a phenotype score of 0. It looks to me as if SETD5 is not annotated with any HPO term. So why is there an annotation with the disease but no HPO term? This might be a bug in the software (maybe in how the database is generated).

Phenomizer can rank the gene. ID is also known to the HPO browser.

Here is an example: VCF:

##fileformat=VCFv4.2
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=NEGATIVE_TRAIN_SITE,Number=0,Type=Flag,Description="This variant was used to build the negative training set of bad variants">
##INFO=<ID=POSITIVE_TRAIN_SITE,Number=0,Type=Flag,Description="This variant was used to build the positive training set of good variants">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
##INFO=<ID=VQSLOD,Number=1,Type=Float,Description="Log odds of being a true variant versus being false under the trained gaussian mixture model">
##INFO=<ID=culprit,Number=1,Type=String,Description="The annotation which was the worst performing in the Gaussian mixture model, likely the reason why the variant was filtered out">
##INFO=<ID=EFFECT,Number=1,Type=String,Description="variant effect (UTR5,UTR3,intronic,splicing,missense,stoploss,stopgain,frameshift-insertion,frameshift-deletion,non-frameshift-deletion,non-frameshift-insertion,synonymous)">
##INFO=<ID=HGVS,Number=1,Type=String,Description="HGVS Nomenclature">
##contig=<ID=1,length=249250621,assembly=b37>
##contig=<ID=2,length=243199373,assembly=b37>
##contig=<ID=3,length=198022430,assembly=b37>
##contig=<ID=4,length=191154276,assembly=b37>
##contig=<ID=5,length=180915260,assembly=b37>
##contig=<ID=6,length=171115067,assembly=b37>
##contig=<ID=7,length=159138663,assembly=b37>
##contig=<ID=8,length=146364022,assembly=b37>
##contig=<ID=9,length=141213431,assembly=b37>
##contig=<ID=10,length=135534747,assembly=b37>
##contig=<ID=11,length=135006516,assembly=b37>
##contig=<ID=12,length=133851895,assembly=b37>
##contig=<ID=13,length=115169878,assembly=b37>
##contig=<ID=14,length=107349540,assembly=b37>
##contig=<ID=15,length=102531392,assembly=b37>
##contig=<ID=16,length=90354753,assembly=b37>
##contig=<ID=17,length=81195210,assembly=b37>
##contig=<ID=18,length=78077248,assembly=b37>
##contig=<ID=19,length=59128983,assembly=b37>
##contig=<ID=20,length=63025520,assembly=b37>
##contig=<ID=21,length=48129895,assembly=b37>
##contig=<ID=22,length=51304566,assembly=b37>
##contig=<ID=X,length=155270560,assembly=b37>
##contig=<ID=Y,length=59373566,assembly=b37>
##contig=<ID=MT,length=16569,assembly=b37>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  A   B   C   D
3   9483882 .   G   A   4267.90 PASS    EFFECT=missense;HGVS=SETD5:NM_001080517:exon10:c.1030G>A:p.G344S,SETD5:NM_001292043:exon12:c.736G>A:p.G246S,;AC=3;AF=0.375;AN=8;BaseQRankSum=1.52;ClippingRankSum=0.014;DP=365;ExcessHet=5.4407;FS=2.969;MLEAC=3;MLEAF=0.375;MQ=60.00;MQRankSum=-7.660e-01;QD=13.21;ReadPosRankSum=1.39;SOR=0.856;VQSLOD=3.28;culprit=ReadPosRankSum    GT:AD:DP:GQ:PL  0/1:65,37:102:99:983,0,1922 0/0:41,0:41:99:0,102,1530   0/1:28,37:65:99:943,0,624   0/1:64,92:156:99:2376,0,1426

PED-File:

Fam1    B   0   0   1   1
Fam1    C   0   0   2   2
Fam1    A   B   C   2   2
Fam1    D   B   C   2   2
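
As a quick sanity check on the example above, the following minimal Python sketch tests whether the SETD5 variant segregates with the phenotype in this family. It assumes the usual PED column order (family, individual, father, mother, sex, phenotype, with phenotype 2 meaning affected) and full penetrance; the function names are hypothetical and this is not part of Exomiser.

```python
# PED rows and genotypes copied from the example above.
PED = """\
Fam1 B 0 0 1 1
Fam1 C 0 0 2 2
Fam1 A B C 2 2
Fam1 D B C 2 2
"""
SAMPLES = ["A", "B", "C", "D"]              # sample order in the #CHROM line
GENOTYPES = ["0/1", "0/0", "0/1", "0/1"]    # GT field of each sample


def parse_ped(text):
    """Map individual ID -> True if affected (phenotype column == 2)."""
    affected = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        individual, phenotype = fields[1], fields[5]
        affected[individual] = phenotype == "2"
    return affected


def carries_alt(genotype):
    """True if the genotype contains at least one non-reference allele."""
    alleles = genotype.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


def segregates_dominant(affected, genotypes):
    """Simplified check: carriers are exactly the affected individuals."""
    return all(
        carries_alt(genotypes[individual]) == is_affected
        for individual, is_affected in affected.items()
    )


affected = parse_ped(PED)
genotypes = dict(zip(SAMPLES, GENOTYPES))
print(segregates_dominant(affected, genotypes))
```

Here the three affected individuals (A, C, D) are heterozygous and the unaffected father (B) is homozygous reference, so the variant is consistent with autosomal dominant inheritance.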

visze commented 7 years ago

Update: I tried it with Genomiser 7.2.1, and there it gets a phenotype score of 1.0.

What is the current status of the new Exomiser web interface? Can I install a web interface locally so that our clinicians can use Exomiser?

julesjacobsen commented 7 years ago

You want to use version 7.2.3+ - check the releases page for info:

https://github.com/exomiser/Exomiser/releases/tag/7.2.3

There was a bug (see #147) in the way we dealt with multi-sample VCF.

Using this analysis (note proband: B and modeOfInheritance: AUTOSOMAL_DOMINANT)

---
analysis:
    vcf: SETD5.vcf
    ped: SETD5.ped
    proband: B
    # AUTOSOMAL_DOMINANT, AUTOSOMAL_RECESSIVE, X_RECESSIVE or UNDEFINED
    modeOfInheritance: AUTOSOMAL_DOMINANT
    #FULL, SPARSE or PASS_ONLY
    analysisMode: PASS_ONLY 
    geneScoreMode: RAW_SCORE
    hpoIds: ['HP:0001249']
    frequencySources: [
        THOUSAND_GENOMES,
        ESP_AFRICAN_AMERICAN, ESP_EUROPEAN_AMERICAN, ESP_ALL,
        EXAC_AFRICAN_INC_AFRICAN_AMERICAN, EXAC_AMERICAN,
        EXAC_SOUTH_ASIAN, EXAC_EAST_ASIAN,
        EXAC_FINNISH, EXAC_NON_FINNISH_EUROPEAN,
        EXAC_OTHER
        ]
    pathogenicitySources: [POLYPHEN, MUTATION_TASTER, SIFT]
    #this is the standard exomiser order.
    steps: [ 
        variantEffectFilter: {remove: [UPSTREAM_GENE_VARIANT,
            INTERGENIC_VARIANT,
            REGULATORY_REGION_VARIANT,
            CODING_TRANSCRIPT_INTRON_VARIANT,
            NON_CODING_TRANSCRIPT_INTRON_VARIANT,
            SYNONYMOUS_VARIANT,
            DOWNSTREAM_GENE_VARIANT,
            SPLICE_REGION_VARIANT]},
        frequencyFilter: {maxFrequency: 1.0},
        pathogenicityFilter: {keepNonPathogenic: true},
        inheritanceFilter: {},
        omimPrioritiser: {},
        hiPhivePrioritiser: {runParams: 'human'}
    ]
outputOptions:
    outputPassVariantsOnly: false
    #numGenes options: 0 = all or specify a limit e.g. 500 for the first 500 results  
    numGenes: 0
    outputPrefix: results/SETD5-AUTOSOMAL_DOMINANT
    #out-format options: HTML, TSV-GENE, TSV-VARIANT, VCF (default: HTML)
    outputFormats: [TSV-GENE, TSV-VARIANT, VCF, HTML]

This was produced:

[Image: setd5]

Which is what you'd expect.
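
For anyone wanting to reproduce the run, here is a minimal sketch of how the command line could be assembled. The jar name, the heap size, and the file name `SETD5-analysis.yml` (the analysis above saved to disk) are assumptions for illustration; `--analysis` is the CLI option for passing an analysis file, but check the documentation of the release you installed.

```python
import shlex


def exomiser_command(analysis_file, jar="exomiser-cli-7.2.3.jar", max_heap="4g"):
    """Build (but do not run) a java command line for an Exomiser analysis.

    The jar name and heap size are assumed defaults; adjust to your install.
    """
    return ["java", f"-Xmx{max_heap}", "-jar", jar, "--analysis", analysis_file]


# Hypothetical file name for the YAML analysis shown above.
cmd = exomiser_command("SETD5-analysis.yml")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory containing the jar.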

julesjacobsen commented 7 years ago

I need to put a change into the current dev version of the web interface to accommodate this fix, but otherwise it should be usable. Bear in mind the web interface only runs the original exomiser algorithm though. This was by design so as to limit the hardware requirements of the server.

visze commented 7 years ago

OK. But how can you do that in the Exomiser web interface?

In the current development version there is no additional field.

visze commented 7 years ago

Sorry, now it is there!

visze commented 7 years ago

(p.s.: we need to update the documentation)