Ensembl / VEP_plugins

Plugins for the Ensembl Variant Effect Predictor (VEP)
Apache License 2.0

Mastermind #231

Closed: shigej38 closed this issue 4 years ago

shigej38 commented 4 years ago

Hello. When I run the Mastermind plugin, it adds the header lines shown below to the output file but no Mastermind information; the Mastermind columns are filled with "-". What is causing this, and how can I solve it?

Header (INFO) section:

`MMCNT1 : ?`

`MMCNT2 : ?`

`MMCNT3 : ?`

`MMID3 : ?`

Column section example (the same row repeats for all 25 variants shown):

`MMCNT1 MMCNT2 MMCNT3 MMID3`

`- - - -`

at7 commented 4 years ago

Hi @shigej38, the output you see only means that there are no counts for the variant. If you are expecting a result, could you please send me the input variants, the command you use to call the Mastermind plugin, and the Mastermind file name and assembly version? Thank you, Anja
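To tell "no counts for this variant" apart from "no counts for any variant", it can help to count how many output rows carry a Mastermind value at all. The sketch below is only an illustration and not part of VEP: it assumes tab-delimited output in which `##` lines are meta information, the single `#` line holds the column names, each Mastermind field (`MMCNT1`, `MMCNT2`, `MMCNT3`, `MMID3`) has its own column, and `-` marks an empty field. The function name `count_mastermind_hits` is hypothetical.

```python
MASTERMIND_FIELDS = ("MMCNT1", "MMCNT2", "MMCNT3", "MMID3")


def count_mastermind_hits(lines):
    """Return (total_rows, rows_with_any_mastermind_value).

    Assumed layout: '##' lines are meta information, the one line starting
    with a single '#' is the column header, and '-' means "no value".
    """
    header = None
    total = 0
    annotated = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta information
        if line.startswith("#"):
            header = line[1:].split("\t")  # column names without the '#'
            continue
        if header is None:
            raise ValueError("no column header line found before data rows")
        row = dict(zip(header, line.split("\t")))
        total += 1
        # A row counts as annotated if any Mastermind field is filled in.
        if any(row.get(name, "-") not in ("-", "") for name in MASTERMIND_FIELDS):
            annotated += 1
    return total, annotated
```

Calling it as `count_mastermind_hits(open("vep_output.txt"))` (the file name is a placeholder) and getting zero annotated rows for a large input would suggest a setup problem rather than variants that simply have no citations.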

shigej38 commented 4 years ago

Hi @at7, my input file is quite large. It contains variants from about 20,000 genes, and not a single one gets a Mastermind match. The command I use to call the plugin is: `--plugin Mastermind,/mnt/7C033C763BE8E7D8/ensembl-data/mastermind/mastermind_cited_variants_reference-2019.06.14-grch37.vcf.gz`

My Mastermind file version is 2019.06.14-grch37,

and I keep my Mastermind.pm plugin file up to date.
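Two things in a plugin call like this are easy to get wrong: stray whitespace inside the `--plugin` value, and a Mastermind file built for a different assembly than the VEP cache. The following is a minimal sketch of both checks, not part of VEP or the plugin. The helper names are hypothetical, and it assumes the assembly can be read from a `grch37`/`grch38` token in the file name, as in the Mastermind file named above.

```python
import re


def mastermind_plugin_arg(vcf_path):
    """Build the value passed to VEP's --plugin option, rejecting whitespace."""
    if re.search(r"\s", vcf_path):
        raise ValueError(f"whitespace in Mastermind file path: {vcf_path!r}")
    return f"Mastermind,{vcf_path}"


def assembly_from_filename(vcf_path):
    """Return 'GRCh37' or 'GRCh38' if the file name contains it, else None."""
    match = re.search(r"grch(37|38)", vcf_path, flags=re.IGNORECASE)
    return f"GRCh{match.group(1)}" if match else None


def same_assembly(vcf_path, cache_assembly):
    """Compare the file's assembly with a cache assembly such as 'GRCh37.p13'."""
    file_assembly = assembly_from_filename(vcf_path)
    if file_assembly is None:
        return False  # cannot tell from the name, so do not claim a match
    return cache_assembly.upper().startswith(file_assembly.upper())
```

For the file above, `assembly_from_filename` yields `GRCh37`, so it should be paired with a GRCh37 cache; a mismatch would leave every Mastermind column empty.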

at7 commented 4 years ago

Thank you for sending the additional information. I tried running Mastermind for an example variant (rs699, location 1:230845794-230845794) and got results back from the plugin using the same Mastermind file as you. Could you please try the same example variant and see if you get any results? I also recommend that you check out the latest Mastermind plugin code; we pushed some bug fixes in the last few weeks.

shigej38 commented 4 years ago

Even when no variant matches, shouldn't the Mastermind description lines in the header section of my file contain a description, rather than being empty or "?"?

I've updated all of the ensembl-vep and Mastermind files, but the update didn't solve my problem. Could you send a sample VCF file? I don't know how to use the example variant you mentioned. My e-mail: ertan.yilmaz@detagen.com.tr

at7 commented 4 years ago

I used the following variant for tests:

#CHROM POS ID REF ALT QUAL FILTER INFO

1 230845794 rs699 A G . .
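For anyone who wants to reproduce this test, the variant can be written to a small tab-separated VCF file. This is a sketch under stated assumptions: the `##fileformat` line and the `.` in the INFO column are additions made here so that the row has all eight columns, and the function names are hypothetical.

```python
from pathlib import Path

VCF_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
# rs699 as given above; the trailing "." (missing INFO) is an assumption
TEST_VARIANT = ["1", "230845794", "rs699", "A", "G", ".", ".", "."]


def test_vcf_text():
    """Return a minimal single-variant VCF as a tab-separated string."""
    lines = [
        "##fileformat=VCFv4.2",  # assumed version line, not from the thread
        "\t".join(VCF_COLUMNS),
        "\t".join(TEST_VARIANT),
    ]
    return "\n".join(lines) + "\n"


def write_test_vcf(path):
    """Write the single-variant VCF to `path` and return the path."""
    target = Path(path)
    target.write_text(test_vcf_text())
    return target
```

The resulting file can then be passed to VEP as the input, together with the same `--plugin Mastermind,<file>` argument used for the full run.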

shigej38 commented 4 years ago

I saved the variant you provided as follows:

#CHROM POS ID REF ALT QUAL FILTER INFO

1 230845794 rs699 A G . .

For this input, VEP produced the following output:

`## ENSEMBL VARIANT EFFECT PREDICTOR v97.3

Output produced at 2019-08-21 16:00:05

Using cache in /home/detagen/.vep/homo_sapiens_refseq/97_GRCh37

Using API version 97, DB version ?

ensembl version 97.378db18

ensembl-io version 97.dc917e1

ensembl-variation version 97.26a059c

ensembl-funcgen version 97.24f4d3c

1000genomes version phase3

gencode version GENCODE 19

polyphen version 2.2.2

regbuild version 1.0

assembly version GRCh37.p13

COSMIC version 86

refseq version 01_2015

dbSNP version 151

HGMD-PUBLIC version 20174

genebuild version 2011-04

sift version sift5.2.2

ESP version 20141103

ClinVar version 201810

gnomAD version r2.1

Column descriptions:

Uploaded_variation : Identifier of uploaded variant

cDNA_position : Relative position of base pair in cDNA sequence

REF_ALLELE : Reference allele

Allele : The variant allele used to calculate the consequence

Protein_position : Relative position of amino acid in protein

Amino_acids : Reference and variant amino acids

Codons : Reference and variant codon sequence

SYMBOL : Gene symbol (e.g. HGNC)

EXON : Exon number(s) / total

INTRON : Intron number(s) / total

ZYG : Zygosity of individual genotype at this locus

Consequence : Consequence type

CLIN_SIG : ClinVar clinical significance of the dbSNP variant

Existing_variation : Identifier(s) of co-located known variants

rs_dbSNP150 : (from dbNSFP) rs number from dbSNP 150

G2P_complete : Indicates this variant completes the allelic requirements for a G2P gene

G2P_flag : Flags zygosity of valid variants for a G2P gene

G2P_gene_req : MONO or BI depending on the context in which this gene has been explored

MMCNT1 : ?

MMCNT2 : ?

MMCNT3 : ?

MMID3 : ?

clinvar_rs : (from dbNSFP) rs number from the clinvar data set

clinvar_trait : (from dbNSFP) the trait/disease the clinvar_clnsig referring to

DOMAINS : The source and identifer of any overlapping protein domains

Interpro_domain : (from dbNSFP) domain or conserved site on which the variant locates. Domain annotations come from Interpro database. The number in the brackets following a specific domain is the count of times Interpro assigns the variant position to that domain, typically coming from different predicting databases. Multiple entries separated by ";".

hg18_chr : (from dbNSFP) chromosome as to hg18, "." means missing

hg18_pos(1-based) : (from dbNSFP) physical position on the chromosome as to hg18 (1-based coordinate) For mitochondrial SNV, this position refers to a YRI sequence (GenBank: AF347015)

hg19_chr : (from dbNSFP) chromosome as to hg19, "." means missing

hg19_pos(1-based) : (from dbNSFP) physical position on the chromosome as to hg19 (1-based coordinate). For mitochondrial SNV, this position refers to a YRI sequence (GenBank: AF347015)

ExAC_AC : (from dbNSFP) Allele count in total ExAC samples (60,706 samples)

ExAC_AF : Frequency of existing variant in ExAC combined population

gnomAD_exomes_AC : (from dbNSFP) Alternative allele count in the whole gnomAD exome samples (123,136 samples)

gnomAD_exomes_AF : (from dbNSFP) Alternative allele frequency in the whole gnomAD exome samples (123,136 samples)

gnomAD_genomes_AC : (from dbNSFP) Alternative allele count in the whole gnomAD genome samples (15,496 samples)

gnomAD_genomes_AF : (from dbNSFP) Alternative allele frequency in the whole gnomAD genome samples (15,496 samples)

Ensembl_geneid : (from dbNSFP) Ensembl gene id

Ensembl_proteinid : (from dbNSFP) Ensembl protein ids Multiple entries separated by ";", corresponding to Ensembl_transcriptids

Ensembl_transcriptid : (from dbNSFP) Ensembl transcript ids (Multiple entries separated by ";")

HGVSc : HGVS coding sequence name

HGVSp : HGVS protein sequence name

Location : Location of variant in standard coordinate format (chr:start or chr:start-end)

Gene : Stable ID of affected gene

Feature : Stable ID of feature

Feature_type : Type of feature - Transcript, RegulatoryFeature or MotifFeature

CDS_position : Relative position of base pair in coding sequence

IND : Individual name

ALLELE_NUM : Allele number from input; 0 is reference, 1 is first alternate etc

IMPACT : Subjective impact classification of consequence type

DISTANCE : Shortest distance from variant to transcript

STRAND : Strand of the feature (1/-1)

FLAGS : Transcript quality flags

VARIANT_CLASS : SO variant class

SYMBOL_SOURCE : Source of gene symbol

HGNC_ID : Stable identifer of HGNC gene symbol

BIOTYPE : Biotype of transcript or regulatory feature

CANONICAL : Indicates if transcript is canonical for this gene

MANE : MANE (Matched Annotation by NCBI and EMBL-EBI) Transcript

CCDS : Indicates if transcript is a CCDS transcript

ENSP : Protein identifer

SWISSPROT : UniProtKB/Swiss-Prot accession

TREMBL : UniProtKB/TrEMBL accession

UNIPARC : UniParc accession

REFSEQ_MATCH : RefSeq transcript match status

SOURCE : Source of transcript

SIFT : SIFT prediction and/or score

PolyPhen : PolyPhen prediction and/or score

HGVS_OFFSET : Indicates by how many bases the HGVS notations for this variant have been shifted

AF : Frequency of existing variant in 1000 Genomes combined population

AFR_AF : Frequency of existing variant in 1000 Genomes combined African population

AMR_AF : Frequency of existing variant in 1000 Genomes combined American population

EAS_AF : Frequency of existing variant in 1000 Genomes combined East Asian population

EUR_AF : Frequency of existing variant in 1000 Genomes combined European population

SAS_AF : Frequency of existing variant in 1000 Genomes combined South Asian population

AA_AF : Frequency of existing variant in NHLBI-ESP African American population

EA_AF : Frequency of existing variant in NHLBI-ESP European American population

gnomAD_AF : Frequency of existing variant in gnomAD exomes combined population

gnomAD_AFR_AF : Frequency of existing variant in gnomAD exomes African/American population

gnomAD_AMR_AF : Frequency of existing variant in gnomAD exomes American population

gnomAD_ASJ_AF : Frequency of existing variant in gnomAD exomes Ashkenazi Jewish population

gnomAD_EAS_AF : Frequency of existing variant in gnomAD exomes East Asian population

gnomAD_FIN_AF : Frequency of existing variant in gnomAD exomes Finnish population

gnomAD_NFE_AF : Frequency of existing variant in gnomAD exomes Non-Finnish European population

gnomAD_OTH_AF : Frequency of existing variant in gnomAD exomes other combined populations

gnomAD_SAS_AF : Frequency of existing variant in gnomAD exomes South Asian population

SOMATIC : Somatic status of existing variant

PHENO : Indicates if existing variant(s) is associated with a phenotype, disease or trait; multiple values correspond to multiple variants

OverlapBP : Number of base pairs overlapping with the corresponding structural variation feature

OverlapPC : Percentage of corresponding structural variation feature overlapped by the given input

1000Gp3_AC : (from dbNSFP) Alternative allele counts in the whole 1000 genomes phase 3 (1000Gp3) data.

1000Gp3_AF : (from dbNSFP) Alternative allele frequency in the whole 1000Gp3 data.

1000Gp3_AFR_AC : (from dbNSFP) Alternative allele counts in the 1000Gp3 African descendent samples.

1000Gp3_AFR_AF : (from dbNSFP) Alternative allele frequency in the 1000Gp3 African descendent samples.

1000Gp3_AMR_AC : (from dbNSFP) Alternative allele counts in the 1000Gp3 American descendent samples.

1000Gp3_AMR_AF : (from dbNSFP) Alternative allele frequency in the 1000Gp3 American descendent samples.

1000Gp3_EAS_AC : (from dbNSFP) Alternative allele counts in the 1000Gp3 East Asian descendent samples.

1000Gp3_EAS_AF : (from dbNSFP) Alternative allele frequency in the 1000Gp3 East Asian descendent samples.

1000Gp3_EUR_AC : (from dbNSFP) Alternative allele counts in the 1000Gp3 European descendent samples.

1000Gp3_EUR_AF : (from dbNSFP) Alternative allele frequency in the 1000Gp3 European descendent samples.

1000Gp3_SAS_AC : (from dbNSFP) Alternative allele counts in the 1000Gp3 South Asian descendent samples.

1000Gp3_SAS_AF : (from dbNSFP) Alternative allele frequency in the 1000Gp3 South Asian descendent samples.

ALSPAC_AC : (from dbNSFP) Alternative allele count in called genotypes in UK10K ALSPAC cohort.

ALSPAC_AF : (from dbNSFP) Alternative allele frequency in called genotypes in UK10K ALSPAC cohort.

AltaiNeandertal : (from dbNSFP) genotype of a deep sequenced Altai Neanderthal

Ancestral_allele : (from dbNSFP) ancestral allele based on 8 primates EPO. Ancestral alleles by Ensembl 84. The following comes from its original README file: ACTG - high-confidence call, ancestral state supported by the other two sequences actg - low-confidence call, ancestral state supported by one sequence only N - failure, the ancestral state is not supported by any other sequence - - the extant species contains an insertion at this position . - no coverage in the alignment

CADD_phred : (from dbNSFP) CADD phred-like score. This is phred-like rank score based on whole genome CADD raw scores. Please refer to Kircher et al. (2014) Nature Genetics 46(3):310-5 for details. The larger the score the more likely the SNP has damaging effect. Please note the following copyright statement for CADD: "CADD scores (http://cadd.gs.washington.edu/) are Copyright 2013 University of Washington and Hudson-Alpha Institute for Biotechnology (all rights reserved) but are freely available for all academic, non-commercial applications. For commercial licensing information contact Jennifer McCullar (mccullaj@uw.edu)."

CADD_raw : (from dbNSFP) CADD raw score for functional prediction of a SNP. Please refer to Kircher et al. (2014) Nature Genetics 46(3):310-5 for details. The larger the score the more likely the SNP has damaging effect. Scores range from -7.535037 to 35.788538 in dbNSFP. Please note the following copyright statement for CADD: "CADD scores (http://cadd.gs.washington.edu/) are Copyright 2013 University of Washington and Hudson-Alpha Institute for Biotechnology (all rights reserved) but are freely available for all academic, non-commercial applications. For commercial licensing information contact Jennifer McCullar (mccullaj@uw.edu)."

CADD_raw_rankscore : (from dbNSFP) CADD raw scores were ranked among all CADD raw scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of CADD raw scores in dbNSFP. Please note the following copyright statement for CADD: "CADD scores (http://cadd.gs.washington.edu/) are Copyright 2013 University of Washington and Hudson-Alpha Institute for Biotechnology (all rights reserved) but are freely available for all academic, non-commercial applications. For commercial licensing information contact Jennifer McCullar (mccullaj@uw.edu)."

DANN_rankscore : (from dbNSFP) DANN scores were ranked among all DANN scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of DANN scores in dbNSFP.

DANN_score : (from dbNSFP) DANN is a functional prediction score retrained based on the training data of CADD using deep neural network. Scores range from 0 to 1. A larger number indicate a higher probability to be damaging. More information of this score can be found in doi: 10.1093/bioinformatics/btu703. For commercial application of DANN, please contact Daniel Quang (dxquang@uci.edu)

Denisova : (from dbNSFP) genotype of a deep sequenced Denisova

ESP6500_AA_AC : (from dbNSFP) Alternative allele count in the African American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set).

ESP6500_AA_AF : (from dbNSFP) Alternative allele frequency in the African American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set).

ESP6500_EA_AC : (from dbNSFP) Alternative allele count in the European American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set).

ESP6500_EA_AF : (from dbNSFP) Alternative allele frequency in the European American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set).

Eigen-PC-phred : (from dbNSFP) Eigen PC score in phred scale.

Eigen-PC-raw : (from dbNSFP) Eigen PC score for genome-wide SNVs. A functional prediction score based on conservation, allele frequencies, deleteriousness prediction (for missense SNVs) and epigenomic signals (for synonymous and non-coding SNVs) using an unsupervised learning method (doi: 10.1038/ng.3477).

Eigen-PC-raw_rankscore : (from dbNSFP) Eigen-PC-raw scores were ranked among all Eigen-PC-raw scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of Eigen-PC-raw scores in dbNSFP.

Eigen-phred : (from dbNSFP) Eigen score in phred scale.

Eigen-raw : (from dbNSFP) Eigen score for coding SNVs. A functional prediction score based on conservation, allele frequencies, and deleteriousness prediction using an unsupervised learning method (doi: 10.1038/ng.3477).

Eigen_coding_or_noncoding : (from dbNSFP) Whether Eigen-raw and Eigen-phred scores are based on coding model or noncoding model.

ExAC_AFR_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in African & African American ExAC samples

ExAC_AFR_AF : Frequency of existing variant in ExAC African/American population

ExAC_AMR_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in American ExAC samples

ExAC_AMR_AF : Frequency of existing variant in ExAC American population

ExAC_Adj_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in total ExAC samples

ExAC_Adj_AF : Adjusted frequency of existing variant in ExAC combined population

ExAC_EAS_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in East Asian ExAC samples

ExAC_EAS_AF : Frequency of existing variant in ExAC East Asian population

ExAC_FIN_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in Finnish ExAC samples

ExAC_FIN_AF : Frequency of existing variant in ExAC Finnish population

ExAC_NFE_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in Non-Finnish European ExAC samples

ExAC_NFE_AF : Frequency of existing variant in ExAC Non-Finnish European population

ExAC_SAS_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in South Asian ExAC samples

ExAC_SAS_AF : Frequency of existing variant in ExAC South Asian population

ExAC_nonTCGA_AC : (from dbNSFP) Allele count in total ExAC_nonTCGA samples (53,105 samples)

ExAC_nonTCGA_AF : (from dbNSFP) Allele frequency in total ExAC_nonTCGA samples

ExAC_nonTCGA_AFR_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in African & African American ExAC_nonTCGA samples

ExAC_nonTCGA_AFR_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in African & African American ExAC_nonTCGA samples

ExAC_nonTCGA_AMR_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in American ExAC_nonTCGA samples

ExAC_nonTCGA_AMR_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in American ExAC_nonTCGA samples

ExAC_nonTCGA_Adj_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in total ExAC_nonTCGA samples

ExAC_nonTCGA_Adj_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in total ExAC_nonTCGA samples

ExAC_nonTCGA_EAS_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in East Asian ExAC_nonTCGA samples

ExAC_nonTCGA_EAS_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in East Asian ExAC_nonTCGA samples

ExAC_nonTCGA_FIN_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in Finnish ExAC_nonTCGA samples

ExAC_nonTCGA_FIN_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in Finnish ExAC_nonTCGA samples

ExAC_nonTCGA_NFE_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in Non-Finnish European ExAC_nonTCGA samples

ExAC_nonTCGA_NFE_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in Non-Finnish European ExAC_nonTCGA samples

ExAC_nonTCGA_SAS_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in South Asian ExAC_nonTCGA samples

ExAC_nonTCGA_SAS_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in South Asian ExAC_nonTCGA samples

ExAC_nonpsych_AC : (from dbNSFP) Allele count in total ExAC_nonpsych samples (45,376 samples)

ExAC_nonpsych_AF : (from dbNSFP) Allele frequency in total ExAC_nonpsych samples

ExAC_nonpsych_AFR_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in African & African American ExAC_nonpsych samples

ExAC_nonpsych_AFR_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in African & African American ExAC_nonpsych samples

ExAC_nonpsych_AMR_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in American ExAC_nonpsych samples

ExAC_nonpsych_AMR_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in American ExAC_nonpsych samples

ExAC_nonpsych_Adj_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in total ExAC_nonpsych samples

ExAC_nonpsych_Adj_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in total ExAC_nonpsych samples

ExAC_nonpsych_EAS_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in East Asian ExAC_nonpsych samples

ExAC_nonpsych_EAS_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in East Asian ExAC_nonpsych samples

ExAC_nonpsych_FIN_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in Finnish ExAC_nonpsych samples

ExAC_nonpsych_FIN_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in Finnish ExAC_nonpsych samples

ExAC_nonpsych_NFE_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in Non-Finnish European ExAC_nonpsych samples

ExAC_nonpsych_NFE_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in Non-Finnish European ExAC_nonpsych samples

ExAC_nonpsych_SAS_AC : (from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in South Asian ExAC_nonpsych samples

ExAC_nonpsych_SAS_AF : (from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in South Asian ExAC_nonpsych samples

FATHMM_converted_rankscore : (from dbNSFP) FATHMMori scores were first converted to FATHMMnew=1-(FATHMMori+16.13)/26.77, then ranked among all FATHMMnew scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of FATHMMnew scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0 to 1.

FATHMM_pred : (from dbNSFP) If a FATHMMori score is <=-1.5 (or rankscore >=0.81332) the corresponding nsSNV is predicted as "D(AMAGING)"; otherwise it is predicted as "T(OLERATED)". Multiple predictions separated by ";", corresponding to Ensembl_proteinid.

FATHMM_score : (from dbNSFP) FATHMM default score (weighted for human inherited-disease mutations with Disease Ontology) (FATHMMori). Scores range from -16.13 to 10.64. The smaller the score the more likely the SNP has damaging effect. Multiple scores separated by ";", corresponding to Ensembl_proteinid.

GERP++_NR : (from dbNSFP) GERP++ neutral rate

GERP++_RS : (from dbNSFP) GERP++ RS score, the larger the score, the more conserved the site. Scores range from -12.3 to 6.17.

GERP++_RS_rankscore : (from dbNSFP) GERP++ RS scores were ranked among all GERP++ RS scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of GERP++ RS scores in dbNSFP.

GM12878_confidence_value : (from dbNSFP) 0 - highly significant scores (approx. p<.003); 1 - significant scores (approx. p<.05); 2 - informative scores (approx. p<.25); 3 - other scores (approx. p>=.25).

GM12878_fitCons_score : (from dbNSFP) fitCons score predicts the fraction of genomic positions belonging to a specific function class (defined by epigenomic "fingerprint") that are under selective pressure. Scores range from 0 to 1, with a larger score indicating a higher proportion of nucleic sites of the functional class the genomic position belong to are under selective pressure, therefore more likely to be functional important. GM12878 fitCons scores are based on cell type GM12878. More details can be found in doi:10.1038/ng.3196.

GM12878_fitCons_score_rankscore : GM12878_fitCons_score_rankscore from dbNSFP file

GTEx_V6p_gene : (from dbNSFP) target gene of the (significant) eQTL SNP

GTEx_V6p_tissue : (from dbNSFP) tissue type of the expression data with which the eQTL/gene pair is detected

GenoCanyon_score : (from dbNSFP) A functional prediction score based on conservation and biochemical annotations using an unsupervised statistical learning. (doi:10.1038/srep10576)

GenoCanyon_score_rankscore : (from dbNSFP) GenoCanyon_score scores were ranked among all integrated fitCons scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of GenoCanyon_score scores in dbNSFP.

H1-hESC_confidence_value : (from dbNSFP) 0 - highly significant scores (approx. p<.003); 1 - significant scores (approx. p<.05); 2 - informative scores (approx. p<.25); 3 - other scores (approx. p>=.25).

H1-hESC_fitCons_score : (from dbNSFP) fitCons score predicts the fraction of genomic positions belonging to a specific function class (defined by epigenomic "fingerprint") that are under selective pressure. Scores range from 0 to 1, with a larger score indicating a higher proportion of nucleic sites of the functional class the genomic position belong to are under selective pressure, therefore more likely to be functional important. GM12878 fitCons scores are based on cell type H1-hESC. More details can be found in doi:10.1038/ng.3196.

H1-hESC_fitCons_score_rankscore : H1-hESC_fitCons_score_rankscore from dbNSFP file

clinvar_clnsig : (from dbNSFP) clinical significance as to the clinvar data set. 0 - unknown, 1 - untested, 2 - Benign, 3 - Likely benign, 4 - Likely pathogenic, 5 - Pathogenic, 6 - drug response, 7 - histocompatibility. A negative score means the the score is for the ref allele

clinvar_golden_stars : (from dbNSFP) ClinVar Review Status summary. 0 - no assertion criteria provided, 1 - criteria provided, single submitter, 2 - criteria provided, multiple submitters, no conflicts, 3 - reviewed by expert panel, 4 - practice guideline

HUVEC_confidence_value : (from dbNSFP) 0 - highly significant scores (approx. p<.003); 1 - significant scores (approx. p<.05); 2 - informative scores (approx. p<.25); 3 - other scores (approx. p>=.25).

HUVEC_fitCons_score : (from dbNSFP) fitCons score predicts the fraction of genomic positions belonging to a specific function class (defined by epigenomic "fingerprint") that are under selective pressure. Scores range from 0 to 1, with a larger score indicating a higher proportion of nucleic sites of the functional class the genomic position belong to are under selective pressure, therefore more likely to be functional important. GM12878 fitCons scores are based on cell type HUVEC. More details can be found in doi:10.1038/ng.3196.

HUVEC_fitCons_score_rankscore : HUVEC_fitCons_score_rankscore from dbNSFP file

LRT_Omega : (from dbNSFP) estimated nonsynonymous-to-synonymous-rate ratio (Omega, reported by LRT)

LRT_converted_rankscore : (from dbNSFP) LRTori scores were first converted as LRTnew=1-LRTori*0.5 if Omega<1, or LRTnew=LRTori*0.5 if Omega>=1. Then LRTnew scores were ranked among all LRTnew scores in dbNSFP. The rankscore is the ratio of the rank over the total number of the scores in dbNSFP. The scores range from 0.00162 to 0.84324.

LRT_pred : (from dbNSFP) LRT prediction, D(eleterious), N(eutral) or U(nknown), which is not solely determined by the score.

LRT_score : (from dbNSFP) The original LRT two-sided p-value (LRTori), ranges from 0 to 1.

M-CAP_pred : (from dbNSFP) Prediction of M-CAP score based on the authors' recommendation, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0.025.

M-CAP_rankscore : (from dbNSFP) M-CAP scores were ranked among all M-CAP scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of M-CAP scores in dbNSFP.

M-CAP_score : (from dbNSFP) M-CAP score (details in DOI: 10.1038/ng.3703). Scores range from 0 to 1. The larger the score the more likely the SNP has damaging effect.

MetaLR_pred : (from dbNSFP) Prediction of our MetaLR based ensemble prediction score,"T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0.5. The rankscore cutoff between "D" and "T" is 0.81113.

MetaLR_rankscore : (from dbNSFP) MetaLR scores were ranked among all MetaLR scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MetaLR scores in dbNSFP. The scores range from 0 to 1.

MetaLR_score : (from dbNSFP) Our logistic regression (LR) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from 0 to 1.

MetaSVM_pred : (from dbNSFP) Prediction of our SVM based ensemble prediction score,"T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0. The rankscore cutoff between "D" and "T" is 0.82268.

MetaSVM_rankscore : (from dbNSFP) MetaSVM scores were ranked among all MetaSVM scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MetaSVM scores in dbNSFP. The scores range from 0 to 1.

MetaSVM_score : (from dbNSFP) Our support vector machine (SVM) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from -2 to 3 in dbNSFP.

MutPred_AAchange : (from dbNSFP) Amino acid change used for MutPred_score calculation.

MutPred_Top5features : (from dbNSFP) Top 5 features (molecular mechanisms of disease) as predicted by MutPred with p values. MutPred_score > 0.5 and p < 0.05 are referred to as actionable hypotheses. MutPred_score > 0.75 and p < 0.05 are referred to as confident hypotheses. MutPred_score > 0.75 and p < 0.01 are referred to as very confident hypotheses.

MutPred_protID : (from dbNSFP) UniProt accession or Ensembl transcript ID used for MutPred_score calculation.

MutPred_rankscore : (from dbNSFP) MutPred scores were ranked among all MutPred scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MutPred scores in dbNSFP.

MutPred_score : (from dbNSFP) General MutPred score. Scores range from 0 to 1. The larger the score the more likely the SNP has damaging effect.

MutationAssessor_UniprotID : (from dbNSFP) Uniprot ID number provided by MutationAssessor.

MutationAssessor_pred : (from dbNSFP) MutationAssessor's functional impact of a variant : predicted functional, i.e. high ("H") or medium ("M"), or predicted non-functional, i.e. low ("L") or neutral ("N"). The MAori score cutoffs between "H" and "M", "M" and "L", and "L" and "N", are 3.5, 1.935 and 0.8, respectively. The rankscore cutoffs between "H" and "M", "M" and "L", and "L" and "N", are 0.92922, 0.51944 and 0.19719, respectively.

MutationAssessor_score : (from dbNSFP) MutationAssessor functional impact combined score (MAori). The score ranges from -5.135 to 6.49 in dbNSFP.

MutationAssessor_score_rankscore : MutationAssessor_score_rankscore from dbNSFP file

MutationAssessor_variant : (from dbNSFP) AA variant as to MutationAssessor_UniprotID.

MutationTaster_AAE : (from dbNSFP) MutationTaster predicted amino acid change.

MutationTaster_converted_rankscore : (from dbNSFP) The MTori scores were first converted: if the prediction is "A" or "D" MTnew=MTori; if the prediction is "N" or "P", MTnew=1-MTori. Then MTnew scores were ranked among all MTnew scores in dbNSFP. If there are multiple scores of a SNV, only the largest MTnew was used in ranking. The rankscore is the ratio of the rank of the score over the total number of MTnew scores in dbNSFP. The scores range from 0.08979 to 0.81033.

MutationTaster_model : (from dbNSFP) MutationTaster prediction models.

MutationTaster_pred : (from dbNSFP) MutationTaster prediction, "A" ("disease_causing_automatic"), "D" ("disease_causing"), "N" ("polymorphism") or "P" ("polymorphism_automatic"). The score cutoff between "D" and "N" is 0.5 for MTnew and 0.31713 for the rankscore.

MutationTaster_score : (from dbNSFP) MutationTaster p-value (MTori), ranges from 0 to 1. Multiple scores are separated by ";". Information on corresponding transcript(s) can be found by querying http://www.mutationtaster.org/ChrPos.html

PROVEAN_converted_rankscore : (from dbNSFP) PROVEANori were first converted to PROVEANnew=1-(PROVEANori+14)/28, then ranked among all PROVEANnew scores in dbNSFP. The rankscore is the ratio of the rank the PROVEANnew score over the total number of PROVEANnew scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0 to 1.

PROVEAN_pred : (from dbNSFP) If PROVEANori <= -2.5 (rankscore>=0.543) the corresponding nsSNV is predicted as "D(amaging)"; otherwise it is predicted as "N(eutral)". Multiple predictions separated by ";", corresponding to Ensembl_proteinid.

PROVEAN_score : (from dbNSFP) PROVEAN score (PROVEANori). Scores range from -14 to 14. The smaller the score the more likely the SNP has damaging effect. Multiple scores separated by ";", corresponding to Ensembl_proteinid.

Polyphen2_HDIV_pred : (from dbNSFP) Polyphen2 prediction based on HumDiv, "D" ("probably damaging", HDIV score in [0.957,1] or rankscore in [0.52844,0.89865]), "P" ("possibly damaging", HDIV score in [0.453,0.956] or rankscore in [0.34282,0.52689]) and "B" ("benign", HDIV score in [0,0.452] or rankscore in [0.02634,0.34268]). Score cutoff for binary classification is 0.5 for HDIV score or 0.3528 for rankscore, i.e. the prediction is "neutral" if the HDIV score is smaller than 0.5 (rankscore is smaller than 0.3528), and "deleterious" if the HDIV score is larger than 0.5 (rankscore is larger than 0.3528). Multiple entries are separated by ";".

Polyphen2_HDIV_rankscore : (from dbNSFP) Polyphen2 HDIV scores were first ranked among all HDIV scores in dbNSFP. The rankscore is the ratio of the rank the score over the total number of the scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.02634 to 0.89865.

Polyphen2_HDIV_score : (from dbNSFP) Polyphen2 score based on HumDiv, i.e. hdiv_prob. The score ranges from 0 to 1. Multiple entries separated by ";", corresponding to Uniprot_acc_Polyphen2.

Polyphen2_HVAR_pred : (from dbNSFP) Polyphen2 prediction based on HumVar, "D" ("probably damaging", HVAR score in [0.909,1] or rankscore in [0.62797,0.97092]), "P" ("possibly damaging", HVAR in [0.447,0.908] or rankscore in [0.44195,0.62727]) and "B" ("benign", HVAR score in [0,0.446] or rankscore in [0.01257,0.44151]). Score cutoff for binary classification is 0.5 for HVAR score or 0.45833 for rankscore, i.e. the prediction is "neutral" if the HVAR score is smaller than 0.5 (rankscore is smaller than 0.45833), and "deleterious" if the HVAR score is larger than 0.5 (rankscore is larger than 0.45833). Multiple entries are separated by ";".

Polyphen2_HVAR_rankscore : (from dbNSFP) Polyphen2 HVAR scores were first ranked among all HVAR scores in dbNSFP. The rankscore is the ratio of the rank the score over the total number of the scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.01257 to 0.97092.

Polyphen2_HVAR_score : (from dbNSFP) Polyphen2 score based on HumVar, i.e. hvar_prob. The score ranges from 0 to 1. Multiple entries separated by ";", corresponding to Uniprot_acc_Polyphen2.

REVEL_rankscore : (from dbNSFP) REVEL scores were ranked among all REVEL scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of REVEL scores in dbNSFP.

REVEL_score : (from dbNSFP) REVEL is an ensemble score based on 13 individual scores for predicting the pathogenicity of missense variants. Scores range from 0 to 1. The larger the score the more likely the SNP has damaging effect. "REVEL scores are freely available for non-commercial use. For other uses, please contact Weiva Sieh" (weiva.sieh@mssm.edu)

Reliability_index : (from dbNSFP) Number of observed component scores (except the maximum frequency in the 1000 genomes populations) for MetaSVM and MetaLR. Ranges from 1 to 10. As MetaSVM and MetaLR scores are calculated based on imputed data, the less missing component scores, the higher the reliability of the scores and predictions.

SIFT_converted_rankscore : (from dbNSFP) SIFTori scores were first converted to SIFTnew=1-SIFTori, then ranked among all SIFTnew scores in dbNSFP. The rankscore is the ratio of the rank the SIFTnew score over the total number of SIFTnew scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The rankscores range from 0.00963 to 0.91219.

SIFT_pred : (from dbNSFP) If SIFTori is smaller than 0.05 (rankscore>0.395) the corresponding nsSNV is predicted as "D(amaging)"; otherwise it is predicted as "T(olerated)". Multiple predictions separated by ";"

SIFT_score : (from dbNSFP) SIFT score (SIFTori). Scores range from 0 to 1. The smaller the score the more likely the SNP has damaging effect. Multiple scores separated by ";", corresponding to Ensembl_proteinid.

SiPhy_29way_logOdds : (from dbNSFP) SiPhy score based on 29 mammals genomes. The larger the score, the more conserved the site. Scores range from 0 to 37.9718 in dbNSFP.

SiPhy_29way_logOdds_rankscore : (from dbNSFP) SiPhy_29way_logOdds scores were ranked among all SiPhy_29way_logOdds scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of SiPhy_29way_logOdds scores in dbNSFP.

SiPhy_29way_pi : (from dbNSFP) The estimated stationary distribution of A, C, G and T at the site, using SiPhy algorithm based on 29 mammals genomes.

TWINSUK_AC : (from dbNSFP) Alternative allele count in called genotypes in UK10K TWINSUK cohort.

TWINSUK_AF : (from dbNSFP) Alternative allele frequency in called genotypes in UK10K TWINSUK cohort.

Transcript_id_VEST3 : (from dbNSFP) Transcript id provided by VEST3.

Transcript_var_VEST3 : (from dbNSFP) amino acid change as to Transcript_id_VEST3.

Uniprot_aapos_Polyphen2 : (from dbNSFP) amino acid position as to Uniprot_acc_Polyphen2. Multiple entries separated by ";".

Uniprot_acc_Polyphen2 : (from dbNSFP) Uniprot accession number provided by Polyphen2. Multiple entries separated by ";".

Uniprot_id_Polyphen2 : (from dbNSFP) Uniprot ID numbers corresponding to Uniprot_acc_Polyphen2. Multiple entries separated by ";".

VEST3_rankscore : (from dbNSFP) VEST3 scores were ranked among all VEST3 scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of VEST3 scores in dbNSFP. In case there are multiple scores for the same variant, the largest score (most damaging) is presented. The scores range from 0 to 1. Please note VEST score is free for non-commercial use. For more details please refer to http://wiki.chasmsoftware.org/index.php/SoftwareLicense. Commercial users should contact the Johns Hopkins Technology Transfer office.

VEST3_score : (from dbNSFP) VEST 3.0 score. Score ranges from 0 to 1. The larger the score the more likely the mutation may cause functional change. Multiple scores separated by ";", corresponding to Transcript_id_VEST3. Please note this score is free for non-commercial use. For more details please refer to http://wiki.chasmsoftware.org/index.php/SoftwareLicense. Commercial users should contact the Johns Hopkins Technology Transfer office.

aaalt : (from dbNSFP) alternative amino acid "." if the variant is a splicing site SNP (2bp on each end of an intron)

aapos : (from dbNSFP) amino acid position as to the protein. "-1" if the variant is a splicing site SNP (2bp on each end of an intron). Multiple entries separated by ";", corresponding to Ensembl_proteinid

aaref : (from dbNSFP) reference amino acid "." if the variant is a splicing site SNP (2bp on each end of an intron)

alt : (from dbNSFP) alternative nucleotide allele (as on the + strand)

cds_strand : (from dbNSFP) coding sequence (CDS) strand (+ or -)

chr : (from dbNSFP) chromosome number

codon_degeneracy : (from dbNSFP) degenerate type (0, 2 or 3)

codonpos : (from dbNSFP) position on the codon (1, 2 or 3)

fathmm-MKL_coding_group : (from dbNSFP) the groups of features (labeled A-J) used to obtained the score. More details can be found in doi: 10.1093/bioinformatics/btv009.

fathmm-MKL_coding_pred : (from dbNSFP) If a fathmm-MKL_coding_score is >0.5 (or rankscore >0.28317) the corresponding nsSNV is predicted as "D(AMAGING)"; otherwise it is predicted as "N(EUTRAL)".

fathmm-MKL_coding_rankscore : (from dbNSFP) fathmm-MKL coding scores were ranked among all fathmm-MKL coding scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of fathmm-MKL coding scores in dbNSFP.

fathmm-MKL_coding_score : (from dbNSFP) fathmm-MKL p-values. Scores range from 0 to 1. SNVs with scores >0.5 are predicted to be deleterious, and those <0.5 are predicted to be neutral or benign. Scores close to 0 or 1 are with the highest-confidence. Coding scores are trained using 10 groups of features. More details of the score can be found in doi: 10.1093/bioinformatics/btv009.

genename : (from dbNSFP) gene name; if the nsSNV can be assigned to multiple genes, gene names are separated by ";"

gnomAD_exomes_AFR_AC : (from dbNSFP) Alternative allele count in the African/African American gnomAD exome samples (7,652 samples)

gnomAD_exomes_AFR_AF : (from dbNSFP) Alternative allele frequency in the African/African American gnomAD exome samples (7,652 samples)

gnomAD_exomes_AFR_AN : (from dbNSFP) Total allele count in the African/African American gnomAD exome samples (7,652 samples)

gnomAD_exomes_AMR_AC : (from dbNSFP) Alternative allele count in the Latino gnomAD exome samples (16,791 samples)

gnomAD_exomes_AMR_AF : (from dbNSFP) Alternative allele frequency in the Latino gnomAD exome samples (16,791 samples)

gnomAD_exomes_AMR_AN : (from dbNSFP) Total allele count in the Latino gnomAD exome samples (16,791 samples)

gnomAD_exomes_AN : (from dbNSFP) Total allele count in the whole gnomAD exome samples (123,136 samples)

gnomAD_exomes_ASJ_AC : (from dbNSFP) Alternative allele count in the Ashkenazi Jewish gnomAD exome samples (4,925 samples)

gnomAD_exomes_ASJ_AF : (from dbNSFP) Alternative allele frequency in the Ashkenazi Jewish gnomAD exome samples (4,925 samples)

gnomAD_exomes_ASJ_AN : (from dbNSFP) Total allele count in the Ashkenazi Jewish gnomAD exome samples (4,925 samples)

gnomAD_exomes_EAS_AC : (from dbNSFP) Alternative allele count in the East Asian gnomAD exome samples (8,624 samples)

gnomAD_exomes_EAS_AF : (from dbNSFP) Alternative allele frequency in the East Asian gnomAD exome samples (8,624 samples)

gnomAD_exomes_EAS_AN : (from dbNSFP) Total allele count in the East Asian gnomAD exome samples (8,624 samples)

gnomAD_exomes_FIN_AC : (from dbNSFP) Alternative allele count in the Finnish gnomAD exome samples (11,150 samples)

gnomAD_exomes_FIN_AF : (from dbNSFP) Alternative allele frequency in the Finnish gnomAD exome samples (11,150 samples)

gnomAD_exomes_FIN_AN : (from dbNSFP) Total allele count in the Finnish gnomAD exome samples (11,150 samples)

gnomAD_exomes_NFE_AC : (from dbNSFP) Alternative allele count in the Non-Finnish European gnomAD exome samples (55,860 samples)

gnomAD_exomes_NFE_AF : (from dbNSFP) Alternative allele frequency in the Non-Finnish European gnomAD exome samples (55,860 samples)

gnomAD_exomes_NFE_AN : (from dbNSFP) Total allele count in the Non-Finnish European gnomAD exome samples (55,860 samples)

gnomAD_exomes_OTH_AC : (from dbNSFP) Alternative allele count in other gnomAD exome samples (2,743 samples)

gnomAD_exomes_OTH_AF : (from dbNSFP) Alternative allele frequency in other gnomAD exome samples (2,743 samples)

gnomAD_exomes_OTH_AN : (from dbNSFP) Total allele count in other gnomAD exome samples (2,743 samples)

gnomAD_exomes_SAS_AC : (from dbNSFP) Alternative allele count in the South Asian gnomAD exome samples (15,391 samples)

gnomAD_exomes_SAS_AF : (from dbNSFP) Alternative allele frequency in the South Asian gnomAD exome samples (15,391 samples)

gnomAD_exomes_SAS_AN : (from dbNSFP) Total allele count in the South Asian gnomAD exome samples (15,391 samples)

gnomAD_genomes_AFR_AC : (from dbNSFP) Alternative allele count in the African/African American gnomAD genome samples (4,368 samples)

gnomAD_genomes_AFR_AF : (from dbNSFP) Alternative allele frequency in the African/African American gnomAD genome samples (4,368 samples)

gnomAD_genomes_AFR_AN : (from dbNSFP) Total allele count in the African/African American gnomAD genome samples (4,368 samples)

gnomAD_genomes_AMR_AC : (from dbNSFP) Alternative allele count in the Latino gnomAD genome samples (419 samples)

gnomAD_genomes_AMR_AF : (from dbNSFP) Alternative allele frequency in the Latino gnomAD genome samples (419 samples)

gnomAD_genomes_AMR_AN : (from dbNSFP) Total allele count in the Latino gnomAD genome samples (419 samples)

gnomAD_genomes_AN : (from dbNSFP) Total allele count in the whole gnomAD genome samples (15,496 samples)

gnomAD_genomes_ASJ_AC : (from dbNSFP) Alternative allele count in the Ashkenazi Jewish gnomAD genome samples (151 samples)

gnomAD_genomes_ASJ_AF : (from dbNSFP) Alternative allele frequency in the Ashkenazi Jewish gnomAD genome samples (151 samples)

gnomAD_genomes_ASJ_AN : (from dbNSFP) Total allele count in the Ashkenazi Jewish gnomAD genome samples (151 samples)

gnomAD_genomes_EAS_AC : (from dbNSFP) Alternative allele count in the East Asian gnomAD genome samples (811 samples)

gnomAD_genomes_EAS_AF : (from dbNSFP) Alternative allele frequency in the East Asian gnomAD genome samples (811 samples)

gnomAD_genomes_EAS_AN : (from dbNSFP) Total allele count in the East Asian gnomAD genome samples (811 samples)

gnomAD_genomes_FIN_AC : (from dbNSFP) Alternative allele count in the Finnish gnomAD genome samples (1,747 samples)

gnomAD_genomes_FIN_AF : (from dbNSFP) Alternative allele frequency in the Finnish gnomAD genome samples (1,747 samples)

gnomAD_genomes_FIN_AN : (from dbNSFP) Total allele count in the Finnish gnomAD genome samples (1,747 samples)

gnomAD_genomes_NFE_AC : (from dbNSFP) Alternative allele count in the Non-Finnish European gnomAD genome samples (7,509 samples)

gnomAD_genomes_NFE_AF : (from dbNSFP) Alternative allele frequency in the Non-Finnish European gnomAD genome samples (7,509 samples)

gnomAD_genomes_NFE_AN : (from dbNSFP) Total allele count in the Non-Finnish European gnomAD genome samples (7,509 samples)

gnomAD_genomes_OTH_AC : (from dbNSFP) Alternative allele count in other gnomAD genome samples (491 samples)

gnomAD_genomes_OTH_AF : (from dbNSFP) Alternative allele frequency in other gnomAD genome samples (491 samples)

gnomAD_genomes_OTH_AN : (from dbNSFP) Total allele count in other gnomAD genome samples (491 samples)

integrated_confidence_value : (from dbNSFP) 0 - highly significant scores (approx. p<.003); 1 - significant scores (approx. p<.05); 2 - informative scores (approx. p<.25); 3 - other scores (approx. p>=.25).

integrated_fitCons_score : (from dbNSFP) fitCons score predicts the fraction of genomic positions belonging to a specific function class (defined by epigenomic "fingerprint") that are under selective pressure. Scores range from 0 to 1, with a larger score indicating a higher proportion of nucleic sites of the functional class the genomic position belong to are under selective pressure, therefore more likely to be functional important. Integrated (i6) scores are integrated across three cell types (GM12878, H1-hESC and HUVEC). More details can be found in doi:10.1038/ng.3196.

integrated_fitCons_score_rankscore : integrated_fitCons_score_rankscore from dbNSFP file

phastCons100way_vertebrate : (from dbNSFP) phastCons conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site. Scores range from 0 to 1.

phastCons100way_vertebrate_rankscore : (from dbNSFP) phastCons100way_vertebrate scores were ranked among all phastCons100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons100way_vertebrate scores in dbNSFP.

phastCons20way_mammalian : (from dbNSFP) phastCons conservation score based on the multiple alignments of 20 mammalian genomes (including human). The larger the score, the more conserved the site. Scores range from 0 to 1.

phastCons20way_mammalian_rankscore : (from dbNSFP) phastCons20way_mammalian scores were ranked among all phastCons20way_mammalian scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons20way_mammalian scores in dbNSFP.

phyloP100way_vertebrate : (from dbNSFP) phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site. Scores range from -20.0 to 10.003 in dbNSFP.

phyloP100way_vertebrate_rankscore : (from dbNSFP) phyloP100way_vertebrate scores were ranked among all phyloP100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP100way_vertebrate scores in dbNSFP.

phyloP20way_mammalian : (from dbNSFP) phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 20 mammalian genomes (including human). The larger the score, the more conserved the site. Scores range from -13.282 to 1.199 in dbNSFP.

phyloP20way_mammalian_rankscore : (from dbNSFP) phyloP20way_mammalian scores were ranked among all phyloP20way_mammalian scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP20way_mammalian scores in dbNSFP.

pos(1-based) : (from dbNSFP) physical position on the chromosome as to hg38 (1-based coordinate). For mitochondrial SNV, this position refers to the rCRS (GenBank: NC_012920).

ref : (from dbNSFP) reference nucleotide allele (as on the + strand)

refcodon : (from dbNSFP) reference codon

Condel : Consensus deleteriousness score for an amino acid substitution based on SIFT and PolyPhen-2

Uploaded_variation cDNA_position REF_ALLELE Allele Protein_position Amino_acids Codons SYMBOL EXON INTRON ZYG Consequence CLIN_SIG Existing_variation rs_dbSNP150 G2P_complete G2P_flag G2P_gene_req MMCNT1 MMCNT2 MMCNT3 MMID3 clinvar_rs clinvar_trait DOMAINS Interpro_domain hg18_chr hg18_pos(1-based) hg19_chr hg19_pos(1-based) ExAC_AC ExAC_AF gnomAD_exomes_AC gnomAD_exomes_AF gnomAD_genomes_AC gnomAD_genomes_AF Ensembl_geneid Ensembl_proteinid Ensembl_transcriptid HGVSc HGVSp Location Gene Feature Feature_type CDS_position IND ALLELE_NUM IMPACT DISTANCE STRAND FLAGS VARIANT_CLASS SYMBOL_SOURCE HGNC_ID BIOTYPE CANONICAL MANE CCDS ENSP SWISSPROT TREMBL UNIPARC REFSEQ_MATCH SOURCE SIFT PolyPhen HGVS_OFFSET AF AFR_AF AMR_AF EAS_AF EUR_AF SAS_AF AA_AF EA_AF gnomAD_AF gnomAD_AFR_AF gnomAD_AMR_AF gnomAD_ASJ_AF gnomAD_EAS_AF gnomAD_FIN_AF gnomAD_NFE_AF gnomAD_OTH_AF gnomAD_SAS_AF SOMATIC PHENO OverlapBP OverlapPC 1000Gp3_AC 1000Gp3_AF 1000Gp3_AFR_AC 1000Gp3_AFR_AF 1000Gp3_AMR_AC 1000Gp3_AMR_AF 1000Gp3_EAS_AC 1000Gp3_EAS_AF 1000Gp3_EUR_AC 1000Gp3_EUR_AF 1000Gp3_SAS_AC 1000Gp3_SAS_AF ALSPAC_AC ALSPAC_AF AltaiNeandertal Ancestral_allele CADD_phred CADD_raw CADD_raw_rankscore DANN_rankscore DANN_score Denisova ESP6500_AA_AC ESP6500_AA_AF ESP6500_EA_AC ESP6500_EA_AF Eigen-PC-phred Eigen-PC-raw Eigen-PC-raw_rankscore Eigen-phred Eigen-raw Eigen_coding_or_noncoding ExAC_AFR_AC ExAC_AFR_AF ExAC_AMR_AC ExAC_AMR_AF ExAC_Adj_AC ExAC_Adj_AF ExAC_EAS_AC ExAC_EAS_AF ExAC_FIN_AC ExAC_FIN_AF ExAC_NFE_AC ExAC_NFE_AF ExAC_SAS_AC ExAC_SAS_AF ExAC_nonTCGA_AC ExAC_nonTCGA_AF ExAC_nonTCGA_AFR_AC ExAC_nonTCGA_AFR_AF ExAC_nonTCGA_AMR_AC ExAC_nonTCGA_AMR_AF ExAC_nonTCGA_Adj_AC ExAC_nonTCGA_Adj_AF ExAC_nonTCGA_EAS_AC ExAC_nonTCGA_EAS_AF ExAC_nonTCGA_FIN_AC ExAC_nonTCGA_FIN_AF ExAC_nonTCGA_NFE_AC ExAC_nonTCGA_NFE_AF ExAC_nonTCGA_SAS_AC ExAC_nonTCGA_SAS_AF ExAC_nonpsych_AC ExAC_nonpsych_AF ExAC_nonpsych_AFR_AC ExAC_nonpsych_AFR_AF ExAC_nonpsych_AMR_AC ExAC_nonpsych_AMR_AF ExAC_nonpsych_Adj_AC ExAC_nonpsych_Adj_AF ExAC_nonpsych_EAS_AC ExAC_nonpsych_EAS_AF ExAC_nonpsych_FIN_AC ExAC_nonpsych_FIN_AF ExAC_nonpsych_NFE_AC ExAC_nonpsych_NFE_AF ExAC_nonpsych_SAS_AC ExAC_nonpsych_SAS_AF FATHMM_converted_rankscore FATHMM_pred FATHMM_score GERP++_NR GERP++_RS GERP++_RS_rankscore GM12878_confidence_value GM12878_fitCons_score GM12878_fitCons_score_rankscore GTEx_V6p_gene GTEx_V6p_tissue GenoCanyon_score GenoCanyon_score_rankscore H1-hESC_confidence_value H1-hESC_fitCons_score H1-hESC_fitCons_score_rankscore clinvar_clnsig clinvar_golden_stars HUVEC_confidence_value HUVEC_fitCons_score HUVEC_fitCons_score_rankscore LRT_Omega LRT_converted_rankscore LRT_pred LRT_score M-CAP_pred M-CAP_rankscore M-CAP_score MetaLR_pred MetaLR_rankscore MetaLR_score MetaSVM_pred MetaSVM_rankscore MetaSVM_score MutPred_AAchange MutPred_Top5features MutPred_protID MutPred_rankscore MutPred_score MutationAssessor_UniprotID MutationAssessor_pred MutationAssessor_score MutationAssessor_score_rankscore MutationAssessor_variant MutationTaster_AAE MutationTaster_converted_rankscore MutationTaster_model MutationTaster_pred MutationTaster_score PROVEAN_converted_rankscore PROVEAN_pred PROVEAN_score Polyphen2_HDIV_pred Polyphen2_HDIV_rankscore Polyphen2_HDIV_score Polyphen2_HVAR_pred Polyphen2_HVAR_rankscore Polyphen2_HVAR_score REVEL_rankscore REVEL_score Reliability_index SIFT_converted_rankscore SIFT_pred SIFT_score SiPhy_29way_logOdds SiPhy_29way_logOdds_rankscore SiPhy_29way_pi TWINSUK_AC TWINSUK_AF Transcript_id_VEST3 Transcript_var_VEST3 Uniprot_aapos_Polyphen2 Uniprot_acc_Polyphen2 Uniprot_id_Polyphen2 VEST3_rankscore VEST3_score aaalt aapos aaref alt cds_strand chr codon_degeneracy codonpos fathmm-MKL_coding_group fathmm-MKL_coding_pred fathmm-MKL_coding_rankscore fathmm-MKL_coding_score genename gnomAD_exomes_AFR_AC gnomAD_exomes_AFR_AF gnomAD_exomes_AFR_AN gnomAD_exomes_AMR_AC gnomAD_exomes_AMR_AF gnomAD_exomes_AMR_AN gnomAD_exomes_AN gnomAD_exomes_ASJ_AC gnomAD_exomes_ASJ_AF gnomAD_exomes_ASJ_AN gnomAD_exomes_EAS_AC gnomAD_exomes_EAS_AF gnomAD_exomes_EAS_AN gnomAD_exomes_FIN_AC gnomAD_exomes_FIN_AF gnomAD_exomes_FIN_AN gnomAD_exomes_NFE_AC gnomAD_exomes_NFE_AF gnomAD_exomes_NFE_AN gnomAD_exomes_OTH_AC gnomAD_exomes_OTH_AF gnomAD_exomes_OTH_AN gnomAD_exomes_SAS_AC gnomAD_exomes_SAS_AF gnomAD_exomes_SAS_AN gnomAD_genomes_AFR_AC gnomAD_genomes_AFR_AF gnomAD_genomes_AFR_AN gnomAD_genomes_AMR_AC gnomAD_genomes_AMR_AF gnomAD_genomes_AMR_AN gnomAD_genomes_AN gnomAD_genomes_ASJ_AC gnomAD_genomes_ASJ_AF gnomAD_genomes_ASJ_AN gnomAD_genomes_EAS_AC gnomAD_genomes_EAS_AF gnomAD_genomes_EAS_AN gnomAD_genomes_FIN_AC gnomAD_genomes_FIN_AF gnomAD_genomes_FIN_AN gnomAD_genomes_NFE_AC gnomAD_genomes_NFE_AF gnomAD_genomes_NFE_AN gnomAD_genomes_OTH_AC gnomAD_genomes_OTH_AF gnomAD_genomes_OTH_AN integrated_confidence_value integrated_fitCons_score integrated_fitCons_score_rankscore phastCons100way_vertebrate phastCons100way_vertebrate_rankscore phastCons20way_mammalian phastCons20way_mammalian_rankscore phyloP100way_vertebrate phyloP100way_vertebrate_rankscore phyloP20way_mammalian phyloP20way_mammalian_rankscore pos(1-based) ref refcodon Condel

`

As you can see, the Mastermind entries in the info section of the output show only "?". Previously, these lines contained a description of what each column meant.

`## MMCNT1 : ?

MMCNT2 : ?

MMCNT3 : ?

MMID3 : ?`
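A quick way to spot every affected field is to list the header entries whose description is just "?". The following is only a sketch: it assumes header lines have the form `## NAME : description`, as in the excerpt above, and the function name and the file name in the usage comment are made up for illustration.

```shell
# List header fields whose description is just "?".
# Assumption: header lines look like "## NAME : description",
# with the leading "## " possibly stripped by a viewer.
undescribed_fields() {
    # $1: path to a VEP output file
    grep -E '^(## )?[^ ]+ : \?$' "$1" | sed -E 's/^(## )?([^ ]+) : \?$/\2/'
}

# Usage (placeholder file name):
# undescribed_fields vep_output.txt
```

Any name it prints is a column requested in the output that nothing in the run supplied a description for.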

at7 commented 4 years ago

I think the problem is that your Mastermind.pm is not up-to-date. How did you install the plugins? Can you please replace your Mastermind.pm file with the latest version from github: https://github.com/Ensembl/VEP_plugins/blob/release/97/Mastermind.pm Thank you.

shigej38 commented 4 years ago

I've updated Ensembl VEP and the Mastermind.pm plugin. The commands I used to update Ensembl VEP are: `git pull`, `git checkout release/97`, `perl INSTALL.pl`. The commands I used to update Mastermind are: `git clone https://github.com/Ensembl/VEP_plugins/blob/release/97/Mastermind.pm`, `cd VEP_plugins`, `cp Mastermind.pm /home/$USER/.vep/Plugins`. I also updated the Mastermind database file by re-registering.

my problem still continues.

at7 commented 4 years ago

`git clone https://github.com/Ensembl/VEP_plugins/blob/release/97/Mastermind.pm` doesn't work for me. Can you please try `git clone https://github.com/Ensembl/VEP_plugins.git` followed by `cp VEP_plugins/Mastermind.pm /home/$USER/.vep/Plugins/`? Thank you

shigej38 commented 4 years ago

As you pointed out, the "git clone" command I wrote above points at a single file, so the commands after it would not work. That was a mistake in what I pasted here, not in what I ran. The command I actually used is: `git clone https://github.com/ensembl/vep_plugins.git`. The update you suggested did not solve my problem.

at7 commented 4 years ago

Could you please check that the Mastermind.pm file which you have stored under /home/$USER/.vep/Plugins contains the following 2 lines: https://github.com/Ensembl/VEP_plugins/blob/release/97/Mastermind.pm#L134-L135? Thank you
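The same check can be done from the command line. This is a minimal sketch, assuming that the two header keys `Mastermind_counts` and `Mastermind_MMID3` are a good enough marker of the updated plugin; the function name is hypothetical.

```shell
# Return success only if the plugin file defines both new header keys.
# Assumption: their presence marks the updated Mastermind.pm.
has_new_mastermind_header() {
    # $1: path to Mastermind.pm
    grep -q "'Mastermind_counts'" "$1" && grep -q "'Mastermind_MMID3'" "$1"
}

plugin="${HOME}/.vep/Plugins/Mastermind.pm"   # plugin location used in this thread
if [ -f "$plugin" ]; then
    if has_new_mastermind_header "$plugin"; then
        echo "new header keys found"
    else
        echo "header keys missing: plugin is probably out of date"
    fi
fi
```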

shigej38 commented 4 years ago

Lines 131 to 138 of my Mastermind.pm file are:

`sub get_header_info {
  return {
    'Mastermind_counts' => 'Mastermind number of citations in the medical literature. Output includes three unique counts: MMCNT1|MMCNT2|MMCNT3. MMCNT1 - Count of Mastermind articles with cDNA matches for this specific variant; MMCNT2 - Count of Mastermind articles with variants either explicitly matching at the cDNA level or given only at protein level; MMCNT3 - Count of Mastermind articles including other DNA-level variants resulting in the same amino acid change.',
    'Mastermind_MMID3' => 'Mastermind MMID3 variant identifier(s), as gene:key, for MMCNT3.',
  };
}`

at7 commented 4 years ago

That looks good! What are you using in your --fields list? Maybe you need to update to Mastermind_counts and Mastermind_MMID3?

shigej38 commented 4 years ago

my fields command is: "Uploaded_variation,cDNA_position,REF_ALLELE,Allele,Protein_position,Amino_acids,Codons,SYMBOL,EXON,INTRON,ZYG,Consequence,CLIN_SIG,Existing_variation,rs_dbSNP150,G2P_complete,G2P_flag,G2P_gene_req,MMCNT1,MMCNT2,MMCNT3,MMID3,clinvar_rs,clinvar_trait,DOMAINS,Interpro_domain,hg18_chr,hg18_pos(1-based),hg19_chr,hg19_pos(1-based),ExAC_AC,ExAC_AF,gnomAD_exomes_AC,gnomAD_exomes_AF,gnomAD_genomes_AC,gnomAD_genomes_AF,Ensembl_geneid,Ensembl_proteinid,Ensembl_transcriptid,HGVSc,HGVSp,Location,Gene,Feature,Feature_type,CDS_position,IND,ALLELE_NUM,IMPACT,DISTANCE,STRAND,FLAGS,VARIANT_CLASS,SYMBOL_SOURCE,HGNC_ID,BIOTYPE,CANONICAL,MANE,CCDS,ENSP,SWISSPROT,TREMBL,UNIPARC,REFSEQ_MATCH,SOURCE,SIFT,PolyPhen,HGVS_OFFSET,AF,AFR_AF,AMR_AF,EAS_AF,EUR_AF,SAS_AF,AA_AF,EA_AF,gnomAD_AF,gnomAD_AFR_AF,gnomAD_AMR_AF,gnomAD_ASJ_AF,gnomAD_EAS_AF,gnomAD_FIN_AF,gnomAD_NFE_AF,gnomAD_OTH_AF,gnomAD_SAS_AF,SOMATIC,PHENO,OverlapBP,OverlapPC,1000Gp3_AC,1000Gp3_AF,1000Gp3_AFR_AC,1000Gp3_AFR_AF,1000Gp3_AMR_AC,1000Gp3_AMR_AF,1000Gp3_EAS_AC,1000Gp3_EAS_AF,1000Gp3_EUR_AC,1000Gp3_EUR_AF,1000Gp3_SAS_AC,1000Gp3_SAS_AF,ALSPAC_AC,ALSPAC_AF,AltaiNeandertal,Ancestral_allele,CADD_phred,CADD_raw,CADD_raw_rankscore,DANN_rankscore,DANN_score,Denisova,ESP6500_AA_AC,ESP6500_AA_AF,ESP6500_EA_AC,ESP6500_EA_AF,Eigen-PC-phred,Eigen-PC-raw,Eigen-PC-raw_rankscore,Eigen-phred,Eigen-raw,Eigen_coding_or_noncoding,ExAC_AFR_AC,ExAC_AFR_AF,ExAC_AMR_AC,ExAC_AMR_AF,ExAC_Adj_AC,ExAC_Adj_AF,ExAC_EAS_AC,ExAC_EAS_AF,ExAC_FIN_AC,ExAC_FIN_AF,ExAC_NFE_AC,ExAC_NFE_AF,ExAC_SAS_AC,ExAC_SAS_AF,ExAC_nonTCGA_AC,ExAC_nonTCGA_AF,ExAC_nonTCGA_AFR_AC,ExAC_nonTCGA_AFR_AF,ExAC_nonTCGA_AMR_AC,ExAC_nonTCGA_AMR_AF,ExAC_nonTCGA_Adj_AC,ExAC_nonTCGA_Adj_AF,ExAC_nonTCGA_EAS_AC,ExAC_nonTCGA_EAS_AF,ExAC_nonTCGA_FIN_AC,ExAC_nonTCGA_FIN_AF,ExAC_nonTCGA_NFE_AC,ExAC_nonTCGA_NFE_AF,ExAC_nonTCGA_SAS_AC,ExAC_nonTCGA_SAS_AF,ExAC_nonpsych_AC,ExAC_nonpsych_AF,ExAC_nonpsych_AFR_AC,ExAC_nonpsych_AFR_AF,ExAC_nonpsych_AMR_AC,ExAC_nonpsych_AMR_AF,ExAC_nonpsych_Adj_AC,ExAC_nonpsych_Adj_AF,ExAC_nonpsych_EAS_AC,ExAC_nonpsych_EAS_AF,ExAC_nonpsych_FIN_AC,ExAC_nonpsych_FIN_AF,ExAC_nonpsych_NFE_AC,ExAC_nonpsych_NFE_AF,ExAC_nonpsych_SAS_AC,ExAC_nonpsych_SAS_AF,FATHMM_converted_rankscore,FATHMM_pred,FATHMM_score,GERP++_NR,GERP++_RS,GERP++_RS_rankscore,GM12878_confidence_value,GM12878_fitCons_score,GM12878_fitCons_score_rankscore,GTEx_V6p_gene,GTEx_V6p_tissue,GenoCanyon_score,GenoCanyon_score_rankscore,H1-hESC_confidence_value,H1-hESC_fitCons_score,H1-hESC_fitCons_score_rankscore,clinvar_clnsig,clinvar_golden_stars,HUVEC_confidence_value,HUVEC_fitCons_score,HUVEC_fitCons_score_rankscore,LRT_Omega,LRT_converted_rankscore,LRT_pred,LRT_score,M-CAP_pred,M-CAP_rankscore,M-CAP_score,MetaLR_pred,MetaLR_rankscore,MetaLR_score,MetaSVM_pred,MetaSVM_rankscore,MetaSVM_score,MutPred_AAchange,MutPred_Top5features,MutPred_protID,MutPred_rankscore,MutPred_score,MutationAssessor_UniprotID,MutationAssessor_pred,MutationAssessor_score,MutationAssessor_score_rankscore,MutationAssessor_variant,MutationTaster_AAE,MutationTaster_converted_rankscore,MutationTaster_model,MutationTaster_pred,MutationTaster_score,PROVEAN_converted_rankscore,PROVEAN_pred,PROVEAN_score,Polyphen2_HDIV_pred,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_score,Polyphen2_HVAR_pred,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_score,REVEL_rankscore,REVEL_score,Reliability_index,SIFT_converted_rankscore,SIFT_pred,SIFT_score,SiPhy_29way_logOdds,SiPhy_29way_logOdds_rankscore,SiPhy_29way_pi,TWINSUK_AC,TWINSUK_AF,Transcript_id_VEST3,Transcript_var_VEST3,Uniprot_aapos_Polyphen2,Uniprot_acc_Polyphen2,Uniprot_id_Polyphen2,VEST3_rankscore,VEST3_score,aaalt,aapos,aaref,alt,cds_strand,chr,codon_degeneracy,codonpos,fathmm-MKL_coding_group,fathmm-MKL_coding_pred,fathmm-MKL_coding_rankscore,fathmm-MKL_coding_score,genename,gnomAD_exomes_AFR_AC,gnomAD_exomes_AFR_AF,gnomAD_exomes_AFR_AN,gnomAD_exomes_AMR_AC,gnomAD_exomes_AMR_AF,gnomAD_exomes_AMR_AN,gnomAD_exomes_AN,gnomAD_exomes_ASJ_AC,gnomAD_exomes_ASJ_AF,gnomAD_exomes_ASJ_AN,gnomAD_exomes_EAS_AC,gnomAD_exomes_EAS_AF,gnomAD_exomes_EAS_AN,gnomAD_exomes_FIN_AC,gnomAD_exomes_FIN_AF,gnomAD_exomes_FIN_AN,gnomAD_exomes_NFE_AC,gnomAD_exomes_NFE_AF,gnomAD_exomes_NFE_AN,gnomAD_exomes_OTH_AC,gnomAD_exomes_OTH_AF,gnomAD_exomes_OTH_AN,gnomAD_exomes_SAS_AC,gnomAD_exomes_SAS_AF,gnomAD_exomes_SAS_AN,gnomAD_genomes_AFR_AC,gnomAD_genomes_AFR_AF,gnomAD_genomes_AFR_AN,gnomAD_genomes_AMR_AC,gnomAD_genomes_AMR_AF,gnomAD_genomes_AMR_AN,gnomAD_genomes_AN,gnomAD_genomes_ASJ_AC,gnomAD_genomes_ASJ_AF,gnomAD_genomes_ASJ_AN,gnomAD_genomes_EAS_AC,gnomAD_genomes_EAS_AF,gnomAD_genomes_EAS_AN,gnomAD_genomes_FIN_AC,gnomAD_genomes_FIN_AF,gnomAD_genomes_FIN_AN,gnomAD_genomes_NFE_AC,gnomAD_genomes_NFE_AF,gnomAD_genomes_NFE_AN,gnomAD_genomes_OTH_AC,gnomAD_genomes_OTH_AF,gnomAD_genomes_OTH_AN,integrated_confidence_value,integrated_fitCons_score,integrated_fitCons_score_rankscore,phastCons100way_vertebrate,phastCons100way_vertebrate_rankscore,phastCons20way_mammalian,phastCons20way_mammalian_rankscore,phyloP100way_vertebrate,phyloP100way_vertebrate_rankscore,phyloP20way_mammalian,phyloP20way_mammalian_rankscore,pos(1-based),ref,refcodon,Condel"

at7 commented 4 years ago

That must be it. You still use MMCNT1,MMCNT2,MMCNT3,MMID3 in your fields list. Can you please replace those with Mastermind_counts,Mastermind_MMID3? Thank you!
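For a long list, the rename is easier to do with `sed` than by hand. The snippet below is only a sketch: it runs on a shortened stand-in for the real `--fields` value, and it assumes the three old count columns appear next to each other in the order MMCNT1,MMCNT2,MMCNT3, as they do in the list above.

```shell
# Shortened stand-in for the full --fields value.
fields="Uploaded_variation,SYMBOL,MMCNT1,MMCNT2,MMCNT3,MMID3,Condel"

# Replace the old Mastermind column names with the new header keys.
# The (^|,) and (,|$) anchors keep the match to whole field names.
new_fields=$(printf '%s\n' "$fields" | sed -E \
    -e 's/(^|,)MMCNT1,MMCNT2,MMCNT3(,|$)/\1Mastermind_counts\2/' \
    -e 's/(^|,)MMID3(,|$)/\1Mastermind_MMID3\2/')

echo "$new_fields"
# prints: Uploaded_variation,SYMBOL,Mastermind_counts,Mastermind_MMID3,Condel
```

The three counts then arrive in the single `Mastermind_counts` column, which the plugin header describes as MMCNT1|MMCNT2|MMCNT3.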

shigej38 commented 4 years ago

@at7 Thank you, sir. Yes, this solved my problem. I have one more question about customization: can I choose my own names for the columns with this method?