Dear @matmu,
Can you send us the header and a couple of lines of your VCF file, please?
I just tried with the following VCF file:
##fileformat=VCFv4.2
##VEP="v96" time="2019-04-04 08:04:16" cache="/opt/vep/.vep/homo_sapiens/94_GRCh38" ensembl-funcgen=96.9c3a0cd ensembl=96.7a35428 ensembl-io=96.6e65b30 ensembl-variation=96.db44614 1000genomes="phase3" COSMIC="86" ClinVar="201807" ESP="V2-SSA137" HGMD-PUBLIC="20174" assembly="GRCh38.p12" dbSNP="151" gencode="GENCODE 29" genebuild="2014-07" gnomAD="170228" polyphen="2.2.2" regbuild="16" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT patient
1 1748780 rs1014988 G A . . CSQ=A|upstream_gene_variant|MODIFIER|SLC35E2A|ENSG00000215790|Transcript|ENST00000246421|processed_transcript|||||||||||2788|-1||HGNC|HGNC:20863,A|downstream_gene_variant|MODIFIER|NADK|ENSG00000008130|Transcript|ENST00000341426|protein_coding|||||||||||2452|-1||HGNC|HGNC:29831,A|downstream_gene_variant|MODIFIER|NADK|ENSG00000008130|Transcript|ENST00000341991|protein_coding|||||||||||2452|-1||HGNC|HGNC:29831,A|downstream_gene_variant|MODIFIER|NADK|ENSG00000008130|Transcript|ENST00000342348|protein_coding|||||||||||3691|-1||HGNC|HGNC:29831,A|upstream_gene_variant|MODIFIER|SLC35E2A|ENSG00000215790|Transcript|ENST00000355439|protein_coding|||||||||||2781|-1||HGNC|HGNC:20863,A|downstream_gene_variant|MODIFIER|NADK|ENSG00000008130|Transcript|ENST00000378625|protein_coding|||||||||||2452|-1||HGNC|HGNC:29831,A|downstream_gene_variant|MODIFIER|NADK|ENSG00000008130|Transcript|ENST00000498806|nonsense_mediated_decay|||||||||||4195|-1|cds_start_NF|HGNC|HGNC:29831,A|upstream_gene_variant|MODIFIER|SLC35E2A|ENSG00000215790|Transcript|ENST00000643905|protein_coding|||||||||||2781|-1||HGNC|HGNC:20863,A|upstream_gene_variant|MODIFIER|SLC35E2A|ENSG00000215790|Transcript|ENST00000647043|processed_transcript|||||||||||2861|-1||HGNC|HGNC:20863 GT 1|0
1 2401592 rs3001336 G A . . CSQ=A|downstream_gene_variant|MODIFIER|PEX10|ENSG00000157911|Transcript|ENST00000288774|protein_coding|||||||||||2372|-1||HGNC|HGNC:8851,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000306256|protein_coding||6/7||||||||||1|cds_end_NF|HGNC|HGNC:30309,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000378512|protein_coding||5/6||||||||||1||HGNC|HGNC:30309,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000378513|protein_coding||4/5||||||||||1||HGNC|HGNC:30309,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000378518|protein_coding||4/4||||||||||1||HGNC|HGNC:30309,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000443438|protein_coding||5/5||||||||||1|cds_end_NF|HGNC|HGNC:30309,A|downstream_gene_variant|MODIFIER|PEX10|ENSG00000157911|Transcript|ENST00000447513|protein_coding|||||||||||3227|-1||HGNC|HGNC:8851,A|upstream_gene_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000462129|retained_intron|||||||||||359|1||HGNC|HGNC:30309,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000488353|protein_coding||4/5||||||||||1||HGNC|HGNC:30309,A|downstream_gene_variant|MODIFIER|PEX10|ENSG00000157911|Transcript|ENST00000507596|protein_coding|||||||||||3246|-1||HGNC|HGNC:8851,A|downstream_gene_variant|MODIFIER|PEX10|ENSG00000157911|Transcript|ENST00000510434|nonsense_mediated_decay|||||||||||4926|-1||HGNC|HGNC:8851,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000605895|protein_coding||5/6||||||||||1||HGNC|HGNC:30309,A|downstream_gene_variant|MODIFIER|PEX10|ENSG00000157911|Transcript|ENST00000650293|protein_coding|||||||||||2469|-1|cds_start_NF|HGNC|HGNC:8851 GT 0|0
and the fields of the CSQ entry (along with the other VCF fields) were listed when I ran the command:
filter_vep -i /opt/vep/.vep/output/test_output.vcf.gz --list --vcf_info_field CSQ
ALT
Allele
Amino_acids
BIOTYPE
CDS_position
CHROM
Codons
Consequence
DISTANCE
EXON
Existing_variation
FILTER
FLAGS
FORMAT
Feature
Feature_type
Gene
HGNC_ID
HGVSc
HGVSp
ID
IMPACT
INFO
INTRON
POS
Protein_position
QUAL
REF
STRAND
SYMBOL
SYMBOL_SOURCE
cDNA_position
patient
Best regards, Laurent
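For reference, the CSQ-derived names in the listing above (Allele, Consequence, IMPACT, and so on) are the same names declared after `Format:` on the `##INFO=<ID=CSQ,...>` header line. The following is a minimal sketch for inspecting that header line independently of `filter_vep`. It is not how `filter_vep` is implemented; the function name `csq_fields` and the shortened example header are assumptions made for illustration.

```python
import re


def csq_fields(header_lines, info_id="CSQ"):
    """Return the field names declared in the 'Format: ...' part of the
    ##INFO header line for info_id, or an empty list if none is found."""
    prefix = f"##INFO=<ID={info_id},"
    for line in header_lines:
        if line.startswith(prefix):
            # The field list sits between 'Format: ' and the closing quote
            # of the Description attribute.
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).strip().split("|")
    return []


# Shortened, hypothetical header used only to exercise the function.
header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
    'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">',
]

for name in sorted(csq_fields(header)):
    print(name)
```

Running this against the header of a real file (read the lines that start with `##`) gives a quick check of whether the CSQ line is present and whether its `Format:` string can be split into the expected names.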
Dear @ens-lgil,
Please find the VCF below. I have deleted some rows from the header, but it still behaves the same way.
Best regards, Matthias
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=NEGATIVE_TRAIN_SITE,Number=0,Type=Flag,Description="This variant was used to build the negative training set of bad variants">
##INFO=<ID=POSITIVE_TRAIN_SITE,Number=0,Type=Flag,Description="This variant was used to build the positive training set of good variants">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
##INFO=<ID=VQSLOD,Number=1,Type=Float,Description="Log odds of being a true variant versus being false under the trained gaussian mixture model">
##INFO=<ID=culprit,Number=1,Type=String,Description="The annotation which was the worst performing in the Gaussian mixture model, likely the reason why the variant was filtered out">
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr3,length=198295559>
##contig=<ID=chr4,length=190214555>
##contig=<ID=chr5,length=181538259>
##contig=<ID=chr6,length=170805979>
##contig=<ID=chr7,length=159345973>
##contig=<ID=chr8,length=145138636>
##contig=<ID=chr9,length=138394717>
##contig=<ID=chr10,length=133797422>
##contig=<ID=chr11,length=135086622>
##contig=<ID=chr12,length=133275309>
##contig=<ID=chr13,length=114364328>
##contig=<ID=chr14,length=107043718>
##contig=<ID=chr15,length=101991189>
##contig=<ID=chr16,length=90338345>
##contig=<ID=chr17,length=83257441>
##contig=<ID=chr18,length=80373285>
##contig=<ID=chr19,length=58617616>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr21,length=46709983>
##contig=<ID=chr22,length=50818468>
##contig=<ID=chrX,length=156040895>
##contig=<ID=chrY,length=57227415>
##contig=<ID=chrM,length=16569>
##source=SelectVariants
##VEP="v95" time="2019-03-27 15:46:19" cache="/data/icg_munz/.vep/homo_sapiens_merged/95_GRCh38" ensembl-funcgen=95.94439f4 ensembl=95.4f83453 ensembl-io=95.78ccac5 ensembl-variation=95.858de3e 1000genomes="phase3" COSMIC="86" ClinVar="201810" ESP="V2-SSA137" HGMD-PUBLIC="20174" assembly="GRCh38.p12" dbSNP="151" gencode="GENCODE 29" genebuild="2014-07" gnomAD="170228" polyphen="2.2.2" refseq="2018-07-10 14:50:52 - GCF_000001405.38_GRCh38.p12_genomic.gff" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|REFSEQ_MATCH|SOURCE|GIVEN_REF|USED_REF|BAM_EDIT|GENE_PHENO|NEAREST|SIFT|PolyPhen|DOMAINS|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|CADD_PHRED|CADD_RAW|#chr|1000Gp3_AC|1000Gp3_AF|1000Gp3_AFR_AC|1000Gp3_AFR_AF|1000Gp3_AMR_AC|1000Gp3_AMR_AF|1000Gp3_EAS_AC|1000Gp3_EAS_AF|1000Gp3_EUR_AC|1000Gp3_EUR_AF|1000Gp3_SAS_AC|1000Gp3_SAS_AF|29way_logOdds|29way_logOdds_rankscore|29way_pi|ALSPAC_AC|ALSPAC_AF|APPRIS|Aloft_Confidence|Aloft_Fraction_transcripts_affected|Aloft_pred|Aloft_prob_Dominant|Aloft_prob_Recessive|Aloft_prob_Tolerant|AltaiNeandertal|Ancestral_allele|CADD_phred|CADD_raw|CADD_raw_rankscore|DANN_rankscore|DANN_score|DEOGEN2_pred|DEOGEN2_rankscore|DEOGEN2_score|Denisova|ESP6500_AA_AC|ESP6500_AA_AF|ESP6500_EA_AC|ESP6500_EA_AF|Eigen-PC-phred_coding|Eigen-PC-raw_coding|Eigen-PC-raw_coding_rankscore|Eigen-pred_coding|Eigen-raw_coding|Eigen-raw_coding_rankscore|Ensembl_geneid|Ensembl_proteinid|Ensembl_transcriptid|ExAC_AC|ExAC_AF|ExAC_AFR_AC|ExAC_AFR_AF|ExAC_AMR_AC|ExAC_AMR_AF|ExAC_Adj_AC|ExAC_Adj_AF|ExAC_EAS_AC|ExAC_EAS_AF|ExAC_FIN_AC|ExAC_FIN_AF|ExAC_NFE_AC|ExAC_NFE_AF|ExAC_SAS_AC|ExAC_SAS_AF|ExAC_nonTCGA_AC|ExAC_nonTCGA_AF|ExAC_nonTCGA_AFR_AC|ExAC_nonTCGA_AFR_AF|ExAC_nonTCGA_AMR_AC|ExAC_nonTCGA_AMR_AF|ExAC_nonTCGA_Adj_AC|ExAC_nonTCGA_Adj_AF|ExAC_nonTCGA_EAS_AC|ExAC_nonTCGA_EAS_AF|ExAC_nonTCGA_FIN_AC|ExAC_nonTCGA_FIN_AF|ExAC_nonTCGA_NFE_AC|ExAC_nonTCG
A_NFE_AF|ExAC_nonTCGA_SAS_AC|ExAC_nonTCGA_SAS_AF|ExAC_nonpsych_AC|ExAC_nonpsych_AF|ExAC_nonpsych_AFR_AC|ExAC_nonpsych_AFR_AF|ExAC_nonpsych_AMR_AC|ExAC_nonpsych_AMR_AF|ExAC_nonpsych_Adj_AC|ExAC_nonpsych_Adj_AF|ExAC_nonpsych_EAS_AC|ExAC_nonpsych_EAS_AF|ExAC_nonpsych_FIN_AC|ExAC_nonpsych_FIN_AF|ExAC_nonpsych_NFE_AC|ExAC_nonpsych_NFE_AF|ExAC_nonpsych_SAS_AC|ExAC_nonpsych_SAS_AF|FATHMM_converted_rankscore|FATHMM_pred|FATHMM_score|GENCODE_basic|GERP++_NR|GERP++_RS|GERP++_RS_rankscore|GM12878_confidence_value|GM12878_fitCons_rankscore|GM12878_fitCons_score|GTEx_V7_gene|GTEx_V7_tissue|GenoCanyon_rankscore|GenoCanyon_score|Geuvadis_eQTL_target_gene|H1-hESC_confidence_value|H1-hESC_fitCons_rankscore|H1-hESC_fitCons_score|HUVEC_confidence_value|HUVEC_fitCons_rankscore|HUVEC_fitCons_score|Interpro_domain|LINSIGHT|LINSIGHT_rankscore|LRT_Omega|LRT_converted_rankscore|LRT_pred|LRT_score|M-CAP_pred|M-CAP_rankscore|M-CAP_score|MPC_rankscore|MPC_score|MVP_rankscore|MVP_score|MetaLR_pred|MetaLR_rankscore|MetaLR_score|MetaSVM_pred|MetaSVM_rankscore|MetaSVM_score|MutPred_AAchange|MutPred_Top5features|MutPred_protID|MutPred_rankscore|MutPred_score|MutationAssessor_pred|MutationAssessor_rankscore|MutationAssessor_score|MutationTaster_AAE|MutationTaster_converted_rankscore|MutationTaster_model|MutationTaster_pred|MutationTaster_score|PROVEAN_converted_rankscore|PROVEAN_pred|PROVEAN_score|Polyphen2_HDIV_pred|Polyphen2_HDIV_rankscore|Polyphen2_HDIV_score|Polyphen2_HVAR_pred|Polyphen2_HVAR_rankscore|Polyphen2_HVAR_score|PrimateAI_pred|PrimateAI_rankscore|PrimateAI_score|REVEL_rankscore|REVEL_score|Reliability_index|SIFT4G_converted_rankscore|SIFT4G_pred|SIFT4G_score|SIFT_converted_rankscore|SIFT_pred|SIFT_score|TSL|TWINSUK_AC|TWINSUK_AF|UK10K_AC|UK10K_AF|Uniprot_acc|Uniprot_entry|VEP_canonical|VEST4_rankscore|VEST4_score|VindijiaNeandertal|aaalt|aapos|aaref|alt|bStatistic|bStatistic_rankscore|cds_strand|clinvar_clnsig|clinvar_hgvs|clinvar_review|clinvar_rs|clinvar_trait|clinvar_var_source|cod
on_degeneracy|codonpos|fathmm-MKL_coding_group|fathmm-MKL_coding_pred|fathmm-MKL_coding_rankscore|fathmm-MKL_coding_score|fathmm-XF_coding_pred|fathmm-XF_coding_rankscore|fathmm-XF_coding_score|genename|gnomAD_exomes_AC|gnomAD_exomes_AF|gnomAD_exomes_AFR_AC|gnomAD_exomes_AFR_AF|gnomAD_exomes_AFR_AN|gnomAD_exomes_AFR_nhomalt|gnomAD_exomes_AMR_AC|gnomAD_exomes_AMR_AF|gnomAD_exomes_AMR_AN|gnomAD_exomes_AMR_nhomalt|gnomAD_exomes_AN|gnomAD_exomes_ASJ_AC|gnomAD_exomes_ASJ_AF|gnomAD_exomes_ASJ_AN|gnomAD_exomes_ASJ_nhomalt|gnomAD_exomes_EAS_AC|gnomAD_exomes_EAS_AF|gnomAD_exomes_EAS_AN|gnomAD_exomes_EAS_nhomalt|gnomAD_exomes_FIN_AC|gnomAD_exomes_FIN_AF|gnomAD_exomes_FIN_AN|gnomAD_exomes_FIN_nhomalt|gnomAD_exomes_NFE_AC|gnomAD_exomes_NFE_AF|gnomAD_exomes_NFE_AN|gnomAD_exomes_NFE_nhomalt|gnomAD_exomes_POPMAX_AC|gnomAD_exomes_POPMAX_AF|gnomAD_exomes_POPMAX_AN|gnomAD_exomes_POPMAX_nhomalt|gnomAD_exomes_SAS_AC|gnomAD_exomes_SAS_AF|gnomAD_exomes_SAS_AN|gnomAD_exomes_SAS_nhomalt|gnomAD_exomes_controls_AC|gnomAD_exomes_controls_AF|gnomAD_exomes_controls_AFR_AC|gnomAD_exomes_controls_AFR_AF|gnomAD_exomes_controls_AFR_AN|gnomAD_exomes_controls_AFR_nhomalt|gnomAD_exomes_controls_AMR_AC|gnomAD_exomes_controls_AMR_AF|gnomAD_exomes_controls_AMR_AN|gnomAD_exomes_controls_AMR_nhomalt|gnomAD_exomes_controls_AN|gnomAD_exomes_controls_ASJ_AC|gnomAD_exomes_controls_ASJ_AF|gnomAD_exomes_controls_ASJ_AN|gnomAD_exomes_controls_ASJ_nhomalt|gnomAD_exomes_controls_EAS_AC|gnomAD_exomes_controls_EAS_AF|gnomAD_exomes_controls_EAS_AN|gnomAD_exomes_controls_EAS_nhomalt|gnomAD_exomes_controls_FIN_AC|gnomAD_exomes_controls_FIN_AF|gnomAD_exomes_controls_FIN_AN|gnomAD_exomes_controls_FIN_nhomalt|gnomAD_exomes_controls_NFE_AC|gnomAD_exomes_controls_NFE_AF|gnomAD_exomes_controls_NFE_AN|gnomAD_exomes_controls_NFE_nhomalt|gnomAD_exomes_controls_POPMAX_AC|gnomAD_exomes_controls_POPMAX_AF|gnomAD_exomes_controls_POPMAX_AN|gnomAD_exomes_controls_POPMAX_nhomalt|gnomAD_exomes_controls_SAS_AC|gnomAD_exomes_controls_SAS_
AF|gnomAD_exomes_controls_SAS_AN|gnomAD_exomes_controls_SAS_nhomalt|gnomAD_exomes_controls_nhomalt|gnomAD_exomes_flag|gnomAD_exomes_nhomalt|gnomAD_genomes_AC|gnomAD_genomes_AF|gnomAD_genomes_AFR_AC|gnomAD_genomes_AFR_AF|gnomAD_genomes_AFR_AN|gnomAD_genomes_AFR_nhomalt|gnomAD_genomes_AMR_AC|gnomAD_genomes_AMR_AF|gnomAD_genomes_AMR_AN|gnomAD_genomes_AMR_nhomalt|gnomAD_genomes_AN|gnomAD_genomes_ASJ_AC|gnomAD_genomes_ASJ_AF|gnomAD_genomes_ASJ_AN|gnomAD_genomes_ASJ_nhomalt|gnomAD_genomes_EAS_AC|gnomAD_genomes_EAS_AF|gnomAD_genomes_EAS_AN|gnomAD_genomes_EAS_nhomalt|gnomAD_genomes_FIN_AC|gnomAD_genomes_FIN_AF|gnomAD_genomes_FIN_AN|gnomAD_genomes_FIN_nhomalt|gnomAD_genomes_NFE_AC|gnomAD_genomes_NFE_AF|gnomAD_genomes_NFE_AN|gnomAD_genomes_NFE_nhomalt|gnomAD_genomes_POPMAX_AC|gnomAD_genomes_POPMAX_AF|gnomAD_genomes_POPMAX_AN|gnomAD_genomes_POPMAX_nhomalt|gnomAD_genomes_controls_AC|gnomAD_genomes_controls_AF|gnomAD_genomes_controls_AFR_AC|gnomAD_genomes_controls_AFR_AF|gnomAD_genomes_controls_AFR_AN|gnomAD_genomes_controls_AFR_nhomalt|gnomAD_genomes_controls_AMR_AC|gnomAD_genomes_controls_AMR_AF|gnomAD_genomes_controls_AMR_AN|gnomAD_genomes_controls_AMR_nhomalt|gnomAD_genomes_controls_AN|gnomAD_genomes_controls_ASJ_AC|gnomAD_genomes_controls_ASJ_AF|gnomAD_genomes_controls_ASJ_AN|gnomAD_genomes_controls_ASJ_nhomalt|gnomAD_genomes_controls_EAS_AC|gnomAD_genomes_controls_EAS_AF|gnomAD_genomes_controls_EAS_AN|gnomAD_genomes_controls_EAS_nhomalt|gnomAD_genomes_controls_FIN_AC|gnomAD_genomes_controls_FIN_AF|gnomAD_genomes_controls_FIN_AN|gnomAD_genomes_controls_FIN_nhomalt|gnomAD_genomes_controls_NFE_AC|gnomAD_genomes_controls_NFE_AF|gnomAD_genomes_controls_NFE_AN|gnomAD_genomes_controls_NFE_nhomalt|gnomAD_genomes_controls_POPMAX_AC|gnomAD_genomes_controls_POPMAX_AF|gnomAD_genomes_controls_POPMAX_AN|gnomAD_genomes_controls_POPMAX_nhomalt|gnomAD_genomes_controls_nhomalt|gnomAD_genomes_flag|gnomAD_genomes_nhomalt|hg18_chr|hg18_pos(1-based)|hg19_chr|hg19_pos(1-based)|integrated_confide
nce_value|integrated_fitCons_rankscore|integrated_fitCons_score|phastCons100way_vertebrate|phastCons100way_vertebrate_rankscore|phastCons17way_primate|phastCons17way_primate_rankscore|phastCons30way_mammalian|phastCons30way_mammalian_rankscore|phyloP100way_vertebrate|phyloP100way_vertebrate_rankscore|phyloP17way_primate|phyloP17way_primate_rankscore|phyloP30way_mammalian|phyloP30way_mammalian_rankscore|pos(1-coor)|ref|refcodon|rs_dbSNP151">
##CADD_PHRED=PHRED-like scaled CADD score
##CADD_RAW=Raw CADD score
###chr=#chr from dbNSFP file
##1000Gp3_AC=(from dbNSFP) Alternative allele counts in the whole 1000 genomes phase 3 (1000Gp3) data.
##1000Gp3_AF=(from dbNSFP) Alternative allele frequency in the whole 1000Gp3 data.
##1000Gp3_AFR_AC=(from dbNSFP) Alternative allele counts in the 1000Gp3 African descendent samples.
##1000Gp3_AFR_AF=(from dbNSFP) Alternative allele frequency in the 1000Gp3 African descendent samples.
##1000Gp3_AMR_AC=(from dbNSFP) Alternative allele counts in the 1000Gp3 American descendent samples.
##1000Gp3_AMR_AF=(from dbNSFP) Alternative allele frequency in the 1000Gp3 American descendent samples.
##1000Gp3_EAS_AC=(from dbNSFP) Alternative allele counts in the 1000Gp3 East Asian descendent samples.
##1000Gp3_EAS_AF=(from dbNSFP) Alternative allele frequency in the 1000Gp3 East Asian descendent samples.
##1000Gp3_EUR_AC=(from dbNSFP) Alternative allele counts in the 1000Gp3 European descendent samples.
##1000Gp3_EUR_AF=(from dbNSFP) Alternative allele frequency in the 1000Gp3 European descendent samples.
##1000Gp3_SAS_AC=(from dbNSFP) Alternative allele counts in the 1000Gp3 South Asian descendent samples.
##1000Gp3_SAS_AF=(from dbNSFP) Alternative allele frequency in the 1000Gp3 South Asian descendent samples.
##29way_logOdds=29way_logOdds from dbNSFP file
##29way_logOdds_rankscore=29way_logOdds_rankscore from dbNSFP file
##29way_pi=29way_pi from dbNSFP file
##ALSPAC_AC=(from dbNSFP) Alternative allele count in called genotypes in UK10K ALSPAC cohort.
##ALSPAC_AF=(from dbNSFP) Alternative allele frequency in called genotypes in UK10K ALSPAC cohort.
##APPRIS=(from dbNSFP) APPRIS annotation for the transcripts matching Ensembl_transcriptid Multiple entries separated by ";". Potential values: principal1, principal2, principal3, principal4, principal5, alternative1, alternative2. See https://useast.ensembl.org/info/genome/genebuild/transcript_quality_tags.html
##Aloft_Confidence=(from dbNSFP) Confidence level of Aloft_pred; values can be "High Confidence" (p < 0.05) or "Low Confidence" (p > 0.05) multiple values separated by ";", corresponding to Ensembl_proteinid.
##Aloft_Fraction_transcripts_affected=(from dbNSFP) the fraction of the transcripts of the gene affected i.e. No. of transcripts affected by the SNP/Total no. of protein_coding transcripts for the gene multiple values separated by ";", corresponding to Ensembl_proteinid.
##Aloft_pred=(from dbNSFP) final classification predicted by ALoFT; values can be Tolerant, Recessive or Dominant multiple values separated by ";", corresponding to Ensembl_proteinid.
##Aloft_prob_Dominant=(from dbNSFP) Probability of the SNP being classified as dominant disease-causing by ALoFT multiple values separated by ";", corresponding to Ensembl_proteinid.
##Aloft_prob_Recessive=(from dbNSFP) Probability of the SNP being classified as recessive disease-causing by ALoFT multiple values separated by ";", corresponding to Ensembl_proteinid.
##Aloft_prob_Tolerant=(from dbNSFP) Probability of the SNP being classified as benign by ALoFT multiple values separated by ";", corresponding to Ensembl_proteinid.
##AltaiNeandertal=(from dbNSFP) genotype of a deep sequenced Altai Neanderthal
##Ancestral_allele=(from dbNSFP) ancestral allele based on 8 primates EPO. Ancestral alleles by Ensembl 84. The following comes from its original README file: ACTG - high-confidence call, ancestral state supported by the other two sequences actg - low-confidence call, ancestral state supported by one sequence only N - failure, the ancestral state is not supported by any other sequence - - the extant species contains an insertion at this position . - no coverage in the alignment
##CADD_phred=(from dbNSFP) CADD phred-like score. This is phred-like rank score based on whole genome CADD raw scores. Please refer to Kircher et al. (2014) Nature Genetics 46(3):310-5 for details. The larger the score the more likely the SNP has damaging effect. Please note the following copyright statement for CADD: "CADD scores (http://cadd.gs.washington.edu/) are Copyright 2013 University of Washington and Hudson-Alpha Institute for Biotechnology (all rights reserved) but are freely available for all academic, non-commercial applications. For commercial licensing information contact Jennifer McCullar (mccullaj@uw.edu)."
##CADD_raw=(from dbNSFP) CADD raw score for functional prediction of a SNP. Please refer to Kircher et al. (2014) Nature Genetics 46(3):310-5 for details. The larger the score the more likely the SNP has damaging effect. Scores range from -6.458163 to 18.301497 in dbNSFP. Please note the following copyright statement for CADD: "CADD scores (http://cadd.gs.washington.edu/) are Copyright 2013 University of Washington and Hudson-Alpha Institute for Biotechnology (all rights reserved) but are freely available for all academic, non-commercial applications. For commercial licensing information contact Jennifer McCullar (mccullaj@uw.edu)."
##CADD_raw_rankscore=(from dbNSFP) CADD raw scores were ranked among all CADD raw scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of CADD raw scores in dbNSFP. Please note the following copyright statement for CADD: "CADD scores (http://cadd.gs.washington.edu/) are Copyright 2013 University of Washington and Hudson-Alpha Institute for Biotechnology (all rights reserved) but are freely available for all academic, non-commercial applications. For commercial licensing information contact Jennifer McCullar (mccullaj@uw.edu)."
##DANN_rankscore=(from dbNSFP) DANN scores were ranked among all DANN scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of DANN scores in dbNSFP.
##DANN_score=(from dbNSFP) DANN is a functional prediction score retrained based on the training data of CADD using deep neural network. Scores range from 0 to 1. A larger number indicate a higher probability to be damaging. More information of this score can be found in doi: 10.1093/bioinformatics/btu703.
##DEOGEN2_pred=(from dbNSFP) Prediction of DEOGEN2 score based on the authors' recommendation, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0.5.
##DEOGEN2_rankscore=(from dbNSFP) DEOGEN2 scores were ranked among all DEOGEN2 scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of DEOGEN2 scores in dbNSFP.
##DEOGEN2_score=(from dbNSFP) A deleteriousness prediction score "which incorporates heterogeneous information about the molecular effects of the variants, the domains involved, the relevance of the gene and the interactions in which it participates". It ranges from 0 to 1. The larger the score, the more likely the variant is deleterious. The authors suggest a threshold of 0.5 for separating damaging vs tolerant variants.
##Denisova=(from dbNSFP) genotype of a deep sequenced Denisova
##ESP6500_AA_AC=(from dbNSFP) Alternative allele count in the African American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set).
##ESP6500_AA_AF=(from dbNSFP) Alternative allele frequency in the African American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set).
##ESP6500_EA_AC=(from dbNSFP) Alternative allele count in the European American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set).
##ESP6500_EA_AF=(from dbNSFP) Alternative allele frequency in the European American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set).
##Eigen-PC-phred_coding=(from dbNSFP) Eigen PC score in phred scale.
##Eigen-PC-raw_coding=(from dbNSFP) Eigen PC score for genome-wide SNVs. A functional prediction score based on conservation, allele frequencies, deleteriousness prediction (for missense SNVs) and epigenomic signals (for synonymous and non-coding SNVs) using an unsupervised learning method (doi: 10.1038/ng.3477).
##Eigen-PC-raw_coding_rankscore=(from dbNSFP) Eigen-PC-raw scores were ranked among all Eigen-PC-raw scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of Eigen-PC-raw scores in dbNSFP.
##Eigen-pred_coding=Eigen-pred_coding from dbNSFP file
##Eigen-raw_coding=(from dbNSFP) Eigen score for coding SNVs. A functional prediction score based on conservation, allele frequencies, and deleteriousness prediction using an unsupervised learning method (doi: 10.1038/ng.3477).
##Eigen-raw_coding_rankscore=(from dbNSFP) Eigen-raw scores were ranked among all Eigen-raw scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of Eigen-raw scores in dbNSFP.
##Ensembl_geneid=(from dbNSFP) Ensembl gene id
##Ensembl_proteinid=(from dbNSFP) Ensembl protein ids Multiple entries separated by ";", corresponding to Ensembl_transcriptids
##Ensembl_transcriptid=(from dbNSFP) Ensembl transcript ids (Multiple entries separated by ";")
##ExAC_AC=(from dbNSFP) Allele count in total ExAC samples (60,706 samples)
##ExAC_AF=(from dbNSFP) Allele frequency in total ExAC samples
##ExAC_AFR_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in African & African American ExAC samples
##ExAC_AFR_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in African & African American ExAC samples
##ExAC_AMR_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in American ExAC samples
##ExAC_AMR_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in American ExAC samples
##ExAC_Adj_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in total ExAC samples
##ExAC_Adj_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in total ExAC samples
##ExAC_EAS_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in East Asian ExAC samples
##ExAC_EAS_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in East Asian ExAC samples
##ExAC_FIN_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in Finnish ExAC samples
##ExAC_FIN_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in Finnish ExAC samples
##ExAC_NFE_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in Non-Finnish European ExAC samples
##ExAC_NFE_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in Non-Finnish European ExAC samples
##ExAC_SAS_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in South Asian ExAC samples
##ExAC_SAS_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in South Asian ExAC samples
##ExAC_nonTCGA_AC=(from dbNSFP) Allele count in total ExAC_nonTCGA samples (53,105 samples)
##ExAC_nonTCGA_AF=(from dbNSFP) Allele frequency in total ExAC_nonTCGA samples
##ExAC_nonTCGA_AFR_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in African & African American ExAC_nonTCGA samples
##ExAC_nonTCGA_AFR_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in African & African American ExAC_nonTCGA samples
##ExAC_nonTCGA_AMR_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in American ExAC_nonTCGA samples
##ExAC_nonTCGA_AMR_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in American ExAC_nonTCGA samples
##ExAC_nonTCGA_Adj_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in total ExAC_nonTCGA samples
##ExAC_nonTCGA_Adj_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in total ExAC_nonTCGA samples
##ExAC_nonTCGA_EAS_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in East Asian ExAC_nonTCGA samples
##ExAC_nonTCGA_EAS_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in East Asian ExAC_nonTCGA samples
##ExAC_nonTCGA_FIN_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in Finnish ExAC_nonTCGA samples
##ExAC_nonTCGA_FIN_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in Finnish ExAC_nonTCGA samples
##ExAC_nonTCGA_NFE_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in Non-Finnish European ExAC_nonTCGA samples
##ExAC_nonTCGA_NFE_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in Non-Finnish European ExAC_nonTCGA samples
##ExAC_nonTCGA_SAS_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in South Asian ExAC_nonTCGA samples
##ExAC_nonTCGA_SAS_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in South Asian ExAC_nonTCGA samples
##ExAC_nonpsych_AC=(from dbNSFP) Allele count in total ExAC_nonpsych samples (45,376 samples)
##ExAC_nonpsych_AF=(from dbNSFP) Allele frequency in total ExAC_nonpsych samples
##ExAC_nonpsych_AFR_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in African & African American ExAC_nonpsych samples
##ExAC_nonpsych_AFR_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in African & African American ExAC_nonpsych samples
##ExAC_nonpsych_AMR_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in American ExAC_nonpsych samples
##ExAC_nonpsych_AMR_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in American ExAC_nonpsych samples
##ExAC_nonpsych_Adj_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in total ExAC_nonpsych samples
##ExAC_nonpsych_Adj_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in total ExAC_nonpsych samples
##ExAC_nonpsych_EAS_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in East Asian ExAC_nonpsych samples
##ExAC_nonpsych_EAS_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in East Asian ExAC_nonpsych samples
##ExAC_nonpsych_FIN_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in Finnish ExAC_nonpsych samples
##ExAC_nonpsych_FIN_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in Finnish ExAC_nonpsych samples
##ExAC_nonpsych_NFE_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in Non-Finnish European ExAC_nonpsych samples
##ExAC_nonpsych_NFE_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in Non-Finnish European ExAC_nonpsych samples
##ExAC_nonpsych_SAS_AC=(from dbNSFP) Adjusted Alt allele counts (DP >= 10 & GQ >= 20) in South Asian ExAC_nonpsych samples
##ExAC_nonpsych_SAS_AF=(from dbNSFP) Adjusted Alt allele frequency (DP >= 10 & GQ >= 20) in South Asian ExAC_nonpsych samples
##FATHMM_converted_rankscore=(from dbNSFP) FATHMMori scores were first converted to FATHMMnew=1-(FATHMMori+16.13)/26.77, then ranked among all FATHMMnew scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of FATHMMnew scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0 to 1.
##FATHMM_pred=(from dbNSFP) If a FATHMMori score is <=-1.5 (or rankscore >=0.81332) the corresponding nsSNV is predicted as "D(AMAGING)"; otherwise it is predicted as "T(OLERATED)". Multiple predictions separated by ";", corresponding to Ensembl_proteinid.
##FATHMM_score=(from dbNSFP) FATHMM default score (weighted for human inherited-disease mutations with Disease Ontology) (FATHMMori). Scores range from -16.13 to 10.64. The smaller the score the more likely the SNP has damaging effect. Multiple scores separated by ";", corresponding to Ensembl_proteinid.
##GENCODE_basic=(from dbNSFP) Whether the transcript belongs to GENCODE_basic (5' and 3' complete transcripts). Multiple entries separated by ";", matching Ensembl_transcriptid. See https://useast.ensembl.org/info/genome/genebuild/transcript_quality_tags.html
##GERP++_NR=(from dbNSFP) GERP++ neutral rate
##GERP++_RS=(from dbNSFP) GERP++ RS score, the larger the score, the more conserved the site. Scores range from -12.3 to 6.17.
##GERP++_RS_rankscore=(from dbNSFP) GERP++ RS scores were ranked among all GERP++ RS scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of GERP++ RS scores in dbNSFP.
##GM12878_confidence_value=(from dbNSFP) 0 - highly significant scores (approx. p<.003); 1 - significant scores (approx. p<.05); 2 - informative scores (approx. p<.25); 3 - other scores (approx. p>=.25).
##GM12878_fitCons_rankscore=(from dbNSFP) GM12878 fitCons scores were ranked among all GM12878 fitCons scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of GM12878 fitCons scores in dbNSFP.
##GM12878_fitCons_score=(from dbNSFP) fitCons score predicts the fraction of genomic positions belonging to a specific function class (defined by epigenomic "fingerprint") that are under selective pressure. Scores range from 0 to 1, with a larger score indicating a higher proportion of nucleic sites of the functional class the genomic position belong to are under selective pressure, therefore more likely to be functional important. GM12878 fitCons scores are based on cell type GM12878. More details can be found in doi:10.1038/ng.3196.
##GTEx_V7_gene=(from dbNSFP) target gene of the (significant) eQTL SNP
##GTEx_V7_tissue=(from dbNSFP) tissue type of the expression data with which the eQTL/gene pair is detected
##GenoCanyon_rankscore=GenoCanyon_rankscore from dbNSFP file
##GenoCanyon_score=(from dbNSFP) A functional prediction score based on conservation and biochemical annotations using an unsupervised statistical learning. (doi:10.1038/srep10576)
##Geuvadis_eQTL_target_gene=(from dbNSFP) Ensembl gene ID of the eQTL associated with, from the Geuvadis project
##H1-hESC_confidence_value=(from dbNSFP) 0 - highly significant scores (approx. p<.003); 1 - significant scores (approx. p<.05); 2 - informative scores (approx. p<.25); 3 - other scores (approx. p>=.25).
##H1-hESC_fitCons_rankscore=(from dbNSFP) H1-hESC fitCons scores were ranked among all H1-hESC fitCons scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of H1-hESC fitCons scores in dbNSFP.
##H1-hESC_fitCons_score=(from dbNSFP) fitCons score predicts the fraction of genomic positions belonging to a specific function class (defined by epigenomic "fingerprint") that are under selective pressure. Scores range from 0 to 1, with a larger score indicating a higher proportion of nucleic sites of the functional class the genomic position belong to are under selective pressure, therefore more likely to be functional important. GM12878 fitCons scores are based on cell type H1-hESC. More details can be found in doi:10.1038/ng.3196.
##HUVEC_confidence_value=(from dbNSFP) 0 - highly significant scores (approx. p<.003); 1 - significant scores (approx. p<.05); 2 - informative scores (approx. p<.25); 3 - other scores (approx. p>=.25).
##HUVEC_fitCons_rankscore=(from dbNSFP) HUVEC fitCons scores were ranked among all HUVEC fitCons scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of HUVEC fitCons scores in dbNSFP.
##HUVEC_fitCons_score=(from dbNSFP) fitCons score predicts the fraction of genomic positions belonging to a specific function class (defined by epigenomic "fingerprint") that are under selective pressure. Scores range from 0 to 1, with a larger score indicating a higher proportion of nucleic sites of the functional class the genomic position belong to are under selective pressure, therefore more likely to be functional important. GM12878 fitCons scores are based on cell type HUVEC. More details can be found in doi:10.1038/ng.3196.
##Interpro_domain=(from dbNSFP) domain or conserved site on which the variant locates. Domain annotations come from Interpro database. The number in the brackets following a specific domain is the count of times Interpro assigns the variant position to that domain, typically coming from different predicting databases. Multiple entries separated by ";".
##LINSIGHT=(from dbNSFP) "The LINSIGHT score measures the probability of negative selection on noncoding sites" Details refer to doi:10.1038/ng.3810.
##LINSIGHT_rankscore=(from dbNSFP) LINSIGHT scores were ranked among all LINSIGHT scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of LINSIGHT scores in dbNSFP.
##LRT_Omega=(from dbNSFP) estimated nonsynonymous-to-synonymous-rate ratio (Omega, reported by LRT)
##LRT_converted_rankscore=(from dbNSFP) LRTori scores were first converted as LRTnew=1-LRTori*0.5 if Omega<1, or LRTnew=LRTori*0.5 if Omega>=1. Then LRTnew scores were ranked among all LRTnew scores in dbNSFP. The rankscore is the ratio of the rank over the total number of the scores in dbNSFP. The scores range from 0.00162 to 0.8433.
##LRT_pred=(from dbNSFP) LRT prediction, D(eleterious), N(eutral) or U(nknown), which is not solely determined by the score.
##LRT_score=(from dbNSFP) The original LRT two-sided p-value (LRTori), ranges from 0 to 1.
##M-CAP_pred=(from dbNSFP) Prediction of M-CAP score based on the authors' recommendation, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0.025.
##M-CAP_rankscore=(from dbNSFP) M-CAP scores were ranked among all M-CAP scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of M-CAP scores in dbNSFP.
##M-CAP_score=(from dbNSFP) M-CAP score (details in DOI: 10.1038/ng.3703). Scores range from 0 to 1. The larger the score the more likely the SNP has damaging effect.
##MPC_rankscore=(from dbNSFP) MPC scores were ranked among all MPC scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MPC scores in dbNSFP.
##MPC_score=(from dbNSFP) A deleteriousness prediction score for missense variants based on regional missense constraint. The range of MPC score is 0 to 5. The larger the score, the more likely the variant is pathogenic. Details see doi: http://dx.doi.org/10.1101/148353. Multiple entries are separated by ";", corresponding to Ensembl_transcriptid.
##MVP_rankscore=(from dbNSFP) MVP scores were ranked among all MVP scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MVP scores in dbNSFP.
##MVP_score=(from dbNSFP) A pathogenicity prediction score for missense variants using deep learning approach. The range of MVP score is from 0 to 1. The larger the score, the more likely the variant is pathogenic. The authors suggest thresholds of 0.7 and 0.75 for separating damaging vs tolerant variants in constrained genes (ExAC pLI >=0.5) and non-constrained genes (ExAC pLI<0.5), respectively. Details see doi: http://dx.doi.org/10.1101/259390 Multiple entries are separated by ";", corresponding to Ensembl_transcriptid.
##MetaLR_pred=(from dbNSFP) Prediction of our MetaLR based ensemble prediction score,"T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0.5. The rankscore cutoff between "D" and "T" is 0.81101.
##MetaLR_rankscore=(from dbNSFP) MetaLR scores were ranked among all MetaLR scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MetaLR scores in dbNSFP. The scores range from 0 to 1.
##MetaLR_score=(from dbNSFP) Our logistic regression (LR) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from 0 to 1.
##MetaSVM_pred=(from dbNSFP) Prediction of our SVM based ensemble prediction score,"T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0. The rankscore cutoff between "D" and "T" is 0.82257.
##MetaSVM_rankscore=(from dbNSFP) MetaSVM scores were ranked among all MetaSVM scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MetaSVM scores in dbNSFP. The scores range from 0 to 1.
##MetaSVM_score=(from dbNSFP) Our support vector machine (SVM) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from -2 to 3 in dbNSFP.
##MutPred_AAchange=(from dbNSFP) Amino acid change used for MutPred_score calculation.
##MutPred_Top5features=(from dbNSFP) Top 5 features (molecular mechanisms of disease) as predicted by MutPred with p values. MutPred_score > 0.5 and p < 0.05 are referred to as actionable hypotheses. MutPred_score > 0.75 and p < 0.05 are referred to as confident hypotheses. MutPred_score > 0.75 and p < 0.01 are referred to as very confident hypotheses.
##MutPred_protID=(from dbNSFP) UniProt accession or Ensembl transcript ID used for MutPred_score calculation.
##MutPred_rankscore=(from dbNSFP) MutPred scores were ranked among all MutPred scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MutPred scores in dbNSFP.
##MutPred_score=(from dbNSFP) General MutPred score. Scores range from 0 to 1. The larger the score the more likely the SNP has damaging effect.
##MutationAssessor_pred=(from dbNSFP) MutationAssessor's functional impact of a variant - predicted functional, i.e. high ("H") or medium ("M"), or predicted non-functional, i.e. low ("L") or neutral ("N"). The MAori score cutoffs between "H" and "M", "M" and "L", and "L" and "N", are 3.5, 1.935 and 0.8, respectively. The rankscore cutoffs between "H" and "M", "M" and "L", and "L" and "N", are 0.9307, 0.52043 and 0.19675, respectively.
##MutationAssessor_rankscore=(from dbNSFP) MAori scores were ranked among all MAori scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MAori scores in dbNSFP. The scores range from 0 to 1.
##MutationAssessor_score=(from dbNSFP) MutationAssessor functional impact combined score (MAori). The score ranges from -5.17 to 6.49 in dbNSFP. Multiple entries are separated by ";", corresponding to Uniprot_entry.
##MutationTaster_AAE=(from dbNSFP) MutationTaster predicted amino acid change.
##MutationTaster_converted_rankscore=(from dbNSFP) The MTori scores were first converted. If the prediction is "A" or "D" MTnew=MTori; if the prediction is "N" or "P", MTnew=1-MTori. Then MTnew scores were ranked among all MTnew scores in dbNSFP. If there are multiple scores of a SNV, only the largest MTnew was used in ranking. The rankscore is the ratio of the rank of the score over the total number of MTnew scores in dbNSFP. The scores range from 0.08979 to 0.81001.
##MutationTaster_model=(from dbNSFP) MutationTaster prediction models.
##MutationTaster_pred=(from dbNSFP) MutationTaster prediction, "A" ("disease_causing_automatic"), "D" ("disease_causing"), "N" ("polymorphism") or "P" ("polymorphism_automatic"). The score cutoff between "D" and "N" is 0.5 for MTnew and 0.31733 for the rankscore.
##MutationTaster_score=(from dbNSFP) MutationTaster p-value (MTori), ranges from 0 to 1. Multiple scores are separated by ";". Information on corresponding transcript(s) can be found by querying http://www.mutationtaster.org/ChrPos.html
##PROVEAN_converted_rankscore=(from dbNSFP) PROVEANori were first converted to PROVEANnew=1-(PROVEANori+14)/28, then ranked among all PROVEANnew scores in dbNSFP. The rankscore is the ratio of the rank the PROVEANnew score over the total number of PROVEANnew scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0 to 1.
##PROVEAN_pred=(from dbNSFP) If PROVEANori <= -2.5 (rankscore>=0.54382) the corresponding nsSNV is predicted as "D(amaging)"; otherwise it is predicted as "N(eutral)". Multiple predictions separated by ";", corresponding to Ensembl_proteinid.
##PROVEAN_score=(from dbNSFP) PROVEAN score (PROVEANori). Scores range from -14 to 14. The smaller the score the more likely the SNP has damaging effect. Multiple scores separated by ";", corresponding to Ensembl_proteinid.
##Polyphen2_HDIV_pred=(from dbNSFP) Polyphen2 prediction based on HumDiv, "D" ("probably damaging", HDIV score in [0.957,1] or rankscore in [0.55859,0.91137]), "P" ("possibly damaging", HDIV score in [0.454,0.956] or rankscore in [0.37043,0.55681]) and "B" ("benign", HDIV score in [0,0.452] or rankscore in [0.03061,0.36974]). Score cutoff for binary classification is 0.5 for HDIV score or 0.38028 for rankscore, i.e. the prediction is "neutral" if the HDIV score is smaller than 0.5 (rankscore is smaller than 0.38028), and "deleterious" if the HDIV score is larger than 0.5 (rankscore is larger than 0.38028). Multiple entries are separated by ";", corresponding to Uniprot_acc.
##Polyphen2_HDIV_rankscore=(from dbNSFP) Polyphen2 HDIV scores were first ranked among all HDIV scores in dbNSFP. The rankscore is the ratio of the rank the score over the total number of the scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.03061 to 0.91137.
##Polyphen2_HDIV_score=(from dbNSFP) Polyphen2 score based on HumDiv, i.e. hdiv_prob. The score ranges from 0 to 1. Multiple entries separated by ";", corresponding to Uniprot_acc.
##Polyphen2_HVAR_pred=(from dbNSFP) Polyphen2 prediction based on HumVar, "D" ("probably damaging", HVAR score in [0.909,1] or rankscore in [0.65694,0.97581]), "P" ("possibly damaging", HVAR in [0.447,0.908] or rankscore in [0.47121,0.65622]) and "B" ("benign", HVAR score in [0,0.446] or rankscore in [0.01493,0.47076]). Score cutoff for binary classification is 0.5 for HVAR score or 0.48762 for rankscore, i.e. the prediction is "neutral" if the HVAR score is smaller than 0.5 (rankscore is smaller than 0.48762), and "deleterious" if the HVAR score is larger than 0.5 (rankscore is larger than 0.48762). Multiple entries are separated by ";", corresponding to Uniprot_acc.
##Polyphen2_HVAR_rankscore=(from dbNSFP) Polyphen2 HVAR scores were first ranked among all HVAR scores in dbNSFP. The rankscore is the ratio of the rank the score over the total number of the scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.01493 to 0.97581.
##Polyphen2_HVAR_score=(from dbNSFP) Polyphen2 score based on HumVar, i.e. hvar_prob. The score ranges from 0 to 1. Multiple entries separated by ";", corresponding to Uniprot_acc.
##PrimateAI_pred=(from dbNSFP) Prediction of PrimateAI score based on the authors' recommendation, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0.803.
##PrimateAI_rankscore=(from dbNSFP) PrimateAI scores were ranked among all PrimateAI scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of PrimateAI scores in dbNSFP.
##PrimateAI_score=(from dbNSFP) A pathogenicity prediction score for missense variants based on common variants of non-human primate species using a deep neural network. The range of PrimateAI score is 0 to 1. The larger the score, the more likely the variant is pathogenic. The authors suggest a threshold of 0.803 for separating damaging vs tolerant variants. Details see https://doi.org/10.1038/s41588-018-0167-z
##REVEL_rankscore=(from dbNSFP) REVEL scores were ranked among all REVEL scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of REVEL scores in dbNSFP.
##REVEL_score=(from dbNSFP) REVEL is an ensemble score based on 13 individual scores for predicting the pathogenicity of missense variants. Scores range from 0 to 1. The larger the score the more likely the SNP has damaging effect. "REVEL scores are freely available for non-commercial use. For other uses, please contact Weiva Sieh" (weiva.sieh@mssm.edu)
##Reliability_index=(from dbNSFP) Number of observed component scores (except the maximum frequency in the 1000 genomes populations) for MetaSVM and MetaLR. Ranges from 1 to 10. As MetaSVM and MetaLR scores are calculated based on imputed data, the less missing component scores, the higher the reliability of the scores and predictions.
##SIFT4G_converted_rankscore=(from dbNSFP) SIFT4G scores were first converted to SIFT4Gnew=1-SIFT4G, then ranked among all SIFT4Gnew scores in dbNSFP. The rankscore is the ratio of the rank the SIFT4Gnew score over the total number of SIFT4Gnew scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented.
##SIFT4G_pred=(from dbNSFP) If SIFT4G is < 0.05 the corresponding nsSNV is predicted as "D(amaging)"; otherwise it is predicted as "T(olerated)". Multiple scores separated by ",", corresponding to Ensembl_transcriptid
##SIFT4G_score=(from dbNSFP) SIFT 4G score (SIFT4G). Scores range from 0 to 1. The smaller the score the more likely the SNP has damaging effect. Multiple scores separated by ",", corresponding to Ensembl_transcriptid
##SIFT_converted_rankscore=(from dbNSFP) SIFTori scores were first converted to SIFTnew=1-SIFTori, then ranked among all SIFTnew scores in dbNSFP. The rankscore is the ratio of the rank the SIFTnew score over the total number of SIFTnew scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The rankscores range from 0.00964 to 0.91255.
##SIFT_pred=(from dbNSFP) If SIFTori is smaller than 0.05 (rankscore>0.39575) the corresponding nsSNV is predicted as "D(amaging)"; otherwise it is predicted as "T(olerated)". Multiple predictions separated by ";"
##SIFT_score=(from dbNSFP) SIFT score (SIFTori). Scores range from 0 to 1. The smaller the score the more likely the SNP has damaging effect. Multiple scores separated by ";", corresponding to Ensembl_proteinid.
##TSL=(from dbNSFP) Transcript Support Level. Multiple entries separated by ";", matching Ensembl_transcriptid. Potential values: 1 to 5, NA. See https://useast.ensembl.org/info/genome/genebuild/transcript_quality_tags.html
##TWINSUK_AC=(from dbNSFP) Alternative allele count in called genotypes in UK10K TWINSUK cohort.
##TWINSUK_AF=(from dbNSFP) Alternative allele frequency in called genotypes in UK10K TWINSUK cohort.
##UK10K_AC=(from dbNSFP) Alternative allele count in combined genotypes in UK10K cohort (TWINSUK+ALSPAC).
##UK10K_AF=(from dbNSFP) Alternative allele frequency in combined genotypes in UK10K cohort (TWINSUK+ALSPAC).
##Uniprot_acc=(from dbNSFP) Uniprot accession number matching the Ensembl_proteinid Multiple entries separated by ";".
##Uniprot_entry=(from dbNSFP) Uniprot entry ID matching the Ensembl_proteinid Multiple entries separated by ";".
##VEP_canonical=(from dbNSFP) canonical transcript used in Ensembl. Multiple entries separated by ";", matching Ensembl_transcriptid. See https://useast.ensembl.org/Help/Glossary?id=521
##VEST4_rankscore=(from dbNSFP) VEST4 scores were ranked among all VEST4 scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of VEST4 scores in dbNSFP. In case there are multiple scores for the same variant, the largest score (most damaging) is presented. The scores range from 0 to 1. Please note VEST score is free for non-commercial use. For more details please refer to http://wiki.chasmsoftware.org/index.php/SoftwareLicense. Commercial users should contact the Johns Hopkins Technology Transfer office.
##VEST4_score=(from dbNSFP) VEST 4.0 score. Score ranges from 0 to 1. The larger the score the more likely the mutation may cause functional change. Multiple scores separated by ";", corresponding to Ensembl_transcriptid. Please note this score is free for non-commercial use. For more details please refer to http://wiki.chasmsoftware.org/index.php/SoftwareLicense. Commercial users should contact the Johns Hopkins Technology Transfer office.
##VindijiaNeandertal=(from dbNSFP) genotype of a deep sequenced Vindijia Neandertal
##aaalt=(from dbNSFP) alternative amino acid "." if the variant is a splicing site SNP (2bp on each end of an intron)
##aapos=(from dbNSFP) amino acid position as to the protein. "-1" if the variant is a splicing site SNP (2bp on each end of an intron). Multiple entries separated by ";", corresponding to Ensembl_proteinid
##aaref=(from dbNSFP) reference amino acid "." if the variant is a splicing site SNP (2bp on each end of an intron)
##alt=(from dbNSFP) alternative nucleotide allele (as on the + strand)
##bStatistic=(from dbNSFP) Background selection (B) value estimates from doi.org/10.1371/journal.pgen.1000471. Ranges from 0 to 1000. It estimates the expected fraction (*1000) of neutral diversity present at a site. Values close to 0 represent near complete removal of diversity as a result of background selection and values near 1000 indicating absent of background selection. Data from CADD v1.4.
##bStatistic_rankscore=(from dbNSFP) bStatistic scores were ranked among all bStatistic scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of bStatistic scores in dbNSFP.
##cds_strand=(from dbNSFP) coding sequence (CDS) strand (+ or -)
##clinvar_clnsig=(from dbNSFP) clinical significance by clinvar Possible values: Benign, Likely_benign, Likely_pathogenic, Pathogenic, drug_response, histocompatibility. A negative score means the score is for the ref allele
##clinvar_hgvs=(from dbNSFP) variant in HGVS format
##clinvar_review=(from dbNSFP) ClinVar Review Status summary Possible values: no assertion criteria provided, criteria provided, single submitter, criteria provided, multiple submitters, no conflicts, reviewed by expert panel, practice guideline
##clinvar_rs=(from dbNSFP) rs number by clinvar
##clinvar_trait=(from dbNSFP) the trait/disease the clinvar_clnsig referring to
##clinvar_var_source=(from dbNSFP) source of the variant
##codon_degeneracy=(from dbNSFP) degenerate type (0, 2 or 3)
##codonpos=(from dbNSFP) position on the codon (1, 2 or 3)
##fathmm-MKL_coding_group=(from dbNSFP) the groups of features (labeled A-J) used to obtained the score. More details can be found in doi: 10.1093/bioinformatics/btv009.
##fathmm-MKL_coding_pred=(from dbNSFP) If a fathmm-MKL_coding_score is >0.5 (or rankscore >0.28317) the corresponding nsSNV is predicted as "D(AMAGING)"; otherwise it is predicted as "N(EUTRAL)".
##fathmm-MKL_coding_rankscore=(from dbNSFP) fathmm-MKL coding scores were ranked among all fathmm-MKL coding scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of fathmm-MKL coding scores in dbNSFP.
##fathmm-MKL_coding_score=(from dbNSFP) fathmm-MKL p-values. Scores range from 0 to 1. SNVs with scores >0.5 are predicted to be deleterious, and those <0.5 are predicted to be neutral or benign. Scores close to 0 or 1 are with the highest-confidence. Coding scores are trained using 10 groups of features. More details of the score can be found in doi: 10.1093/bioinformatics/btv009.
##fathmm-XF_coding_pred=(from dbNSFP) If a fathmm-XF_coding_score is >0.5, the corresponding nsSNV is predicted as "D(AMAGING)"; otherwise it is predicted as "N(EUTRAL)".
##fathmm-XF_coding_rankscore=(from dbNSFP) fathmm-XF coding scores were ranked among all fathmm-XF coding scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of fathmm-XF coding scores in dbNSFP.
##fathmm-XF_coding_score=(from dbNSFP) fathmm-XF p-values. Scores range from 0 to 1. SNVs with scores >0.5 are predicted to be deleterious, and those <0.5 are predicted to be neutral or benign. Scores close to 0 or 1 are with the highest-confidence. Coding scores are trained using 10 groups of features. More details of the score can be found in doi: 10.1093/bioinformatics/btx536.
##genename=(from dbNSFP) gene name; if the nsSNV can be assigned to multiple genes, gene names are separated by ";"
##gnomAD_exomes_AC=(from dbNSFP) Alternative allele count in the whole gnomAD exome samples (125,748 samples)
##gnomAD_exomes_AF=(from dbNSFP) Alternative allele frequency in the whole gnomAD exome samples (125,748 samples)
##gnomAD_exomes_AFR_AC=(from dbNSFP) Alternative allele count in the African/African American gnomAD exome samples (8,128 samples)
##gnomAD_exomes_AFR_AF=(from dbNSFP) Alternative allele frequency in the African/African American gnomAD exome samples (8,128 samples)
##gnomAD_exomes_AFR_AN=(from dbNSFP) Total allele count in the African/African American gnomAD exome samples (8,128 samples)
##gnomAD_exomes_AFR_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the African/African American gnomAD exome samples (8,128 samples)
##gnomAD_exomes_AMR_AC=(from dbNSFP) Alternative allele count in the Latino gnomAD exome samples (17,296 samples)
##gnomAD_exomes_AMR_AF=(from dbNSFP) Alternative allele frequency in the Latino gnomAD exome samples (17,296 samples)
##gnomAD_exomes_AMR_AN=(from dbNSFP) Total allele count in the Latino gnomAD exome samples (17,296 samples)
##gnomAD_exomes_AMR_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the Latino gnomAD exome samples (17,296 samples)
##gnomAD_exomes_AN=(from dbNSFP) Total allele count in the whole gnomAD exome samples (125,748 samples)
##gnomAD_exomes_ASJ_AC=(from dbNSFP) Alternative allele count in the Ashkenazi Jewish gnomAD exome samples (5,040 samples)
##gnomAD_exomes_ASJ_AF=(from dbNSFP) Alternative allele frequency in the Ashkenazi Jewish gnomAD exome samples (5,040 samples)
##gnomAD_exomes_ASJ_AN=(from dbNSFP) Total allele count in the Ashkenazi Jewish gnomAD exome samples (5,040 samples)
##gnomAD_exomes_ASJ_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the Ashkenazi Jewish gnomAD exome samples (5,040 samples)
##gnomAD_exomes_EAS_AC=(from dbNSFP) Alternative allele count in the East Asian gnomAD exome samples (9,197 samples)
##gnomAD_exomes_EAS_AF=(from dbNSFP) Alternative allele frequency in the East Asian gnomAD exome samples (9,197 samples)
##gnomAD_exomes_EAS_AN=(from dbNSFP) Total allele count in the East Asian gnomAD exome samples (9,197 samples)
##gnomAD_exomes_EAS_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the East Asian gnomAD exome samples (9,197 samples)
##gnomAD_exomes_FIN_AC=(from dbNSFP) Alternative allele count in the Finnish gnomAD exome samples (10,824 samples)
##gnomAD_exomes_FIN_AF=(from dbNSFP) Alternative allele frequency in the Finnish gnomAD exome samples (10,824 samples)
##gnomAD_exomes_FIN_AN=(from dbNSFP) Total allele count in the Finnish gnomAD exome samples (10,824 samples)
##gnomAD_exomes_FIN_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the Finnish gnomAD exome samples (10,824 samples)
##gnomAD_exomes_NFE_AC=(from dbNSFP) Alternative allele count in the Non-Finnish European gnomAD exome samples (56,885 samples)
##gnomAD_exomes_NFE_AF=(from dbNSFP) Alternative allele frequency in the Non-Finnish European gnomAD exome samples (56,885 samples)
##gnomAD_exomes_NFE_AN=(from dbNSFP) Total allele count in the Non-Finnish European gnomAD exome samples (56,885 samples)
##gnomAD_exomes_NFE_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the Non-Finnish European gnomAD exome samples (56,885 samples)
##gnomAD_exomes_POPMAX_AC=(from dbNSFP) Allele count in the population with the maximum AF
##gnomAD_exomes_POPMAX_AF=(from dbNSFP) Maximum allele frequency across populations (excluding samples of Ashkenazi, Finnish, and indeterminate ancestry)
##gnomAD_exomes_POPMAX_AN=(from dbNSFP) Total number of alleles in the population with the maximum AF
##gnomAD_exomes_POPMAX_nhomalt=(from dbNSFP) Count of homozygous individuals in the population with the maximum allele frequency
##gnomAD_exomes_SAS_AC=(from dbNSFP) Alternative allele count in the South Asian gnomAD exome samples (15,308 samples)
##gnomAD_exomes_SAS_AF=(from dbNSFP) Alternative allele frequency in the South Asian gnomAD exome samples (15,308 samples)
##gnomAD_exomes_SAS_AN=(from dbNSFP) Total allele count in the South Asian gnomAD exome samples (15,308 samples)
##gnomAD_exomes_SAS_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the South Asian gnomAD exome samples (15,308 samples)
##gnomAD_exomes_controls_AC=(from dbNSFP) Alternative allele count in the controls subset of whole gnomAD exome samples (54,704 samples)
##gnomAD_exomes_controls_AF=(from dbNSFP) Alternative allele frequency in the controls subset of whole gnomAD exome samples (54,704 samples)
##gnomAD_exomes_controls_AFR_AC=(from dbNSFP) Alternative allele count in the controls subset of African/African American gnomAD exome samples (3,582 samples)
##gnomAD_exomes_controls_AFR_AF=(from dbNSFP) Alternative allele frequency in the controls subset of African/African American gnomAD exome samples (3,582 samples)
##gnomAD_exomes_controls_AFR_AN=(from dbNSFP) Total allele count in the controls subset of African/African American gnomAD exome samples (3,582 samples)
##gnomAD_exomes_controls_AFR_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of African/African American gnomAD exome samples (3,582 samples)
##gnomAD_exomes_controls_AMR_AC=(from dbNSFP) Alternative allele count in the controls subset of Latino gnomAD exome samples (8,556 samples)
##gnomAD_exomes_controls_AMR_AF=(from dbNSFP) Alternative allele frequency in the controls subset of Latino gnomAD exome samples (8,556 samples)
##gnomAD_exomes_controls_AMR_AN=(from dbNSFP) Total allele count in the controls subset of Latino gnomAD exome samples (8,556 samples)
##gnomAD_exomes_controls_AMR_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of Latino gnomAD exome samples (8,556 samples)
##gnomAD_exomes_controls_AN=(from dbNSFP) Total allele count in the controls subset of whole gnomAD exome samples (54,704 samples)
##gnomAD_exomes_controls_ASJ_AC=(from dbNSFP) Alternative allele count in the controls subset of Ashkenazi Jewish gnomAD exome samples (1,160 samples)
##gnomAD_exomes_controls_ASJ_AF=(from dbNSFP) Alternative allele frequency in the controls subset of Ashkenazi Jewish gnomAD exome samples (1,160 samples)
##gnomAD_exomes_controls_ASJ_AN=(from dbNSFP) Total allele count in the controls subset of Ashkenazi Jewish gnomAD exome samples (1,160 samples)
##gnomAD_exomes_controls_ASJ_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of Ashkenazi Jewish gnomAD exome samples (1,160 samples)
##gnomAD_exomes_controls_EAS_AC=(from dbNSFP) Alternative allele count in the controls subset of East Asian gnomAD exome samples (4,523 samples)
##gnomAD_exomes_controls_EAS_AF=(from dbNSFP) Alternative allele frequency in the controls subset of East Asian gnomAD exome samples (4,523 samples)
##gnomAD_exomes_controls_EAS_AN=(from dbNSFP) Total allele count in the controls subset of East Asian gnomAD exome samples (4,523 samples)
##gnomAD_exomes_controls_EAS_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of East Asian gnomAD exome samples (4,523 samples)
##gnomAD_exomes_controls_FIN_AC=(from dbNSFP) Alternative allele count in the controls subset of Finnish gnomAD exome samples (6,697 samples)
##gnomAD_exomes_controls_FIN_AF=(from dbNSFP) Alternative allele frequency in the controls subset of Finnish gnomAD exome samples (6,697 samples)
##gnomAD_exomes_controls_FIN_AN=(from dbNSFP) Total allele count in the controls subset of Finnish gnomAD exome samples (6,697 samples)
##gnomAD_exomes_controls_FIN_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of Finnish gnomAD exome samples (6,697 samples)
##gnomAD_exomes_controls_NFE_AC=(from dbNSFP) Alternative allele count in the controls subset of Non-Finnish European gnomAD exome samples (21,384 samples)
##gnomAD_exomes_controls_NFE_AF=(from dbNSFP) Alternative allele frequency in the controls subset of Non-Finnish European gnomAD exome samples (21,384 samples)
##gnomAD_exomes_controls_NFE_AN=(from dbNSFP) Total allele count in the controls subset of Non-Finnish European gnomAD exome samples (21,384 samples)
##gnomAD_exomes_controls_NFE_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of Non-Finnish European gnomAD exome samples (21,384 samples)
##gnomAD_exomes_controls_POPMAX_AC=(from dbNSFP) Allele count in the controls subset of population with the maximum AF
##gnomAD_exomes_controls_POPMAX_AF=(from dbNSFP) Maximum allele frequency across populations (excluding samples of Ashkenazi, Finnish, and indeterminate ancestry) in the controls subset
##gnomAD_exomes_controls_POPMAX_AN=(from dbNSFP) Total number of alleles in the controls subset of population with the maximum AF
##gnomAD_exomes_controls_POPMAX_nhomalt=(from dbNSFP) Count of homozygous individuals in the controls subset of population with the maximum allele frequency
##gnomAD_exomes_controls_SAS_AC=(from dbNSFP) Alternative allele count in the controls subset of South Asian gnomAD exome samples (7,845 samples)
##gnomAD_exomes_controls_SAS_AF=(from dbNSFP) Alternative allele frequency in the controls subset of South Asian gnomAD exome samples (7,845 samples)
##gnomAD_exomes_controls_SAS_AN=(from dbNSFP) Total allele count in the controls subset of South Asian gnomAD exome samples (7,845 samples)
##gnomAD_exomes_controls_SAS_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of South Asian gnomAD exome samples (7,845 samples)
##gnomAD_exomes_controls_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of whole gnomAD exome samples (54,704 samples)
##gnomAD_exomes_flag=(from dbNSFP) information from gnomAD exome data indicating whether the variant falling within low-complexity (lcr) or segmental duplication (segdup) or decoy regions. The flag can be either "." for high-quality PASS or not reported/polymorphic in gnomAD exomes, "lcr" for within lcr, "segdup" for within segdup, or "decoy" for with decoy region.
##gnomAD_exomes_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the whole gnomAD exome samples (125,748 samples)
##gnomAD_genomes_AC=(from dbNSFP) Alternative allele count in the whole gnomAD genome samples (15,708 samples)
##gnomAD_genomes_AF=(from dbNSFP) Alternative allele frequency in the whole gnomAD genome samples (15,708 samples)
##gnomAD_genomes_AFR_AC=(from dbNSFP) Alternative allele count in the African/African American gnomAD genome samples (4,359 samples)
##gnomAD_genomes_AFR_AF=(from dbNSFP) Alternative allele frequency in the African/African American gnomAD genome samples (4,359 samples)
##gnomAD_genomes_AFR_AN=(from dbNSFP) Total allele count in the African/African American gnomAD genome samples (4,359 samples)
##gnomAD_genomes_AFR_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the African/African American gnomAD genome samples (4,359 samples)
##gnomAD_genomes_AMR_AC=(from dbNSFP) Alternative allele count in the Latino gnomAD genome samples (424 samples)
##gnomAD_genomes_AMR_AF=(from dbNSFP) Alternative allele frequency in the Latino gnomAD genome samples (424 samples)
##gnomAD_genomes_AMR_AN=(from dbNSFP) Total allele count in the Latino gnomAD genome samples (424 samples)
##gnomAD_genomes_AMR_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the Latino gnomAD genome samples (424 samples)
##gnomAD_genomes_AN=(from dbNSFP) Total allele count in the whole gnomAD genome samples (15,708 samples)
##gnomAD_genomes_ASJ_AC=(from dbNSFP) Alternative allele count in the Ashkenazi Jewish gnomAD genome samples (145 samples)
##gnomAD_genomes_ASJ_AF=(from dbNSFP) Alternative allele frequency in the Ashkenazi Jewish gnomAD genome samples (145 samples)
##gnomAD_genomes_ASJ_AN=(from dbNSFP) Total allele count in the Ashkenazi Jewish gnomAD genome samples (145 samples)
##gnomAD_genomes_ASJ_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the Ashkenazi Jewish gnomAD genome samples (145 samples)
##gnomAD_genomes_EAS_AC=(from dbNSFP) Alternative allele count in the East Asian gnomAD genome samples (780 samples)
##gnomAD_genomes_EAS_AF=(from dbNSFP) Alternative allele frequency in the East Asian gnomAD genome samples (780 samples)
##gnomAD_genomes_EAS_AN=(from dbNSFP) Total allele count in the East Asian gnomAD genome samples (780 samples)
##gnomAD_genomes_EAS_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the East Asian gnomAD genome samples (780 samples)
##gnomAD_genomes_FIN_AC=(from dbNSFP) Alternative allele count in the Finnish gnomAD genome samples (1,738 samples)
##gnomAD_genomes_FIN_AF=(from dbNSFP) Alternative allele frequency in the Finnish gnomAD genome samples (1,738 samples)
##gnomAD_genomes_FIN_AN=(from dbNSFP) Total allele count in the Finnish gnomAD genome samples (1,738 samples)
##gnomAD_genomes_FIN_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the Finnish gnomAD genome samples (1,738 samples)
##gnomAD_genomes_NFE_AC=(from dbNSFP) Alternative allele count in the Non-Finnish European gnomAD genome samples (7,718 samples)
##gnomAD_genomes_NFE_AF=(from dbNSFP) Alternative allele frequency in the Non-Finnish European gnomAD genome samples (7,718 samples)
##gnomAD_genomes_NFE_AN=(from dbNSFP) Total allele count in the Non-Finnish European gnomAD genome samples (7,718 samples)
##gnomAD_genomes_NFE_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the Non-Finnish European gnomAD genome samples (7,718 samples)
##gnomAD_genomes_POPMAX_AC=(from dbNSFP) Allele count in the population with the maximum AF
##gnomAD_genomes_POPMAX_AF=(from dbNSFP) Maximum allele frequency across populations (excluding samples of Ashkenazi, Finnish, and indeterminate ancestry)
##gnomAD_genomes_POPMAX_AN=(from dbNSFP) Total number of alleles in the population with the maximum AF
##gnomAD_genomes_POPMAX_nhomalt=(from dbNSFP) Count of homozygous individuals in the population with the maximum allele frequency
##gnomAD_genomes_controls_AC=(from dbNSFP) Alternative allele count in the controls subset of whole gnomAD genome samples (5,442 samples)
##gnomAD_genomes_controls_AF=(from dbNSFP) Alternative allele frequency in the controls subset of whole gnomAD genome samples (5,442 samples)
##gnomAD_genomes_controls_AFR_AC=(from dbNSFP) Alternative allele count in the controls subset of African/African American gnomAD genome samples (1,287 samples)
##gnomAD_genomes_controls_AFR_AF=(from dbNSFP) Alternative allele frequency in the controls subset of African/African American gnomAD genome samples (1,287 samples)
##gnomAD_genomes_controls_AFR_AN=(from dbNSFP) Total allele count in the controls subset of African/African American gnomAD genome samples (1,287 samples)
##gnomAD_genomes_controls_AFR_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of African/African American gnomAD genome samples (1,287 samples)
##gnomAD_genomes_controls_AMR_AC=(from dbNSFP) Alternative allele count in the controls subset of Latino gnomAD genome samples (123 samples)
##gnomAD_genomes_controls_AMR_AF=(from dbNSFP) Alternative allele frequency in the controls subset of Latino gnomAD genome samples (123 samples)
##gnomAD_genomes_controls_AMR_AN=(from dbNSFP) Total allele count in the controls subset of Latino gnomAD genome samples (123 samples)
##gnomAD_genomes_controls_AMR_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of Latino gnomAD genome samples (123 samples)
##gnomAD_genomes_controls_AN=(from dbNSFP) Total allele count in the controls subset of whole gnomAD genome samples (5,442 samples)
##gnomAD_genomes_controls_ASJ_AC=(from dbNSFP) Alternative allele count in the controls subset of Ashkenazi Jewish gnomAD genome samples (19 samples)
##gnomAD_genomes_controls_ASJ_AF=(from dbNSFP) Alternative allele frequency in the controls subset of Ashkenazi Jewish gnomAD genome samples (145 samples)
##gnomAD_genomes_controls_ASJ_AN=(from dbNSFP) Total allele count in the controls subset of Ashkenazi Jewish gnomAD genome samples (19 samples)
##gnomAD_genomes_controls_ASJ_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of Ashkenazi Jewish gnomAD genome samples (19 samples)
##gnomAD_genomes_controls_EAS_AC=(from dbNSFP) Alternative allele count in the controls subset of East Asian gnomAD genome samples (458 samples)
##gnomAD_genomes_controls_EAS_AF=(from dbNSFP) Alternative allele frequency in the controls subset of East Asian gnomAD genome samples (458 samples)
##gnomAD_genomes_controls_EAS_AN=(from dbNSFP) Total allele count in the controls subset of East Asian gnomAD genome samples (458 samples)
##gnomAD_genomes_controls_EAS_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of East Asian gnomAD genome samples (458 samples)
##gnomAD_genomes_controls_FIN_AC=(from dbNSFP) Alternative allele count in the controls subset of Finnish gnomAD genome samples (581 samples)
##gnomAD_genomes_controls_FIN_AF=(from dbNSFP) Alternative allele frequency in the controls subset of Finnish gnomAD genome samples (581 samples)
##gnomAD_genomes_controls_FIN_AN=(from dbNSFP) Total allele count in the controls subset of Finnish gnomAD genome samples (581 samples)
##gnomAD_genomes_controls_FIN_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of Finnish gnomAD genome samples (581 samples)
##gnomAD_genomes_controls_NFE_AC=(from dbNSFP) Alternative allele count in the controls subset of Non-Finnish European gnomAD genome samples (2,762 samples)
##gnomAD_genomes_controls_NFE_AF=(from dbNSFP) Alternative allele frequency in the controls subset of Non-Finnish European gnomAD genome samples (2,762 samples)
##gnomAD_genomes_controls_NFE_AN=(from dbNSFP) Total allele count in the controls subset of Non-Finnish European gnomAD genome samples (2,762 samples)
##gnomAD_genomes_controls_NFE_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of Non-Finnish European gnomAD genome samples (2,762 samples)
##gnomAD_genomes_controls_POPMAX_AC=(from dbNSFP) Allele count in the controls subset of population with the maximum AF
##gnomAD_genomes_controls_POPMAX_AF=(from dbNSFP) Maximum allele frequency across populations (excluding samples of Ashkenazi, Finnish, and indeterminate ancestry) in the controls subset
##gnomAD_genomes_controls_POPMAX_AN=(from dbNSFP) Total number of alleles in the controls subset of population with the maximum AF
##gnomAD_genomes_controls_POPMAX_nhomalt=(from dbNSFP) Count of homozygous individuals in the controls subset of population with the maximum allele frequency
##gnomAD_genomes_controls_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the controls subset of whole gnomAD genome samples (5,442 samples)
##gnomAD_genomes_flag=(from dbNSFP) information from gnomAD genome data indicating whether the variant falling within low-complexity (lcr) or segmental duplication (segdup) or decoy regions. The flag can be either "." for high-quality PASS or not reported/polymorphic in gnomAD exomes, "lcr" for within lcr, "segdup" for within segdup, or "decoy" for with decoy region.
##gnomAD_genomes_nhomalt=(from dbNSFP) Count of individuals with homozygous alternative allele in the whole gnomAD genome samples (15,708 samples)
##hg18_chr=(from dbNSFP) chromosome as to hg18, "." means missing
##hg18_pos(1-based)=(from dbNSFP) physical position on the chromosome as to hg18 (1-based coordinate) For mitochondrial SNV, this position refers to a YRI sequence (GenBank: AF347015)
##hg19_chr=(from dbNSFP) chromosome as to hg19, "." means missing
##hg19_pos(1-based)=(from dbNSFP) physical position on the chromosome as to hg19 (1-based coordinate). For mitochondrial SNV, this position refers to a YRI sequence (GenBank: AF347015)
##integrated_confidence_value=(from dbNSFP) 0 - highly significant scores (approx. p<.003); 1 - significant scores (approx. p<.05); 2 - informative scores (approx. p<.25); 3 - other scores (approx. p>=.25).
##integrated_fitCons_rankscore=(from dbNSFP) integrated fitCons scores were ranked among all integrated fitCons scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of integrated fitCons scores in dbNSFP.
##integrated_fitCons_score=(from dbNSFP) fitCons score predicts the fraction of genomic positions belonging to a specific function class (defined by epigenomic "fingerprint") that are under selective pressure. Scores range from 0 to 1, with a larger score indicating a higher proportion of nucleic sites of the functional class the genomic position belong to are under selective pressure, therefore more likely to be functional important. Integrated (i6) scores are integrated across three cell types (GM12878, H1-hESC and HUVEC). More details can be found in doi:10.1038/ng.3196.
##phastCons100way_vertebrate=(from dbNSFP) phastCons conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site. Scores range from 0 to 1.
##phastCons100way_vertebrate_rankscore=(from dbNSFP) phastCons100way_vertebrate scores were ranked among all phastCons100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons100way_vertebrate scores in dbNSFP.
##phastCons17way_primate=(from dbNSFP) a conservation score based on 17way alignment primate set, The larger the score, the more conserved the site. Scores range from 0 to 1.
##phastCons17way_primate_rankscore=(from dbNSFP) the rank of the phastCons17way_primate score among all phastCons17way_primate scores in dbNSFP.
##phastCons30way_mammalian=(from dbNSFP) phastCons conservation score based on the multiple alignments of 30 mammalian genomes (including human). The larger the score, the more conserved the site. Scores range from 0 to 1.
##phastCons30way_mammalian_rankscore=(from dbNSFP) phastCons30way_mammalian scores were ranked among all phastCons30way_mammalian scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons30way_mammalian scores in dbNSFP.
##phyloP100way_vertebrate=(from dbNSFP) phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site. Scores range from -20.0 to 10.003 in dbNSFP.
##phyloP100way_vertebrate_rankscore=(from dbNSFP) phyloP100way_vertebrate scores were ranked among all phyloP100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP100way_vertebrate scores in dbNSFP.
##phyloP17way_primate=(from dbNSFP) a conservation score based on 17way alignment primate set, the higher the more conservative. Scores range from -13.362 to 0.756 in dbNSFP.
##phyloP17way_primate_rankscore=(from dbNSFP) the rank of the phyloP17way_primate score among all phyloP17way_primate scores in dbNSFP.
##phyloP30way_mammalian=(from dbNSFP) phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 30 mammalian genomes (including human). The larger the score, the more conserved the site. Scores range from -20 to 1.312 in dbNSFP.
##phyloP30way_mammalian_rankscore=(from dbNSFP) phyloP30way_mammalian scores were ranked among all phyloP30way_mammalian scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP30way_mammalian scores in dbNSFP.
##pos(1-coor)=pos(1-coor) from dbNSFP file
##ref=(from dbNSFP) reference nucleotide allele (as on the + strand)
##refcodon=(from dbNSFP) reference codon
##rs_dbSNP151=(from dbNSFP) rs number from dbSNP 151
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT L-10 L-11 L-12 L-13 L-14 L-15 L-16 L-17 L-18 L-19 L-20 L-21
chr1 10347 . AACCCT A 151.66 PASS AC=1;AF=0.042;AN=24;BaseQRankSum=1.96;ClippingRankSum=0;DP=764;ExcessHet=8.0341;FS=0;InbreedingCoeff=-0.2646;MLEAC=2;MLEAF=0.083;MQ=34.02;MQRankSum=0;QD=15.17;ReadPosRankSum=-0.319;SOR=1.609;VQSLOD=1.09;culprit=DP;CSQ=-|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene||||||||||rs1363828207|1658|1||deletion|HGNC|HGNC:37102||||||||||Ensembl|ACCCT|ACCCT|||OR4F5||||||||||||||||||||||||||||||||5.752|0.340185|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||,-|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript||||||||||rs1363828207|1517|1||deletion|HGNC|HGNC:37102|YES|1||||||||Ensembl|ACCCT|ACCCT|||OR4F5||||||||||||||||||||||||||||||||5.752|0.340185|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||1||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||,-|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene||||||||||rs1363828207|4052|-1||deletion|HGNC|HGNC:38034|YES|||||||||Ensembl|ACCCT|ACCCT|||OR4F5||||||||||||||||||||||||||||||||5.752|0.340185||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||,-|downstream_gene_variant|MODIFIER|WASH7P|653635|Transcript|NR_024540.1|transcribed_pseudogene||||||||||rs1363828207|4010|-1||deletion|EntrezGene|HGNC:38034||||||||||RefSeq|ACCCT|ACCCT|OK||OR4F5||||||||||||||||||||||||||||||||5.752|0.340185|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||,-|upstream_gene_variant|MODIFIER|DDX11L1|100287102|Transcript|NR_046018.2|transcribed_pseudogene||||||||||rs1363828207|1522|1||deletion|EntrezGene|HGNC:37102||||||||||RefSeq|ACCCT|ACCCT|||OR4F5||||||||||||||||||||||||||||||||5.752|0.340185||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| GT:AD:DP:GQ:PL 0/0:54,8:62:0:0,0,1573 0/0:70,8:78:0:0,0,1948 0/0:50,6:56:0:0,0,1393 0/0:49,9:58:0:0,0,1390 0/0:60,9:69:0:0,0,1691 0/0:53,7:60:0:0,0,1578 0/0:55,9:64:0:0,0,1548 0/1:3,7:10:51:188,0,51 0/0:59,5:64:31:0,31,1827 0/0:70,7:77:17:0,17,2229 0/0:78,7:85:28:0,28,2329 0/0:69,10:79:0:0,0,1898
chr1 51972 rs546829777 GGAC G 874.34 PASS AC=4;AF=0.182;AN=22;DB;DP=187;ExcessHet=0.0164;FS=0;InbreedingCoeff=0.9066;MLEAC=4;MLEAF=0.182;MQ=34.36;QD=34.24;SOR=2.2;VQSLOD=2.71;culprit=DP;CSQ=-|upstream_gene_variant|MODIFIER|OR4G4P|ENSG00000268020|Transcript|ENST00000606857|unprocessed_pseudogene||||||||||rs546829777|498|1||deletion|HGNC|HGNC:14822|YES|||||||||Ensembl|GAC|GAC|||OR4F5|||||0.0006|0|0|0|0.002|0.001||||||||||||0.002|EUR|||||||||1.836|0.009317|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||,-|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000344272|promoter_flanking_region||||||||||rs546829777||||deletion|||||||||||||||||OR4F5|||||0.0006|0|0|0|0.002|0.001||||||||||||0.002|EUR|||||||||1.836|0.009317||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 0/0:1,0:1:3:0,3,13 1/1:0,13:13:39:585,39,0 0/0:11,0:11:33:0,33,395 1/1:0,8:8:24:360,24,0 0/0:13,0:13:39:0,39,448 0/0:18,0:18:54:0,54,616 0/0:39,0:39:99:0,117,1409 0/0:29,0:29:84:0,84,1260 0/0:12,0:12:36:0,36,433 0/0:24,0:24:72:0,72,852 0/0:19,0:19:57:0,57,668
Dear @matmu,
I think I found the issue: there is a #chr field in your CSQ:
...|CADD_PHRED|CADD_RAW|#chr|1000Gp3_AC|1000Gp3_AF|1000Gp3_AFR_AC|...
It works when you remove the # character from #chr in the CSQ field.
This comes from dbNSFP:
###chr=#chr from dbNSFP file
We need to investigate it and try to remove the strange characters such as # in the selected fields.
Best regards, Laurent
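As a stopgap until a fix is released, the workaround described above (removing the # from #chr in the CSQ field list) could be scripted. The following is a minimal Python sketch, not part of VEP: the function names `csq_fields` and `strip_hash_from_csq_header` are hypothetical, and it assumes the field names only need fixing in the `Format:` list of the `##INFO=<ID=CSQ,...>` header line, since the per-variant CSQ values are positional. Only the `#` character is handled, because that is the only character identified as problematic in this thread.

```python
import re


def csq_fields(header_line):
    """Return the ordered field names from a ##INFO=<ID=CSQ,...> header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format: ' section found in header line")
    return match.group(1).strip().split("|")


def strip_hash_from_csq_header(header_line):
    """Remove '#' from CSQ field names (hypothetical helper, not a VEP API).

    Returns the rewritten header line and the list of field names
    that contained a '#'.
    """
    fields = csq_fields(header_line)
    offending = [name for name in fields if "#" in name]
    cleaned = [name.replace("#", "") for name in fields]
    # Swap the old field list for the cleaned one; the rest of the line is kept.
    new_line = header_line.replace("|".join(fields), "|".join(cleaned))
    return new_line, offending


header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|CADD_PHRED|CADD_RAW|#chr|1000Gp3_AC">'
)
fixed_header, bad_fields = strip_hash_from_csq_header(header)
print(bad_fields)
print(fixed_header)
```

Note that renaming #chr to chr could itself create a duplicate if another selected field is already called chr, so it is worth checking the cleaned list before relying on it.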
Super, thanks for your help. Yes, an error message would also be helpful. Do you know how filter_vep behaves if there are duplicate fields? Will it then filter using the first occurrence?
Dear @matmu,
About the dbNSFP issue, we are planning to add a fix to prevent it in the next VEP version (96), which should be released next week.
In the unfortunate case where a field is duplicated, the last occurrence of the field will be used for filtering: the code loops over the ordered list of fields and adds the corresponding data into a hash, e.g.: $data{$field_name} = $field_value.
However, this is something we need to look at, especially for VCF files with already existing INFO fields and/or using several VEP plugins.
Best regards, Laurent
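The "last occurrence wins" behaviour described above can be illustrated with a short Python sketch that mirrors the `$data{$field_name} = $field_value` assignment. This is an illustration of the described logic, not the actual filter_vep code; the function names and the example field names and values are made up for the demonstration.

```python
from collections import Counter


def csq_entry_to_dict(field_names, csq_entry):
    """Map one pipe-separated CSQ entry onto its field names.

    Assigning into a dict in field order means a later duplicate
    name overwrites the earlier value, as with the Perl hash.
    """
    data = {}
    for name, value in zip(field_names, csq_entry.split("|")):
        data[name] = value
    return data


def duplicated_fields(field_names):
    """Return the field names that appear more than once, in first-seen order."""
    counts = Counter(field_names)
    return [name for name, count in counts.items() if count > 1]


# Hypothetical field list in which "AF" appears twice.
names = ["Allele", "AF", "Consequence", "AF"]
entry = "A|0.1|missense_variant|0.9"

print(csq_entry_to_dict(names, entry)["AF"])
print(duplicated_fields(names))
```

Running a check like `duplicated_fields` on the CSQ field list before filtering would show which field names are ambiguous.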
Ok, thanks.
I have successfully annotated my VCF file using a local installation of the Variant Effect Predictor (VEP). However, the filtering doesn't work. E.g. the command
filter_vep -i vep.test.vcf.gz -filter "Existing_variation" -o out.txt --force_overwrite
to keep all variants with a defined field "Existing_variation", doesn't return any variant although there are multiple such variants present in the dataset.
When listing the available fields with
filter_vep -i vep.test.vcf.gz --list --vcf_info_field CSQ
it doesn't return the fields stated in the CSQ INFO field (starting with "##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT"); instead, the other INFO fields in the VCF are listed: