Closed by droazen 7 years ago
SnpEff has a document with a reasonable spec for annotations:
After getting comments, it looks like the format we're going to use is to add the annotations to the INFO field, grouped by allele, then by transcript. The specifics of the annotation name and delimiters can be debated later (and easily changed).
For example (newlines added for readability):
22 21807795 . G C,A . . DP=1000;ECNT=40;IN_PON;NLOD=59.66,68.06;N_ART_LOD=8.88,1.93;TLOD=16.45,36.72;TLOD_FWD=-1.392e+00;TLOD_REV=17.84;TUMOR_SB_POWER_FWD=0.558;TUMOR_SB_POWER_REV=0.724;
VC=
C|missense_variant|MODERATE|MAPK1|ENSG00000100030|Transcript|ENST00000215832|protein_coding|2/9||||360|171|57|S/R|agC/agG|||-1||HGNC|HGNC:6871||Ensembl
C|missense_variant|MODERATE|MAPK1|ENSG00000100030|Transcript|ENST00000398822|protein_coding|2/8||||411|171|57|S/R|agC/agG|||-1||HGNC|HGNC:6871||Ensembl
C|missense_variant|MODERATE|MAPK1|ENSG00000100030|Transcript|ENST00000544786|protein_coding|2/7||||171|171|57|S/R|agC/agG|||-1||HGNC|HGNC:6871||Ensembl
C|missense_variant|MODERATE|MAPK1|5594|Transcript|NM_002745.4|protein_coding|2/9||||411|171|57|S/R|agC/agG|||-1||EntrezGene|HGNC:6871|rseq_mrna_match|RefSeq
C|missense_variant|MODERATE|MAPK1|5594|Transcript|NM_138957.3|protein_coding|2/8||||411|171|57|S/R|agC/agG|||-1||EntrezGene|HGNC:6871|rseq_mrna_match|RefSeq
A|synonymous_variant|LOW|MAPK1|ENSG00000100030|Transcript|ENST00000215832|protein_coding|2/9||||360|171|57|S|agC/agT|||-1||HGNC|HGNC:6871||Ensembl
A|synonymous_variant|LOW|MAPK1|ENSG00000100030|Transcript|ENST00000398822|protein_coding|2/8||||411|171|57|S|agC/agT|||-1||HGNC|HGNC:6871||Ensembl
A|synonymous_variant|LOW|MAPK1|ENSG00000100030|Transcript|ENST00000544786|protein_coding|2/7||||171|171|57|S|agC/agT|||-1||HGNC|HGNC:6871||Ensembl
A|synonymous_variant|LOW|MAPK1|5594|Transcript|NM_002745.4|protein_coding|2/9||||411|171|57|S|agC/agT|||-1||EntrezGene|HGNC:6871|rseq_mrna_match|RefSeq
A|synonymous_variant|LOW|MAPK1|5594|Transcript|NM_138957.3|protein_coding|2/8||||411|171|57|S|agC/agT|||-1||EntrezGene|HGNC:6871|rseq_mrna_match|RefSeq
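To make the "group by allele, then transcript" idea concrete, here is a rough Python sketch of how a consumer might read such a VC value back into a nested structure. It is only an illustration under assumptions: records are taken to be whitespace-separated and fields pipe-delimited, with the allele in field 0 and the transcript ID in field 6, as they appear in the example above. The function name and the index constants are hypothetical, and the real delimiters are still up for debate.

```python
from collections import defaultdict

# Hypothetical field positions, inferred from the example records above.
ALLELE_IDX = 0
TRANSCRIPT_IDX = 6


def group_annotations(vc_value):
    """Group annotation records by allele, then by transcript ID.

    Assumes records are separated by whitespace and fields by '|'.
    Returns {allele: {transcript_id: [field, ...]}}.
    """
    grouped = defaultdict(dict)
    for record in vc_value.split():
        fields = record.split("|")
        grouped[fields[ALLELE_IDX]][fields[TRANSCRIPT_IDX]] = fields
    return dict(grouped)


# Three records copied from the example above.
example = " ".join([
    "C|missense_variant|MODERATE|MAPK1|ENSG00000100030|Transcript|ENST00000215832"
    "|protein_coding|2/9||||360|171|57|S/R|agC/agG|||-1||HGNC|HGNC:6871||Ensembl",
    "A|synonymous_variant|LOW|MAPK1|ENSG00000100030|Transcript|ENST00000215832"
    "|protein_coding|2/9||||360|171|57|S|agC/agT|||-1||HGNC|HGNC:6871||Ensembl",
    "A|synonymous_variant|LOW|MAPK1|5594|Transcript|NM_002745.4"
    "|protein_coding|2/9||||411|171|57|S|agC/agT|||-1||EntrezGene|HGNC:6871"
    "|rseq_mrna_match|RefSeq",
])

grouped = group_annotations(example)
for allele, by_transcript in grouped.items():
    for transcript_id, fields in by_transcript.items():
        # fields[1] is the consequence term in the example records.
        print(allele, transcript_id, fields[1])
```

If the delimiters or field order change, only the split calls and the two index constants would need to be updated.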
We can look at what other similar tools have done:
SnpEff in particular already has a scheme for annotating the VCF INFO field with info from all transcripts.