ga4gh / va-spec

An information model for representing variant annotations.
Apache License 2.0

Molecular Consequence Annotation Definition and Scope #4

Open mbrush opened 6 years ago

mbrush commented 6 years ago

Most sources of this type of annotation use terms from the Sequence Ontology as descriptors - e.g. splicing variant, missense variant, nonsense variant, frameshift variant, stop gained. The terms we have seen used in examples of molecular consequence annotations generally describe consequence at the sequence level - i.e. predicted impact on genomic, transcript, or amino acid sequence, or on the processing events that occur across these types of biological sequence molecules (e.g. transcription, translation, splicing). Statements at the functional level - about predicted impact on the function of genes and their products - fall into the functional impact categories of variant annotations.

But even at the sequence level there are variable levels of things that get stated, and it may be that there are finer subcategories of molecular consequence that should be split out, or some annotations that fall into other categories.

It may be worth reviewing the ClinGen 'Molecular Consequence' value set of SO terms (see below) and considering whether the statements made in annotations using each of these terms are in scope, whether they fit into meaningful subcategories, and/or whether they should be split into separate categories. For example, there are SO terms that describe the fundamental type of variation (e.g. 'substitution', 'inversion', 'indel', 'copy number variation', 'translocation'), others that define a type of feature affected (e.g. '5'UTR', 'interior intron'), and terms that describe an actual consequence (e.g. 'stop lost', 'missense', 'frameshift', 'synonymous'). Are all these in scope? Any value in bucketing?


SO:0000159 | deletion
SO:0000191 | interior intron
SO:0000199 | translocation
SO:0000203 | three prime UTR
SO:0000204 | five prime UTR
SO:0000507 | pseudogenic exon
SO:0000667 | insertion
SO:0001019 | copy number variation
SO:0001568 | splicing variant
SO:0001578 | stop lost
SO:0001583 | missense variant
SO:0001586 | non conservative missense variant
SO:0001587 | stop gained
SO:0001589 | frameshift variant
SO:0001629 | splice-site variant
SO:0001819 | synonymous variant
SO:0001823 | conservative Inframe Insertion
SO:0001824 | disruptive inframe insertion
SO:0001825 | conservative Inframe Deletion
SO:0001826 | disruptive inframe deletion
SO:0001909 | frameshift elongation
SO:0002007 | MNV
SO:0002012 | start lost
SO:0002073 | no sequence alteration
SO:1000002 | substitution
SO:10000032 | indel
SO:1000036 | inversion
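
When reviewing a value set like this one, it can help to load it into a lookup and check the identifiers mechanically. The sketch below is only illustrative: it copies a handful of rows from the list above, and it assumes that a well-formed SO accession is `SO:` followed by exactly seven digits. The function name `parse_value_set` is hypothetical.

```python
import re

# A few rows copied verbatim from the value set above, in "ID | label" form.
ROWS = """\
SO:0001583 | missense variant
SO:0001587 | stop gained
SO:0001589 | frameshift variant
SO:1000002 | substitution
SO:10000032 | indel
"""

# Assumption: a well-formed SO accession is "SO:" followed by exactly 7 digits.
SO_ID = re.compile(r"SO:\d{7}")


def parse_value_set(text):
    """Return (terms, malformed): a dict of id -> label, plus ids that fail the pattern."""
    terms, malformed = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        so_id, label = (part.strip() for part in line.split("|", 1))
        if SO_ID.fullmatch(so_id):
            terms[so_id] = label
        else:
            malformed.append(so_id)
    return terms, malformed


terms, malformed = parse_value_set(ROWS)
print(malformed)
```

Under that assumption the check flags the 'indel' row, whose identifier as listed appears to carry an extra digit and may be worth confirming against the Sequence Ontology.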


See also/compare with Ensembl variant consequence here (and compare with their variant types here).

The list of value sets and their usage in the ClinGen Allele model here are also relevant and informative.

mbrush commented 6 years ago

Also, we should consider how to handle Molecular Consequence annotations that have computational vs experimental evidence. For example, consider the ClinGen Example 8, which annotates to "SO:0001629" ("splice-site variant"), and has the following description: "This variant occurs in the invariant region (+/- 1,2) of the splice consensus sequence and in vitro studies confirmed that it leads to aberrant splicing and reduced DSC2 protein levels (Heuser 2006 PMID: 17186466)."

The former part of this description expresses a Molecular Consequence statement, but the latter part would seem to express an Experimental Functional Impact statement, right?

rrfreimuth commented 6 years ago

Thanks for raising this issue for discussion. I agree we should put some effort into trying to disambiguate these terms and how they are modeled.

IMO, these terms are not simply a property of a variant... they are a property of an allele AND a sequence AND a set of annotations. For example, a variant allele might be an insertion on GRCh37 but a substitution on GRCh38. Similarly, a variant may be in an intron in one transcript of a gene but in an exon or in a splice site in another.
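
One way to picture this point is to key each term on the full context rather than on the variant alone. The following is a minimal sketch, not a proposal for the VA model: the class `Context`, its field names, and every identifier in it are hypothetical placeholders, and the term labels are taken from the value set earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """What a consequence term is relative to (all field names are hypothetical)."""
    allele_id: str        # the variant allele being described
    reference: str        # sequence the allele is placed on (assembly or transcript)
    annotation_set: str   # feature annotations used to interpret that placement


# Placeholder identifiers; none of these refer to real records.
annotations = {
    Context("allele-1", "assembly-A", "annotations-v1"): "insertion",
    Context("allele-1", "assembly-B", "annotations-v1"): "substitution",
    Context("allele-2", "transcript-X", "annotations-v1"): "interior intron",
    Context("allele-2", "transcript-Y", "annotations-v1"): "splice-site variant",
}


def terms_for(allele_id):
    """Collect every term recorded for an allele, keyed by reference sequence."""
    return {
        ctx.reference: term
        for ctx, term in annotations.items()
        if ctx.allele_id == allele_id
    }


print(terms_for("allele-1"))
```

The same allele ends up with different terms depending on the reference it is described against, which is why a single term attached directly to a variant would be ambiguous.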

larrybabb commented 6 years ago

For what it's worth, ClinVar currently has the following list of Molecular Consequence terms in its database (some of these may be created by submitters). I do believe they control this, however. We can investigate further if we find it important or useful to do so...

ClinVar Molecular Consequence field values as of 7/25/18:

2kb upstream variant
3 prime utr variant
5 prime utr variant
500b downstream variant
frameshift variant
inframe variant
intergenic variant
intron variant
missense variant
non coding transcript variant
nonsense
so 0001574
so 0001575
so 0001578
so 0001583
so 0001587
so 0001589
so 0001619
so 0001623
so 0001624
so 0001627
so 0001628
so 0001634
so 0001636
so 0001650
so 0001819
splice acceptor variant
splice donor variant
stop lost
synonymous variant

mbrush commented 6 years ago

Reviewing terms from ClinGen, Ensembl, and ClinVar value sets for Molecular Consequence, we can bucket them into four categories (link):

  1. variation class: describes the fundamental type of variant represented (e.g. 'substitution', 'inversion', 'indel', 'copy number variation', 'translocation')
  2. affected feature type: describes the type of feature affected/hit by the variant (e.g. '5'UTR', 'interior intron', '5kb upstream variant')
  3. feature consequence: describes an impact a variant has on the extent, structure, or number of a particular type of feature (e.g. 'tfbs ablation', 'feature elongation')
  4. processing consequence: describes the impact a variant has on how a gene or transcript is processed into a final product (e.g. 'stop lost', 'missense', 'frameshift', 'synonymous')
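
The four buckets above can be sketched as a simple lookup. This is only an illustration under stated assumptions: it contains just the example terms quoted in the list, not the full classification, and the names `Bucket`, `EXAMPLES`, and `bucket_of` are hypothetical.

```python
from enum import Enum


class Bucket(Enum):
    VARIATION_CLASS = "variation class"
    AFFECTED_FEATURE_TYPE = "affected feature type"
    FEATURE_CONSEQUENCE = "feature consequence"
    PROCESSING_CONSEQUENCE = "processing consequence"


# Only the example terms quoted in the list above; not a complete assignment.
EXAMPLES = {
    Bucket.VARIATION_CLASS: [
        "substitution", "inversion", "indel", "copy number variation", "translocation",
    ],
    Bucket.AFFECTED_FEATURE_TYPE: ["5'UTR", "interior intron", "5kb upstream variant"],
    Bucket.FEATURE_CONSEQUENCE: ["tfbs ablation", "feature elongation"],
    Bucket.PROCESSING_CONSEQUENCE: ["stop lost", "missense", "frameshift", "synonymous"],
}

# Invert to term -> bucket, lowercasing keys so lookups are case-insensitive.
TERM_TO_BUCKET = {
    term.lower(): bucket for bucket, terms in EXAMPLES.items() for term in terms
}


def bucket_of(term):
    """Return the bucket for a known example term, or None if it is unclassified."""
    return TERM_TO_BUCKET.get(term.strip().lower())


print(bucket_of("missense"))
print(bucket_of("nonsense"))
```

Returning `None` for unknown terms keeps unclassified values visible instead of forcing them into a bucket, which seems useful while the categories are still under discussion.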

See how these categories are applied in a classification of molecular consequence value sets used by ClinGen, Ensembl, and ClinVar in the spreadsheet here.


Based on outcomes of reviewing this information on the 10-31-18 VA call, we propose updating the definition and description of our Molecular Consequence annotation type to be more clear and prescriptive about what is in scope here:

Proposed Comments (additional info to help clarify scope and distinguish from other VA types):

mbrush commented 6 years ago

Consider where the concepts in the 'mutation consequence' slide (slide 7) in the deck here might fit (seem to span molecular consequence and functional impact VA types): https://www.clinicalgenome.org/site/assets/files/2757/fitzpatrick_ddg2p.pdf

mbrush commented 5 years ago

Remaining issues to sort out to wrap initial pass at the MC statement model:

Resolve Now:

For Later:

ahwagner commented 5 years ago

I have a few thoughts in response to the terms of the DDG2P Slide Deck, slide 7.

  1. The use of dominant negative as a mutation consequence category term (from DDG2P Slide Deck, slide 7) is a little overloaded. Antimorphic is a precise term that captures the intended concept.

  2. There should perhaps be a "no effect" notion alongside those categories. I imagine this would subsume the all missense/in frame category, but also include cases where the variant is synonymous, intronic, extragenic, etc. but has no observed functional impact.

  3. Outside of the cis-regulatory or promoter mutation category (which seems to me to be an Affected Feature / Relative Location annotation), all of these describe the functional impact on the protein product, and I would argue don't really belong in the Molecular Consequence scope.

pnrobinson commented 5 years ago

Antimorphic: A type of mutation in which the altered gene product possesses an altered molecular function that acts antagonistically to the wild-type allele.

I am not sure if this is the same as dominant negative, which is usually used in the context of gene products that aggregate into larger structures such as collagen bundles, where the resulting aggregate is destabilized/weakened. This is not really a different function; it can simply be that the two types of puzzle piece no longer fit together.

In any case, dominant negative is widely used in the clinical community, and I have never heard antimorphic before just now.

ahwagner commented 5 years ago

To clarify this point, antimorphic is one of Muller's morphs, a long-standing set of terms to precisely define the functional consequences of molecular alterations. The definition I am familiar with varies slightly from the one defined by the Jackson Labs (restated above), in that it:

ahwagner commented 5 years ago

☝️ these considerations (last three comments) are relevant to the functional impact scope (#34 / #21), not molecular consequence.