clingen-data-model / allele

Documentation for data model of ClinGen

Describing molecular consequence #133

Closed srynobio closed 3 years ago

srynobio commented 9 years ago

After our call today I sat down with other members of our annotation team and got some clarification, both for myself and for the model.

First off, the terms we're using for primary-amino-acid-change-type would be considered molecular consequences (as they describe changes at the molecular level). A term like LOF would be described as a functional consequence.

So based on @larrybabb example:

primary_amino_acid_change_type (molecular consequence) | functional consequence
stop_gained                                            | loss of function

I did review SO and found that we do have some standard terminology for functional change, so if we wanted to add an additional type list we would have something to start with.

Also, with regard to how ClinVar uses molecular consequence: we would agree with many of the terms they use, but some have issues. For example, this record uses intron_variant as a molecular consequence, whereas we would consider it a description of a location.

An example:

LOF -> (functional consequence)
stop_gained -> primary_amino_acid_change_type (molecular consequence)
CDS -> primary_transcript_region_type
SNV -> primary_nucleotide_change_type
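
The layering in the example above can be sketched as a small record type. This is only an illustration, not part of the ClinGen model: the class name `AlleleAnnotation` and the `functional_consequence` field are hypothetical, while the other three field names are the attribute names used in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlleleAnnotation:
    """Hypothetical record keeping each kind of description in its own slot."""

    primary_nucleotide_change_type: str  # what changed in the DNA, e.g. "SNV"
    primary_transcript_region_type: str  # where it falls, e.g. "CDS" (a location)
    primary_amino_acid_change_type: str  # molecular consequence, e.g. "stop_gained"
    # Assumption: a separate, optional slot for the functional consequence.
    functional_consequence: Optional[str] = None


example = AlleleAnnotation(
    primary_nucleotide_change_type="SNV",
    primary_transcript_region_type="CDS",
    primary_amino_acid_change_type="stop_gained",
    functional_consequence="LOF",
)

# The molecular consequence can be recorded with no functional assertion at all.
no_functional_call = AlleleAnnotation("SNV", "CDS", "stop_gained")
```

Keeping the functional consequence optional reflects the point that the molecular consequence can be stated without asserting any functional effect.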
larrybabb commented 9 years ago

Very helpful @srynobio.

If you could put together a short list of SO terms for Functional Consequence under a new issue, that would be very helpful. I think we should seriously consider adding it. If nothing else, it will help clarify the distinction between the terms. We may also want to add a short discussion page to our documentation on this topic so that it may be referenced by our model term documentation.

larrybabb commented 9 years ago

From Bob Freimuth on 7/1/15....

Very brief use case:  Consider a missense variant that (by definition) results in a change in the 
amino acid sequence.  The function of the expressed protein might be affected by that change 
in AA sequence.

The (predicted) change at the AA level is one type of consequence of the genetic variant.  It is 
possible to capture this “molecular consequence” without any assertion of the functional effect 
of that change.

The “functional consequence” of the new AA sequence is a different concept – it could result in 
a higher or lower level of activity, decoupled posttranslational regulation 
(e.g., activation/repression pathways), higher/lower levels of protein, etc.

Going a little deeper:

The molecular consequence is often predicted based on our knowledge of molecular biology 
(e.g., codon usage, splicing patterns), and can often be assigned computationally.  It can be 
dependent on transcript (splicing, translation initiation site, etc), cell type, etc.

The functional consequence reflects the biological context of the gene product (e.g., signaling 
pathway).  It may be predicted but is often backed up by experimental evidence (e.g., in vitro 
studies, genotype-phenotype association studies).  We are occasionally surprised when the 
actual functional consequence doesn’t match our prediction, due to biological processes that 
we are still trying to understand (e.g., rules about exon skipping).

I’m not sure that “molecular consequence” and “functional consequence” are the best terms to 
use, but they provide a starting point.  I think it will be important to differentiate between these 
concepts if only to prevent confusion when discussing the “consequence” of a genetic variant.
larrybabb commented 9 years ago

From Tam Sneddon on 7/1/15

Regarding FunctionalConsequence, see v.43 on this page: https://ncbiconfluence.ncbi.nlm.nih.gov/display/CLIN/ClinVar+Submission+Form+Elements

Also, notes from Donna on this page: https://ncbiconfluence.ncbi.nlm.nih.gov/display/CLIN/Proposed+Variant+Curation+Data+Fields

Tam: Notes from email conversation with Donna regarding splice variants in ClinVar (Feb 19th): If a submitter provides Molecular Consequence data, we override it with what we compute. Effect on splicing is a functional consequence; we do not compute it, and we accept a submitter’s assessment of the biological effect of the variant. We report missense based only on predicted effect on translation. We do not independently report proximity to a splice junction. We do not predict effect on splicing. If submitters tell us there is an effect on splicing for any molecular consequence we predict, we will report that, but as a functional consequence, so there is always a possibility that a missense variant will have exon skipping as a functional consequence.

Example from ClinVar XML:

<ClinVarSet ID="6073963">
  <RecordStatus>current</RecordStatus>
  <Title>NM_002609.3(PDGFRB):c.2083C&gt;T (p.Arg695Cys) AND Basal ganglia calcification, idiopathic, 4</Title>
  <ReferenceClinVarAssertion DateCreated="2014-07-11" DateLastUpdated="2015-04-29" ID="299572">
    <ClinVarAccession Acc="RCV000128554" Version="2" Type="RCV" DateUpdated="2015-04-29"/>
    <RecordStatus>current</RecordStatus>
    <ClinicalSignificance DateLastEvaluated="2012-01-01">
      <ReviewStatus>classified by single submitter</ReviewStatus>
      <Description>Likely pathogenic</Description>
    </ClinicalSignificance>
    <Assertion Type="variation to disease"/>
    <ObservedIn>
       ...
    </ObservedIn>
    <MeasureSet Type="Variant" ID="135650">
      <Measure Type="single nucleotide variant" ID="139368">
        <Name>
          <ElementValue Type="Preferred">NM_002609.3(PDGFRB):c.2083C&gt;T (p.Arg695Cys)</ElementValue>
        </Name>
        <AttributeSet>
          <Attribute Type="FunctionalConsequence" integerValue="548">protein loss of function</Attribute>
          <XRef ID="0043" DB="Variation Ontology"/>
        </AttributeSet>

Desktop curator$ grep "FunctionalConsequence" ClinVarFullRelease_00-latest_June.xml | sort | uniq >
          <Attribute Type="FunctionalConsequence" integerValue="547">cryptic splice donor activation</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="548">protein loss of function</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="550">cryptic splice acceptor activation</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="551">effect on promoter activity</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="552">effect on RNA abundance</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="553">unknown functional consequence</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="554">probably no functional consequence</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="555">has functional consequence</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="556">probably has functional consequence</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="557">no known functional consequence</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="559">exon loss</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="565">effect on RNA splicing</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="908">variation affecting RNA</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="910">effect on protein subcellular localization</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="911">effect on protein abundance</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="912">effect on protein activity</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="918">polypeptide_partial_loss_of_function</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="920">initiation codon change</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="921">decreased_translational_product_level</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="927">effect on regulatory function of DNA</Attribute>
          <Attribute Type="FunctionalConsequence" integerValue="928">effect on catalytic protein function</Attribute>
          <Attribute Type="FunctionalConsequence">cryptic splice donor activation</Attribute>
          <Attribute Type="FunctionalConsequence">decreased_translational_product_level</Attribute>
          <Attribute Type="FunctionalConsequence">effect on RNA abundance</Attribute>
          <Attribute Type="FunctionalConsequence">effect on RNA splicing</Attribute>
          <Attribute Type="FunctionalConsequence">effect on catalytic protein function</Attribute>
          <Attribute Type="FunctionalConsequence">effect on protein abundance</Attribute>
          <Attribute Type="FunctionalConsequence">effect on protein activity</Attribute>
          <Attribute Type="FunctionalConsequence">effect on protein subcellular localization</Attribute>
          <Attribute Type="FunctionalConsequence">exon loss</Attribute>
          <Attribute Type="FunctionalConsequence">has functional consequence</Attribute>
          <Attribute Type="FunctionalConsequence">has probably functional consequence</Attribute>
          <Attribute Type="FunctionalConsequence">initiation codon change</Attribute>
          <Attribute Type="FunctionalConsequence">loss-of-function</Attribute>
          <Attribute Type="FunctionalConsequence">no known functional consequence</Attribute>
          <Attribute Type="FunctionalConsequence">polypeptide_partial_loss_of_function</Attribute>
          <Attribute Type="FunctionalConsequence">probably has functional consequence</Attribute>
          <Attribute Type="FunctionalConsequence">probably no functional consequence</Attribute>
          <Attribute Type="FunctionalConsequence">protein loss of function</Attribute>
          <Attribute Type="FunctionalConsequence">unknown functional consequence</Attribute>
          <Attribute Type="FunctionalConsequence">variation affecting RNA</Attribute>

It doesn't look like ClinVar are using SO but I do not know what the 'integerValue' is.
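
For anyone who wants to repeat the grep above without relying on line layout, here is a rough Python equivalent. It is a sketch under assumptions: the function name `functional_consequence_terms` is made up for illustration, and it assumes the elements are plain `Attribute` tags with no XML namespace, as in the excerpt above. The embedded sample is a tiny hand-made fragment, not real ClinVar output.

```python
import io
import xml.etree.ElementTree as ET


def functional_consequence_terms(source):
    """Return sorted distinct (integerValue, text) pairs for FunctionalConsequence attributes."""
    seen = set()
    # iterparse streams the file, so the full release does not have to be parsed up front.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Attribute" and elem.get("Type") == "FunctionalConsequence":
            seen.add((elem.get("integerValue"), (elem.text or "").strip()))
        elem.clear()  # drop text and attributes of elements we have finished with
    # Entries without an integerValue sort first (None is treated as "").
    return sorted(seen, key=lambda pair: (pair[0] or "", pair[1]))


# Hand-made fragment for illustration only; "SomethingElse" is a placeholder type.
sample = io.BytesIO(b"""<AttributeSet>
  <Attribute Type="FunctionalConsequence" integerValue="548">protein loss of function</Attribute>
  <Attribute Type="FunctionalConsequence">loss-of-function</Attribute>
  <Attribute Type="FunctionalConsequence">protein loss of function</Attribute>
  <Attribute Type="SomethingElse">ignored</Attribute>
</AttributeSet>""")

terms = functional_consequence_terms(sample)
for integer_value, label in terms:
    print(integer_value, label)
```

Against the real file one would pass the path (or an open file object) instead of the `BytesIO` sample.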

srynobio commented 9 years ago

I believe my thinking is in line with Bob's, and his example and descriptions are concise. I agree that we should stick with the terms “molecular consequence” and “functional consequence” because, at least for the purposes of the Data Model and for many people working in biological annotation, we'll be on the same page. I do think we'll have some issues with regard to how ClinVar uses the term (as per my example).

I know it may be hotly debated, but for clarity maybe we should discuss (as a group, etc.) using the term molecular consequence with regard to changes at the AA level, although technically you could use the term when talking about nucleotide changes as well.

larrybabb commented 9 years ago

Let's definitely bring this up at the next meeting (with @cbizon in attendance, as he has had a lot of input in defining the existing attribute names).

larrybabb commented 9 years ago

Here is some further info on a dialog between Robert Freimuth [RF] and myself [LB] ...

[LB] The molecular consequence attribute seems to logically belong to the DNA change that 
causes that molecular consequence and always to a DNA variant for a specific transcript.  

[RF] I agree.  Using these terms, the molecular consequence would be directly related to a 
DNA change *in the context of* a specific transcript.

[LB] However, if we always have the predicted/derived AA change that is caused by a given 
DNA change (and I assume there is only one?), then I would think the AA Change type would 
suffice to hold the value of what we are calling the Molecular Consequence.  If you are 
following this, could you verify if I am making any sense here?  

[RF]  There would be only one AA change for a given DNA change *in the context of* a 
specific transcript.  I’d need to see a definition and probably some examples of AA Change 
type before responding to your last point.

[LB] As far as the functional consequence.  It sounds to me like that attribute should be 
captured on an AA change not on a DNA change.  Since it is the AA change that is 
essentially linked to the predicted or observed functional consequence.

[RF] I agree – I’d tend to put the functional info at the AA/protein level, if for no other reason 
than multiple DNA changes can result in the same (predicted) protein and we’d want to 
ensure the downstream assertions are consistent.  That said, there are functional effects 
that do not impact the protein.  For example, some DNA changes can result in mRNA that 
is recognized as “defective” and subject to a much higher rate of degradation.  As a result, 
there is quantifiably less of that mRNA in the cell to be translated and consequently 
(there’s that word again) less of the corresponding protein.  I’m not sure if this sort of use 
case is in scope for ClinGen or not – I presume it is.  If so, then there would also be a 
functional effect that could be assigned at the transcript level as well as (not instead of) 
at the protein level.
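
One way to picture the arrangement discussed in this dialog is sketched below. It is an assumption-laden illustration, not the model's design: `ProteinChange` and `TranscriptAllele` are hypothetical names, and the `c.EXAMPLE_*`, `p.EXAMPLE` and `TX_EXAMPLE` values are placeholders rather than real variants. The molecular consequence sits on a DNA change in the context of one transcript, protein-level functional effects sit on a shared protein change, and transcript-level effects (such as reduced mRNA abundance) get their own slot.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProteinChange:
    """Predicted AA change; functional effects recorded once, at the protein level."""

    aa_change: str
    functional_effects: List[str] = field(default_factory=list)


@dataclass
class TranscriptAllele:
    """A DNA change in the context of one specific transcript."""

    dna_change: str
    transcript: str
    molecular_consequence: str
    protein_change: ProteinChange
    # Effects that act on the mRNA itself, in addition to any protein-level effects.
    transcript_level_effects: List[str] = field(default_factory=list)


# Two placeholder DNA changes that are assumed to predict the same protein change.
shared = ProteinChange(aa_change="p.EXAMPLE")
first = TranscriptAllele("c.EXAMPLE_1", "TX_EXAMPLE", "missense_variant", shared)
second = TranscriptAllele("c.EXAMPLE_2", "TX_EXAMPLE", "missense_variant", shared)

# Recorded once on the protein change, visible from both DNA changes.
shared.functional_effects.append("protein loss of function")
# Recorded only on the one DNA change whose transcript is affected.
second.transcript_level_effects.append("effect on RNA abundance")
```

Sharing one `ProteinChange` object is what keeps the downstream functional statements consistent across DNA changes that predict the same protein.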
srynobio commented 9 years ago

I think we should steer away from putting the functional information at the AA/protein level, because biologically speaking we can never be 100% sure what the overall functional effect is, as it can change based on population, gene-disease associations, etc. Also, when we're talking about molecular we're talking about the parts, whereas functional is talking about what happens next. We can make a probabilistic prediction that functionally something will happen based on an observed change, but its overall effect (e.g., penetrance, phenotype) often requires (or is derived from) experimental studies.

This is a quick example: Here I'm using SO to annotate the functional effect, and I'm unsure how you would group it into one place (PAACT). Also this would be a predicted effect which would be independent of a disease under review.

PNCT     | PTCT | PAACT                 | functional effect
deletion | CDS  | frameshift_truncation | loss_of_function_variant

Also keep in mind that more ambiguous simpleAlleles (e.g., in the last exon) would be even more difficult.

larrybabb commented 9 years ago

Here is the feedback that Tam Sneddon rec'd from Donna Maglott (NCBI) regarding the functional consequence vocabulary and what the "integerValue" key was linked to (SO or VariO).

FYI - "integerValue" aren't supposed to be public!

From: "Donna Maglott (NIH/NLM/NCBI) [E]" <maglott@ncbi.nlm.nih.gov>
To: "Tam P Sneddon" <tsneddon@stanford.edu>, "Melissa Landrum (NIH/NLM/NCBI) [E]" <landrum@ncbi.nlm.nih.gov>
Sent: Wednesday, July 1, 2015 5:00:54 PM
Subject: RE: Functional consequence

And to think I was thinking of answering.
Thanks for pointing that out. We should not be reporting that int value publicly; that is our internal 
identifier to link to the term from VariO

From: Tam P Sneddon [mailto:tsneddon@stanford.edu] 
Sent: Wednesday, July 01, 2015 7:54 PM
To: Maglott, Donna (NIH/NLM/NCBI) [E]; Landrum, Melissa (NIH/NLM/NCBI) [E]
Subject: Re: Functional consequence

Just looked at ClinVar website and see it comes from the Variation Ontology 
(http://www.variationontology.org/):

http://www.ncbi.nlm.nih.gov/clinvar/variation/135650/

Thanks!
tsneddon commented 9 years ago

VO does specify "Loss of Function":

http://www.variationontology.org/

VariO:0043 : protein loss of function (presumably ClinVar integerValue="548")

However, in ClinVar LOF appears to be represented in several ways:

protein loss of function
loss-of-function
protein loss of function
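
If we wanted to collapse these variants to a single term, a lookup along the following lines might do. This is a minimal sketch: the function and table names are hypothetical, and treating `loss-of-function` as a synonym of VariO:0043 is an assumption that would need confirming with ClinVar.

```python
VARIO_PROTEIN_LOF = "VariO:0043"  # "protein loss of function", per the VariO site

# Assumption: both free-text spellings seen in ClinVar mean the same VariO term.
LOF_SYNONYMS = {
    "protein loss of function": VARIO_PROTEIN_LOF,
    "loss-of-function": VARIO_PROTEIN_LOF,
}


def normalize_functional_consequence(label):
    """Map a ClinVar label to a VariO ID, or None if it is not in the table."""
    # Lower-case and collapse runs of whitespace before the lookup.
    key = " ".join(label.strip().lower().split())
    return LOF_SYNONYMS.get(key)


for raw in ["protein loss of function", "loss-of-function", "exon loss"]:
    print(raw, "->", normalize_functional_consequence(raw))
```

Labels that are not in the table come back as `None` rather than being guessed at.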
srynobio commented 9 years ago

I uploaded the VO into OBOB so we can view it: www.obobrowser.org/browser/obob.cgi?release=public_vario.obo

larrybabb commented 9 years ago

Not sure if this is within the Allele model. It is more of a statement about the allele (maybe in the assertion/evidence model?).