mbrush opened 5 years ago
Issues/Questions:
Regarding the question of whether and how to capture the mechanism of pathogenicity (e.g. oncogene activation vs. TSG inactivation) as part of this VA type, we first need to consider whether this is even in scope for the primary statement here. This mechanistic aspect may represent a completely different statement, for which we should create a separate VA type.
If it is in scope here, we could capture it using a qualifier with values like 'driver', 'modifier', etc., or 'oncogene activation' and 'TSG inactivation'. Alternatively, we could model it into the predicate by defining a more granular set of relationships that extend the basic ACMG-like ones (e.g. is_oncogenic_driver_of).
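To make the two options concrete, here is a minimal sketch, assuming simple Python dataclasses stand in for the statement structure. The qualifier values and `is_oncogenic_driver_of` come from the discussion; the predicate name `pathogenic_for`, the identifiers, and the `PREDICATE_MECHANISM` lookup are hypothetical illustrations, not part of any agreed model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class MechanismQualifier(Enum):
    # Option A: candidate qualifier values taken from the discussion.
    DRIVER = "driver"
    MODIFIER = "modifier"
    ONCOGENE_ACTIVATION = "oncogene activation"
    TSG_INACTIVATION = "TSG inactivation"


@dataclass
class QualifiedStatement:
    # Option A: coarse ACMG-like predicate plus an optional mechanism qualifier.
    subject: str
    predicate: str
    condition: str
    mechanism: Optional[MechanismQualifier] = None


@dataclass
class GranularPredicateStatement:
    # Option B: the mechanism is folded into a more specific predicate.
    subject: str
    predicate: str
    condition: str


# Option B needs a lookup so consumers know which mechanism each predicate implies.
# Hypothetical mapping; only is_oncogenic_driver_of appears in the discussion.
PREDICATE_MECHANISM = {"is_oncogenic_driver_of": MechanismQualifier.DRIVER}


def mechanism_of(
    stmt: Union[QualifiedStatement, GranularPredicateStatement],
) -> Optional[MechanismQualifier]:
    """Recover the mechanism from either representation (None if absent)."""
    if isinstance(stmt, QualifiedStatement):
        return stmt.mechanism
    return PREDICATE_MECHANISM.get(stmt.predicate)


# Hypothetical identifiers, for illustration only.
a = QualifiedStatement("variant:X", "pathogenic_for", "cancer:Y",
                       MechanismQualifier.ONCOGENE_ACTIVATION)
b = GranularPredicateStatement("variant:X", "is_oncogenic_driver_of", "cancer:Y")
```

The trade-off shows up in `mechanism_of`: Option A keeps the predicate set small and the mechanism optional, while Option B makes the mechanism part of the core assertion but requires every consumer to know how each granular predicate maps back to a mechanism.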
I am putting some thoughts here in case I cannot discuss them with you online. As a disclaimer, remember that I have no expertise in developing data models; what I have is solid experience in building genomic interpretation tools and in interacting with users with different profiles and needs in both research and clinical settings. With that context in mind, please see the following (and apologies for any content that may be irrelevant at this point in your discussions):
I would keep the ‘high-level’ interpretation terms simple, so that the ‘main’ classification can be understood at first sight by everyone. Therefore, I would define the main variant-effect term along the lines of oncogenic/likely oncogenic/vus/likely neutral/neutral (see next point). The more elaborate terms (LoF, switch of function, gain of function, truncating variant, disrupting event, etc.) can sometimes be tricky to understand, especially in the context of certain genes, so I would leave them as more detailed information in an additional field (but I would indeed have such an additional field; see one of the points below).
Regarding these main effect terms, I would like to keep the 5 terms (i.e. oncogenic/likely oncogenic/vus/likely neutral/neutral) for two reasons: (a) it makes sense to have two tiers of how sure you are of the reported effect (‘it is certain’ and ‘it is likely’); and (b) it is nice to keep it consistent with the pathogenic/likely pathogenic etc. model.
Regarding the specific labels, I vote for not using ‘pathogenic’ and ‘benign’ for somatic variants, so that they can be distinguished from the terms used for germline predisposing/causal effects. As for which labels to use, 10 people would likely have 10 favourites. I tend to use ‘oncogenic’, ‘likely oncogenic’, ‘vus’, ‘likely neutral’ and ‘neutral’ (as, e.g., in OncoKB). But I acknowledge that this can be confusing if it is read as an effect that applies only to oncogenes. Other options could be ‘driver’ and ‘passenger’, but these may be too technical. ‘Tumorigenic’ and ‘non-tumorigenic’?
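A minimal sketch of the five-tier scale described in the two points above, assuming a plain Python `Enum`. The labels are the ones suggested in the comment; the germline mapping and the `certainty_tier` helper are hypothetical illustrations of the "two tiers of certainty" and "consistent with pathogenic/likely pathogenic" arguments.

```python
from enum import Enum
from typing import Optional


class Oncogenicity(Enum):
    # Five-tier somatic scale, using the labels suggested in the comment.
    ONCOGENIC = "oncogenic"
    LIKELY_ONCOGENIC = "likely oncogenic"
    VUS = "vus"
    LIKELY_NEUTRAL = "likely neutral"
    NEUTRAL = "neutral"


# Hypothetical one-to-one correspondence with the germline pathogenicity scale.
GERMLINE_EQUIVALENT = {
    Oncogenicity.ONCOGENIC: "pathogenic",
    Oncogenicity.LIKELY_ONCOGENIC: "likely pathogenic",
    Oncogenicity.VUS: "uncertain significance",
    Oncogenicity.LIKELY_NEUTRAL: "likely benign",
    Oncogenicity.NEUTRAL: "benign",
}


def certainty_tier(term: Oncogenicity) -> Optional[str]:
    """Return 'certain' or 'likely' for the four directional terms, None for VUS."""
    if term in (Oncogenicity.ONCOGENIC, Oncogenicity.NEUTRAL):
        return "certain"
    if term in (Oncogenicity.LIKELY_ONCOGENIC, Oncogenicity.LIKELY_NEUTRAL):
        return "likely"
    return None
```

Keeping the somatic labels as a separate enum, while still mapping them one-to-one onto the germline scale, is one way to honour both the request for distinct labels and the wish for consistency with the ACMG-style model.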
I think the cancer type should be part of the information, at least as an optional field. Note that many people, when talking about somatic oncogenic events, do not see the need to include the cancer type, since they somehow believe that the ‘oncogenic’ definition is universal. I would advise against that, since (a) some oncogenic variants (although a minority) are likely to be context-dependent (a variant that is oncogenic in one tumor tissue can be neutral in another, and vice versa); and (b) the cancer type in which the reported effect has been tested is, in any case, useful information (and it is up to the user whether this can be extrapolated to other cancer types).
Regarding the last point, note that for some studies it is tricky to define the cancer type in which the particular effect has been evaluated (e.g. experimental models with a loosely defined cancer type, for various reasons that I will not enumerate here). Therefore, you need to allow a ‘not specific cancer type’ or similar term, meaning that this information cannot be specified.
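A minimal sketch of the optional cancer-type field, assuming Python dataclasses. The sentinel `NOT_SPECIFIC_CANCER_TYPE` is a hypothetical stand-in for the ‘not specific cancer type’ term; it is deliberately kept distinct from `None`, so "not recorded" and "cannot be specified" remain different states. All identifiers and tumor types are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sentinel: "the cancer type cannot be specified for this study".
# Kept distinct from None, which here means "no cancer type was recorded".
NOT_SPECIFIC_CANCER_TYPE = "not specific cancer type"


@dataclass
class SomaticEffectStatement:
    variant: str
    classification: str
    cancer_type: Optional[str] = None  # optional field

    def context(self) -> str:
        """Describe the cancer-type context in which the effect was evaluated."""
        if self.cancer_type is None:
            return "cancer type not recorded"
        if self.cancer_type == NOT_SPECIFIC_CANCER_TYPE:
            return "cancer type cannot be specified"
        return f"evaluated in {self.cancer_type}"


# Context dependence: the same (hypothetical) variant can carry different
# classifications in different cancer types.
in_tissue_a = SomaticEffectStatement("variant:X", "oncogenic", "tumor type A")
in_tissue_b = SomaticEffectStatement("variant:X", "neutral", "tumor type B")
loose_model = SomaticEffectStatement("variant:X", "likely oncogenic", NOT_SPECIFIC_CANCER_TYPE)
```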
Another fundamental question is the level of strength for stating a given effect. In our case, the effect can be reported as, e.g., oncogenic or likely oncogenic, but an orthogonal question is the strength of the evidence supporting that. For instance, a cancer cohort study can conclude that a variant is oncogenic, but that study may have some caveats (e.g. the sample size); conversely, an experimental study can conclude that a variant is likely oncogenic (so it is not even certain that it is oncogenic, for some reason ‘x’), but the quality of the experimental data supporting that conclusion is solid. Note that some knowledgebases only ‘accept’ data from studies of a certain quality reporting the variant effect, whereas others include both the level of relevance (which for oncogenic variants means the 5-level classification, and for biomarkers of drug response can range from a clinical guideline to a pre-clinical observation, etc.) and the level of strength (in the drug biomarker example, how good the clinical or pre-clinical study reporting that level of relevance is). Since we are developing a data model and not a database here, I would say that we need to include a field with the level of strength supporting the oncogenic/neutral effect.
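The two scenarios in the point above can be sketched as two independent fields, assuming Python dataclasses. The three-level `Strength` scale is a hypothetical placeholder (the comment does not prescribe levels), and the strengths assigned to each example are assumptions chosen to mirror the text.

```python
from dataclasses import dataclass
from enum import Enum


class Strength(Enum):
    # Hypothetical strength scale; no specific levels are prescribed.
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class EffectClaim:
    classification: str  # level of relevance: the 5-tier oncogenicity term
    strength: Strength   # how well the supporting study backs that claim
    source: str          # kind of study


# A cohort study with caveats (e.g. small sample size) concluding "oncogenic",
# and a high-quality experiment concluding "likely oncogenic".
cohort = EffectClaim("oncogenic", Strength.LOW, "cancer cohort study")
experiment = EffectClaim("likely oncogenic", Strength.HIGH, "experimental study")


def best_supported(claims):
    """Pick the claim with the strongest evidence, regardless of its classification."""
    return max(claims, key=lambda c: c.strength.value)
```

Because the two fields are orthogonal, `best_supported([cohort, experiment])` selects the "likely oncogenic" claim even though the other claim has the more definite classification; a single merged field could not express that.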
As stated before, I'd like to see an optional field with the mechanism of action of the variant (when it is found to be oncogenic): loss-of-function, gain-of-function, etc.
I would also like to see a reference to the study (or studies) in which the effect of the variant has been reported (e.g. a PubMed ID and, for emerging findings, a conference abstract).
I do not know to what extent an ‘other comments’ field is technically acceptable, but I always think there is room for such a thing. For this variant model, it could include details of the level of evidence for the effect (e.g. if it is based on experimental data, some details about that experiment).
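The three optional fields requested in the points above could be sketched like this, assuming Python dataclasses. The rule that a mechanism is only allowed for (likely) oncogenic classifications is an assumption based on "when it is found to be oncogenic"; the record contents, including the reference, are hypothetical placeholders rather than real citations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SomaticEffectRecord:
    variant: str
    classification: str                  # e.g. "oncogenic", "likely neutral"
    mechanism: Optional[str] = None      # e.g. "loss-of-function", "gain-of-function"
    references: List[str] = field(default_factory=list)  # e.g. PubMed IDs, abstracts
    comments: Optional[str] = None       # free text, e.g. details of an experiment

    def __post_init__(self) -> None:
        # Assumed rule: a mechanism is only recorded for (likely) oncogenic variants.
        if self.mechanism is not None and self.classification not in (
            "oncogenic",
            "likely oncogenic",
        ):
            raise ValueError("mechanism is only recorded for oncogenic variants")


# Hypothetical record; the reference is a placeholder, not a real citation.
record = SomaticEffectRecord(
    variant="variant:X",
    classification="oncogenic",
    mechanism="gain-of-function",
    references=["PMID:<placeholder>"],
    comments="based on a hypothetical cell-line experiment",
)
```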
- Regarding the last point, note that for some studies, to define the cancer type in which the particular effect has been evaluated is tricky (e.g. loose cancer type experimental models due to different reasons that I will not enumerate here). Therefore, you need to allow a ‘not specific cancer type’ or similar term meaning that this info cannot be specified.
This discussion is equivalent to the one about leaving the "condition" field blank in Variant Pathogenicity type, am I right? https://github.com/ga4gh-gks/variant-annotation-model/issues/25
Regarding these other points:
- another fundamental question is the level of strength for stating a given effect. [...] I would say that we need to include a field with the level of strength supporting the oncogenic/neutral effect.
- I would like to see also a reference of the study(ies) in which the effect of the variant has been reported (e.g. PubMed ID and, for the emerging ones, a conference abstract)
- [...] an ‘other comments’ field is technically acceptable to be included, but I always think there is room for such a thing. For this variant model, this could include details of the level of evidence of the effect (e.g. if it is based on experimental data, to write some details about that experiment).
Sounds to me like they are all related to evidence and provenance. Definitely interesting to take into account. We'll handle them when we get to modelling evidence/provenance.
Subject:
Descriptor:
Predicate:
Qualifiers:
variantOriginQualifier: same considerations as for pathogenicity interpretations, but value here is 'somatic'
pathogenicMechanismQualifier:
Evidence:
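A skeleton of the template fields, assuming Python dataclasses. The template leaves Subject, Descriptor, Predicate, pathogenicMechanismQualifier, and Evidence unfilled, so the types used for them here are placeholders; only the fixed 'somatic' value of variantOriginQualifier comes from the template itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Qualifiers:
    # Fixed to 'somatic' for this VA type, per the template.
    variantOriginQualifier: str = "somatic"
    # Left open in the template; a plain optional string is a placeholder.
    pathogenicMechanismQualifier: Optional[str] = None


@dataclass
class VariantOncogenicityInterpretation:
    subject: str     # the variant; identifier format not yet defined
    descriptor: str  # placeholder: value set not yet defined
    predicate: str   # placeholder: predicate set not yet defined
    qualifiers: Qualifiers = field(default_factory=Qualifiers)
    evidence: List[str] = field(default_factory=list)  # placeholder until evidence is modelled
```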
Given discussions and feedback on recent calls, we are exploring the idea of collapsing Variant Pathogenicity Interpretation (VPI) and Variant Oncogenicity Interpretation (VOI) into a single VA type (Variant Pathogenicity Interpretation). Motivations for collapsing are based on both semantic and pragmatic considerations:
A proposal for a collapsed model is defined in the spreadsheet here, and reflects the following decisions/considerations:
We recommend the predicate set {pathogenic_for, likely_pathogenic_for, benign_for, likely_benign_for, uncertain_significance_for}, where 'pathogenic' is defined broadly enough to cover both causal and contributing variant-disease relationships, to accommodate interpretations on Mendelian conditions and cancer, respectively. The context in which the predicate is used can indicate whether the variant is asserted to be causal or contributing for the indicated condition: if the condition is Mendelian, the implication is that the variant is causal; if the condition is a cancer, the implication is that the variant is a contributing driver. One drawback is that consumers in the cancer space might expect to see terms like 'oncogenic', but our documentation can make clear that this is covered by 'pathogenic'. This may also be more a presentation-level issue that can be handled by the UI software layer, rather than a concern at the lower level of a data exchange schema.
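A minimal sketch of how a consumer might read the proposed predicate set, assuming Python. The predicate names come from the proposal; the `condition_kind` labels ("mendelian", "cancer"), `implied_role`, and `display_label` are hypothetical helpers that illustrate context-dependent interpretation and UI-level relabelling, not parts of the schema.

```python
from enum import Enum
from typing import Optional


class Predicate(Enum):
    PATHOGENIC_FOR = "pathogenic_for"
    LIKELY_PATHOGENIC_FOR = "likely_pathogenic_for"
    UNCERTAIN_SIGNIFICANCE_FOR = "uncertain_significance_for"
    LIKELY_BENIGN_FOR = "likely_benign_for"
    BENIGN_FOR = "benign_for"


def implied_role(predicate: Predicate, condition_kind: str) -> Optional[str]:
    """Reading implied by a (likely) pathogenic predicate, given the condition kind."""
    if predicate not in (Predicate.PATHOGENIC_FOR, Predicate.LIKELY_PATHOGENIC_FOR):
        return None
    if condition_kind == "mendelian":
        return "causal"
    if condition_kind == "cancer":
        return "contributing driver"
    raise ValueError(f"unknown condition kind: {condition_kind!r}")


def display_label(predicate: Predicate, condition_kind: str) -> str:
    """Presentation layer only: relabel 'pathogenic' as 'oncogenic' for cancer."""
    label = predicate.value[: -len("_for")].replace("_", " ")
    if condition_kind == "cancer":
        label = label.replace("pathogenic", "oncogenic")
    return label
```

The exchanged data always carries the same five predicates; only `display_label` changes what a cancer-focused user sees, which is the division of labour the paragraph above suggests.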
The collapsed model includes the 'qualifier' fields we created for both oncogenic and pathogenic assertions (specifically, variantOriginQualifier and pathogenicMechanismQualifier). Documentation will guide users on when to apply each.
Our evidence and provenance model will need to support very broad types of evidence at different levels of detail, from rich representations of ACMG-based evidence interpretation to sparser representations that might accommodate interpretations where no formal guidelines are used. This will be a challenge, but one I think a SEPIO-based approach is equipped to handle. Even though different evidence frameworks/criteria are typically used to evaluate a variant in cancer vs. Mendelian disease, there is overlap in the types of information used as evidence. And, as seen in ClinVar records such as this and this, guidelines like the ACMG criteria, used for evaluation against Mendelian conditions, are in practice also used to evaluate germline and somatic variants for cancer. So I think that even if we separated VPI from VOI, we would have to provide the same type of flexible evidence/provenance model.
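To illustrate the "rich vs. sparse" range described above, here is a minimal sketch, assuming a single hypothetical `EvidenceItem` dataclass whose structured fields are all optional. It is not SEPIO's actual schema; the only real-world detail is the ACMG criterion code PS3 (well-established functional studies supportive of a damaging effect), used as an example value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvidenceItem:
    description: str
    # Optional structured fields; sparse interpretations can leave them empty.
    guideline: Optional[str] = None  # e.g. "ACMG"
    criterion: Optional[str] = None  # e.g. an ACMG criterion code such as "PS3"
    direction: Optional[str] = None  # hypothetical: "supports" or "disputes"

    def is_structured(self) -> bool:
        """True when the item is tied to a named guideline criterion."""
        return self.guideline is not None and self.criterion is not None


rich = EvidenceItem(
    "well-established functional study shows a damaging effect",
    guideline="ACMG",
    criterion="PS3",
    direction="supports",
)
sparse = EvidenceItem("curator judgement; no formal guideline used")
```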
A next step is to test the model against the diverse examples of ‘pathogenicity’ assertions, and decide if we are happy with how it handles things.
Below are some example records organized according to the different scenarios we encountered in our landscape review. Consider if our model supports/makes sense for each category, and we can dive deeper if needed to model out the actual examples.
I. Germline variant pathogenic for:
II. Somatic variant pathogenic for:
NOTE from Larry: The ClinVar RCVxxx examples referenced above do not truly reflect the pathogenicity assertion. In ClinVar this would be more closely reflected in the SCVxxx records, but there is no URL to access them directly. RCVs are aggregations of one or more SCVs for the same variant-disease pair from multiple submitters, so an RCV containing a single SCV looks the same as that SCV. Once more than one SCV is aggregated in an RCV, you will note that the method used to resolve the "discrepancies" is really the means of making this higher-level aggregate assertion, which is not yet modeled precisely in the VA group.
Update: While we have tentatively decided to group Mendelian disease and cancer into a single VA type, it is not yet clear whether we also want to lump common disease in here as well.
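The aggregation described in the note can be sketched as a function over the SCV classifications for one variant-condition pair. This is a hypothetical, deliberately simplified rule (any disagreement counts as a conflict) and is NOT ClinVar's actual review-status or conflict-resolution algorithm; it only shows where the unmodelled "resolution" step sits.

```python
from collections import Counter
from typing import List


def aggregate_rcv(scv_classifications: List[str]) -> str:
    """Hypothetical, simplified aggregation of SCV classifications for one
    variant-condition pair; not ClinVar's actual algorithm."""
    if not scv_classifications:
        raise ValueError("an RCV aggregates at least one SCV")
    if len(set(scv_classifications)) == 1:
        # One SCV, or several that agree: the RCV simply mirrors them.
        return scv_classifications[0]
    # Disagreement: some resolution rule is needed. This is the higher-level
    # aggregate assertion that is not yet modelled; here we just report counts.
    counts = Counter(scv_classifications).most_common()
    return "conflicting: " + ", ".join(f"{label} ({n})" for label, n in counts)
```

For a single SCV, `aggregate_rcv(["Pathogenic"])` returns `"Pathogenic"` unchanged, which is why an RCV with one SCV looks identical to it; with `["Pathogenic", "Benign"]` the result starts with `"conflicting"`, the point at which a real resolution method would have to make its own assertion.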
We have not encountered variant interpretations for polygenic / common disease among our driving use cases, so we don’t have as deep an understanding of the semantics of these interpretations, and if/how they should be modeled here.
Proposal: For now, defer a decision on this issue. Define the Pathogenicity Interpretation VA type for Mendelian disease and cancer. Note in our documentation that we do not yet explicitly support interpretations for common disease, but if the semantics of such an interpretation align with the model here, it can be used for this. If and when there is demand for variant - common disease interpretations, we will do our due diligence and decide how to lump or split.
Initial notes on the proposed scope and definition of this VA type, based on the requirements and considerations documented here.
Definition: A statement about the contribution made (or lack thereof) by a somatic variant to a specific type of cancer, wherein the variant is described along a spectrum from benign to pathogenic.
Scope Notes:
Comments: