ga4gh / va-spec

An information model for representing variant annotations.

Variant Oncogenicity Interpretation definition and scope #23

Open mbrush opened 5 years ago

mbrush commented 5 years ago

Initial notes on the proposed scope and definition of this VA type, based on requirements and considerations documented here.

Definition: A statement about the contribution made (or lack thereof) by a somatic variant to a specific type of cancer, wherein the variant is described along a spectrum from benign to pathogenic.

Scope Notes:

Comments:

mbrush commented 5 years ago

Issues/Questions:

  1. Subject: somatic variants - represent with qualifier as in Variant Pathogenicity? Any nuance to consider here besides single 'somatic' value?
  2. Descriptor: limit to the 'Cancer' subset of disease/genetic condition. Any nuance to consider beyond this? a. As for VPI, will need to consider modeling here - is a single ontology term sufficient? or need a more complex object model to build up / post-compose a Cancer description?
  3. Predicate: What is the set of relationships we want to make here? how granular?
  4. Qualifiers: will use this to specify allele origin (somatic), and possibly to capture mechanism of pathogenesis (e.g. oncogene activation vs TSG inactivation)
  5. Evidence/Provenance: likely as complex as for germline VPI. Arpad/Dimitry to present CIViC models and planned ACMG-like guidelines to inform requirements here.

mbrush commented 5 years ago

Regarding the question of if/how to capture mechanism of pathogenicity (e.g. oncogene activation vs TSG inactivation) as part of this VA type, first we need to consider if this is even in scope for the primary statement here. It may be that this mechanistic aspect represents a completely different statement that we should create a separate VA type for.

If it is in scope here, we could do this using a qualifier with values like 'driver', 'modifier' . . . or 'oncogene activation' and 'TSG inactivation'. Alternatively, we could model this into the predicate, by defining a more granular set of relationships extending the basic ACMG-like ones (e.g. is_oncogenic_driver_of).
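To make the two options concrete, here is a minimal sketch, not a proposed schema. The `Statement` class, its field names, the placeholder identifiers, and the mapping from `is_oncogenic_driver_of` to `pathogenic_for` plus a `driver` qualifier are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified VA statement (not the actual VA schema)."""
    variant: str
    predicate: str
    condition: str
    mechanism_qualifier: Optional[str] = None


# Option A: basic ACMG-like predicate, mechanism carried by a qualifier.
option_a = Statement(
    variant="variant:example-1",           # placeholder identifier
    predicate="pathogenic_for",
    condition="condition:example-cancer",  # placeholder identifier
    mechanism_qualifier="driver",
)

# Option B: mechanism folded into a more granular predicate.
option_b = Statement(
    variant="variant:example-1",
    predicate="is_oncogenic_driver_of",
    condition="condition:example-cancer",
)

# Assumed mapping from granular predicates to (basic predicate, qualifier).
GRANULAR_TO_BASIC = {
    "is_oncogenic_driver_of": ("pathogenic_for", "driver"),
}


def normalize(stmt: Statement) -> Statement:
    """Rewrite an Option B statement into its Option A form, if a mapping exists."""
    if stmt.predicate not in GRANULAR_TO_BASIC:
        return stmt
    basic, qualifier = GRANULAR_TO_BASIC[stmt.predicate]
    return Statement(stmt.variant, basic, stmt.condition, qualifier)
```

Under this assumed mapping, `normalize(option_b) == option_a`, which suggests the two options can carry the same information and that the choice is mostly about where consumers expect to find the mechanism.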

DavidTamborero commented 5 years ago

I put some thoughts here in case I cannot discuss with you online. As a disclaimer, remember that I have no expertise in developing data models; what I have is good experience in constructing genomic interpretation tools and in interacting with users with different profiles/needs in both research and clinical settings. Putting my comments in that context, please see the following (and apologies for any content that may be irrelevant at this point of your discussions).

javild commented 5 years ago

  • Regarding the last point, note that for some studies, defining the cancer type in which the particular effect has been evaluated is tricky (e.g. loose cancer type experimental models due to different reasons that I will not enumerate here). Therefore, you need to allow a ‘not specific cancer type’ or similar term meaning that this info cannot be specified.

This discussion is equivalent to the one about leaving the "condition" field blank in Variant Pathogenicity type, am I right? https://github.com/ga4gh-gks/variant-annotation-model/issues/25

Regarding these other points:

  • another fundamental question is the level of strength for stating a given effect. [...] I would say that we need to include a field with the level of strength supporting the oncogenic/neutral effect.
  • I would also like to see a reference to the study(ies) in which the effect of the variant has been reported (e.g. PubMed ID and, for the emerging ones, a conference abstract)
  • an ‘other comments’ field is technically acceptable to be included, but I always think there is room for such a thing. For this variant model, this could include details of the level of evidence of the effect (e.g. if it is based on experimental data, to write some details about that experiment).

Sounds to me like they are all related to evidence and provenance. Definitely interesting to take into account. We'll handle them when we get to modelling evidence/provenance.

mbrush commented 5 years ago

Outcomes and Issues following 1-23-19 VA Call

Subject:

Descriptor:

Predicate:

Qualifiers:

Evidence:

mbrush commented 5 years ago

Given discussions and feedback on recent calls, we are exploring the idea of collapsing Variant Pathogenicity Interpretation (VPI) and Variant Oncogenicity Interpretations (VOI) into a single VA type (Variant Pathogenicity Interpretation). Motivations for collapsing are based on both semantic and pragmatic considerations:

A proposal for a collapsed model is defined in the spreadsheet here, and reflects the following decisions/considerations:

  1. We recommend the predicate set {pathogenic_for, likely_pathogenic_for, benign_for, likely_benign_for, uncertain_significance_for} - where 'pathogenic' is defined broadly enough to cover causal or contributing variant-disease relationships, to accommodate interpretations on Mendelian conditions and cancer, respectively. The context in which the predicate is used can inform whether the variant is asserted to be causal vs contributing for the indicated condition: if the condition is Mendelian, the implication is that the variant is causal; if the condition is a cancer, the implication is that the variant is a contributing driver. One con here is that consumers in the cancer space might expect to see terms like 'oncogenic' - but our documentation can be clear that this is covered by 'pathogenic'. This may be more of a presentation-level issue that can be handled by a UI software layer, and not a concern at the lower level of a data exchange schema.

  2. The collapsed model includes the 'qualifier' fields we created for both oncogenic and pathogenic assertions (specifically, variantOriginQualifier and pathogenicMechanismQualifier). Documentation will guide users on when to apply each.

  3. Our evidence and provenance model will need to support very broad types of evidence and different granularity of detail - from rich representation of ACMG-based evidence interpretation, to sparser representations that might accommodate interpretations where no formal guidelines are used. This will be a challenge, but one I think a SEPIO-based approach is equipped to handle. Even though different evidence frameworks/criteria are typically used to evaluate a variant in cancer vs Mendelian disease, there is overlap in the types of info used as evidence. And, as seen in ClinVar records such as this and this, guidelines like the ACMG's, used for evaluation against Mendelian conditions, are in practice also used to evaluate germline and somatic variants for cancer. So I think that even if we separated VPI from VOI, we would have to provide the same type of flexible evidence/provenance model.
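The decisions in points 1 and 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the model in the spreadsheet: the class names, the `condition_kind` field, the snake_case field spellings, and the helper functions `implied_role` and `display_label` are hypothetical, as is the choice to apply the causal/contributing reading to both the 'pathogenic' and 'likely pathogenic' predicates. Only the predicate values and the two qualifier names come from the proposal above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Predicate(Enum):
    PATHOGENIC_FOR = "pathogenic_for"
    LIKELY_PATHOGENIC_FOR = "likely_pathogenic_for"
    BENIGN_FOR = "benign_for"
    LIKELY_BENIGN_FOR = "likely_benign_for"
    UNCERTAIN_SIGNIFICANCE_FOR = "uncertain_significance_for"


class ConditionKind(Enum):
    # Hypothetical helper category; the proposal does not define this field.
    MENDELIAN = "mendelian"
    CANCER = "cancer"


@dataclass(frozen=True)
class PathogenicityInterpretation:
    variant: str
    predicate: Predicate
    condition: str
    condition_kind: ConditionKind
    variant_origin_qualifier: Optional[str] = None        # e.g. "somatic"
    pathogenic_mechanism_qualifier: Optional[str] = None  # e.g. "oncogene activation"


def implied_role(interp: PathogenicityInterpretation) -> Optional[str]:
    """Read causal vs contributing off the condition context (point 1)."""
    if interp.predicate not in (Predicate.PATHOGENIC_FOR,
                                Predicate.LIKELY_PATHOGENIC_FOR):
        return None
    if interp.condition_kind is ConditionKind.MENDELIAN:
        return "causal"
    return "contributing driver"


def display_label(interp: PathogenicityInterpretation) -> str:
    """Presentation-layer label: show 'oncogenic' to cancer consumers."""
    label = interp.predicate.value[: -len("_for")].replace("_", " ")
    if interp.condition_kind is ConditionKind.CANCER:
        label = label.replace("pathogenic", "oncogenic")
    return label


somatic_example = PathogenicityInterpretation(
    variant="variant:example-1",           # placeholder identifier
    predicate=Predicate.LIKELY_PATHOGENIC_FOR,
    condition="condition:example-cancer",  # placeholder identifier
    condition_kind=ConditionKind.CANCER,
    variant_origin_qualifier="somatic",
    pathogenic_mechanism_qualifier="oncogene activation",
)
```

Here the exchanged record still uses `likely_pathogenic_for`, while `display_label(somatic_example)` yields a cancer-friendly label, which is one way the 'oncogenic' terminology concern could stay in the UI layer.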

A next step is to test the model against the diverse examples of ‘pathogenicity’ assertions, and decide if we are happy with how it handles things.

mbrush commented 5 years ago

Below are some example records organized according to the different scenarios we encountered in our landscape review. Consider if our model supports/makes sense for each category, and we can dive deeper if needed to model out the actual examples.

I. Germline variant pathogenic for:

II. Somatic variant pathogenic for:


NOTE from Larry: The ClinVar RCVxxx examples referenced above are not truly reflective of the pathogenicity assertion. In ClinVar this would be more closely reflected in the SCVxxx records, but there is no URL to access them directly. The RCVs are aggregations of one or more SCVs for the same variant-disease matches from multiple submitters. So, RCVs of one SCV look to be the same. Once more than one SCV is aggregated in an RCV, you will note that the method to resolve the "discrepancies" is really the means of making this higher-level aggregate assertion, which is not yet modeled precisely in the VA group.
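The distinction Larry describes can be sketched as a simple grouping step. This is a toy illustration only: the `Submission` class, its fields, and the `aggregate` function are hypothetical, and the code does not reproduce ClinVar's actual record structure or its method for resolving discrepancies.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Stands in for an SCV-like record: one submitter's assertion."""
    submitter: str
    variant: str
    condition: str
    classification: str


def aggregate(submissions):
    """Group submissions by (variant, condition), RCV-style.

    Returns, per group, the submission count, the distinct classifications,
    and whether they disagree (i.e. some resolution method would be needed).
    """
    groups = defaultdict(list)
    for sub in submissions:
        groups[(sub.variant, sub.condition)].append(sub)

    summary = {}
    for key, subs in groups.items():
        classes = sorted({s.classification for s in subs})
        summary[key] = {
            "submission_count": len(subs),
            "classifications": classes,
            "needs_resolution": len(classes) > 1,
        }
    return summary


# Placeholder data, not real ClinVar records.
example = [
    Submission("lab-A", "variant:example-1", "condition:example", "pathogenic"),
    Submission("lab-B", "variant:example-1", "condition:example", "likely pathogenic"),
    Submission("lab-A", "variant:example-2", "condition:example", "benign"),
]
summary = aggregate(example)
```

A group with a single submission mirrors that submission directly, while a group with conflicting classifications is where a separate, higher-level aggregate assertion comes into play.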

mbrush commented 4 years ago

Update: While we have tentatively decided to group Mendelian disease and cancer into a single VA type, it is not yet clear if we also want to lump common disease in here as well.

We have not encountered variant interpretations for polygenic / common disease among our driving use cases, so we don’t have as deep an understanding of the semantics of these interpretations, and if/how they should be modeled here.

Proposal: For now, defer a decision on this issue. Define the Pathogenicity Interpretation VA type for Mendelian disease and cancer. Note in our documentation that we do not yet explicitly support interpretations for common disease, but if the semantics of such an interpretation align with the model here, it can be used for this. If and when there is demand for variant - common disease interpretations, we will do our due diligence and decide how to lump or split.