ga4gh / va-spec

An information model for representing variant annotations.
Apache License 2.0

Lump or split Variant Pathogenicity and Oncogenicity Statements #69

Open mbrush opened 3 years ago

mbrush commented 3 years ago

Starting a new ticket specifically dedicated to the question of whether to lump or split representation of Variant Pathogenicity and Variant Oncogenicity Statement types.

This discussion was started in issue #23 ("Variant Oncogenicity Interpretation definition and scope"), which presents a rich discussion and examples that should be explored as background. Our initial decision here to split these statement types was ultimately reversed, based on semantic and pragmatic considerations (see comment here, and examples here). Ticket #29 ("Clarify the semantics of Oncogenicity vs Pathogenicity") also contains very relevant and informative discussion.

The main argument for collapsing these to one Statement type is the existence of data in ClinVar that would be hard to classify if we had to choose between Pathogenicity or Oncogenicity Statements. By collapsing, we avoided having to deal with this issue. (FWIW I do see benefits of splitting, and think there are ways to mitigate any 'ClinVar data messiness' issue if we go this route.)

Recent considerations raised by members of the cancer community argue for moving back to splitting these Statement types.
@ahwagner, @Dmitriy, others - please review issues #23 and #29, and weigh in with your recommendations.

ahwagner commented 3 years ago

Regarding #23, it appears that the decision was based on the need to 1) simplify semantics and 2) avoid complexity. However, I think that the VICC (or at least, the VICC knowledgebase representatives who have commented on this) prefer to make the semantic distinction between predisposing and oncogenic evidence / assertions. We also feel that using variant origin to infer this distinction overloads this attribute. @malachig summarized this very nicely on a Slack thread:

I think this makes sense if the only two curation types are pathogenic for germline variants or pathogenic for somatic variants (oncogenic).

BUT, you still need some way to know which of these two is being curated in each evidence statement. If you have only those two types, then using the Variant Origin (Germline vs Somatic) to distinguish the two categories of Pathogenic is tidy. We need some way to distinguish between the two types because, among other things, we are going to apply two different guidelines (ACMG vs Oncogenicity SOP) and their associated criteria, and this will impact the way the interface behaves, the documentation shown, the supporting information pulled in from external databases, etc.

But my original concerns stand: if you want to do these two curation types alongside Predictive, Predisposing, Diagnostic and Functional evidence, then I would prefer NOT to use Variant Origin as a way to distinguish types of evidence, because Variant Origin has different meanings in the context of the other evidence types. Germline (rare and common) and Somatic variants can have Prognostic meaning. Predictive evidence is interpreted differently for Germline (adverse response) and Somatic (sensitivity/response). Both Germline and Somatic variants can have Diagnostic significance (e.g. used to define cancer sub-types). Arguably, for Functional evidence Variant Origin may always be 'N/A'.

From a UI perspective I like the idea of selecting your Evidence Type first and having a different Evidence Type for each scenario where the behavior of the interface will substantially change. I would rather not have a situation where these changes are sometimes triggered by the selection of Evidence Type and other times by the selection of Variant Origin, because that is less intuitive for curators who are working across the breadth of evidence types reflecting the areas of potential clinical relevance.

Ultimately it is difficult to choose a data model that balances supporting all these use cases, is simple and avoids all confusion among users of all relevant domain areas (oncologists, geneticists, lab directors, pathologists, cancer biologists, etc.), and avoids use of terms with particular widely accepted but sometimes conflicting meanings to these groups. E.g. the word “Pathogenic” evokes visceral support and opposition when applied to somatic variants. Creating a standard that allows representation of concepts across resources that are tackling different scopes (which in turn influences their data models) without loss of meaning is going to be tough. My understanding of the proposal Alex describes is that use of the two sub-classes here would not be required, but the option to do so would facilitate integration with several existing resources. In any case, it’s a fascinating problem and discussion.

ahwagner commented 3 years ago

Proposal:

I think that a good way to clarify these concerns is through inheritance, where variant_predisposing_statement and variant_oncogenicity_statement classes are subclasses of variant_pathogenicity_statement:

[Figure: Variation Profile Models - Page 5]

In this way, resources such as CIViC, OncoKB, and CGI can articulate the specific semantics of their evidence and more readily reuse them without inference. This comes at no cost to computability of records from ClinVar or other representations that instead use the pathogenicity class, as each subclass is_a variant_pathogenicity_statement.
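A minimal sketch of this inheritance idea in Python follows. It is illustrative only and not part of the VA model: the class names mirror the proposal, while the fields (`variant`, `condition`, `classification`), the helper `pathogenicity_statements`, and the placeholder record values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class VariantPathogenicityStatement:
    """Parent class; a ClinVar-style record could use this directly."""
    variant: str          # hypothetical field
    condition: str        # hypothetical field
    classification: str   # hypothetical field


@dataclass
class VariantPredisposingStatement(VariantPathogenicityStatement):
    """Predisposing evidence; is_a pathogenicity statement."""


@dataclass
class VariantOncogenicityStatement(VariantPathogenicityStatement):
    """Oncogenic evidence; is_a pathogenicity statement."""


def pathogenicity_statements(records):
    """Return every record that is_a VariantPathogenicityStatement."""
    return [r for r in records if isinstance(r, VariantPathogenicityStatement)]


# Placeholder records, not real curated data.
records = [
    VariantPathogenicityStatement("variant-1", "condition-A", "pathogenic"),
    VariantPredisposingStatement("variant-2", "condition-B", "pathogenic"),
    VariantOncogenicityStatement("variant-3", "condition-C", "oncogenic"),
]
```

A consumer that only understands the parent class still receives all three records from `pathogenicity_statements(records)`, while a resource that cares about the distinction can test for the specific subclass.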

dsonkin commented 3 years ago

Predictive, Diagnostic and Functional evidence are separate, independent statements that do not require an additional oncogenicity statement or predisposing statement. (UI should not affect how data is modeled.) The term "predisposing" is not used in the ACMG/AMP germline guidelines; instead, the term "pathogenicity" is used in combination with the name of the associated disease, a construct that allows "pathogenicity" to be used for any disease. Under the above suggestion, all ClinGen-approved expert panel pathogenicity entries would go into predisposing statements instead of the obvious logical choice, pathogenicity statements. This issue goes far beyond entries in ClinVar or other resources. It's about modeling the underlying biological concept and using terms accepted by the community. It took years in the germline world to reach a clear majority consensus on using the term "pathogenicity". There is no reason to invent a new "predisposing" term and jeopardise years of hard work.

ahwagner commented 3 years ago

I disagree that the above proposal jeopardizes the community vocabulary, and I actually think it helps the community reconcile confusion that already exists around these terms.

I agree that the term predisposing doesn't appear at all in the ACMG/AMP guidelines for germline Mendelian disorders. However, it is disingenuous to claim this concept is being "invented" here; it is an oft-used term in the somatic community. As you must know, the AMP/ASCO/CAP guidelines for the interpretation of variants in cancer mention predisposing and predisposition several times, including:

If germline testing is ordered for cancer predisposition genes, reporting of germline variants should follow the ACMG/AMP guidelines.

Predisposition is used here specifically to contrast these genes with oncogenic genes, for the purpose of evaluating germline variants under the ACMG/AMP pathogenicity guidelines. My above proposal does not advocate against the use of pathogenicity, but clarifies that a predisposing statement is_a pathogenicity statement. And yes, if ClinGen somatic curates a predisposing statement under the ACMG guidelines in CIViC, that would mean that they are, by definition, curating a pathogenicity statement. And when they curate an oncogenicity statement, they are also curating a pathogenicity statement.

We cannot ignore the vocabulary of some communities in favor of others without good cause. These are different standards with different purposes, developed by a substantially overlapping section of the clinical genomics community.

We should also be very clear about our role as a GA4GH Driver Project: our work is to drive standard development that advances our ability to exchange genomic and health data. This means representing the contributed content of the VICC knowledgebases, and in the context of VA, predisposing and oncogenic evidence. CIViC, OncoKB, and CGI each have oncogenic evidence. CIViC also has predisposing. These are all interoperable and computable with pathogenic evidence under the model I proposed.

malachig commented 3 years ago

I don't think the above is really about the use of the word "predisposing" instead of "pathogenic". It is about the proposed use of a Variant Origin qualifier to distinguish Variant Pathogenicity and Variant Oncogenicity statements.

Variant Origin is used in other evidence types, which is why they are being brought up here. Note that the proposal above clearly states these are all pathogenicity statements. All ClinGen-approved expert entries would be pathogenicity statements. All oncogenic statements would be pathogenicity statements.

This proposal would allow the separation of Pathogenic and Oncogenic statements without relying on a recorded germline/somatic qualifier. Not every germline variant interpretation is automatically Pathogenic (ACMG relevant). Not every somatic variant interpretation is automatically Pathogenic (Oncogenic).

Now, we could just have one evidence type in CIViC (Pathogenic), with the indication of Somatic vs. Germline Variant Origin dictating which evidence codes/guidelines are applied, which supporting information is displayed, etc. And we have considered this. But those same Variant Origin concepts are used in other evidence types. So this is about the data model, not just the UI (though I would argue that UI design is inseparable from how we understand biological concepts and thus is entirely relevant to how one should model them). On balance, it seemed more intuitive to us to have two flavors of pathogenic evidence so that we don't have to use Variant Origin as the means to distinguish them. This way the concept of Variant Origin, which is itself a complex concept with different nuances for each Evidence Type, can remain more independent. That is all.
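To make the distinction concrete, here is a rough sketch in Python of choosing the applicable guideline from the evidence type alone, with Variant Origin kept as an independent attribute. It is an illustration under stated assumptions, not CIViC's implementation: the `Statement` class, the type strings, the `GUIDELINE_BY_TYPE` mapping, and `guideline_for` are all hypothetical names; only the two guideline labels (ACMG and the Oncogenicity SOP) come from the discussion above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Statement:
    evidence_type: str                    # hypothetical values, e.g. "predisposing"
    variant_origin: Optional[str] = None  # "germline", "somatic", or None (N/A)


# Hypothetical mapping: the evidence type alone selects the guideline.
GUIDELINE_BY_TYPE = {
    "predisposing": "ACMG",
    "oncogenicity": "Oncogenicity SOP",
}


def guideline_for(statement: Statement) -> Optional[str]:
    """Pick a guideline from the evidence type; variant_origin is never consulted."""
    return GUIDELINE_BY_TYPE.get(statement.evidence_type)


# A germline Predictive statement is not routed to ACMG just because it is germline.
germline_predictive = Statement("predictive", variant_origin="germline")
germline_predisposing = Statement("predisposing", variant_origin="germline")
somatic_oncogenic = Statement("oncogenicity", variant_origin="somatic")
```

Because `guideline_for` ignores `variant_origin`, that attribute stays free to carry whatever meaning it has within each evidence type.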