ga4gh / va-spec

An information model for representing variant annotations.
Apache License 2.0

Variant Pathogenicity Interpretation definition and scope #22

Open mbrush opened 5 years ago

mbrush commented 5 years ago

Initial notes on the proposed scope and definition of this VA type, based on requirements and considerations documented here.

Definition: A statement about a causal association (or lack thereof) between a germline variant and a Mendelian genetic condition, wherein the variant is described along a spectrum from benign to pathogenic for that condition.

Scope Notes (initial assumptions, subject to change):

Comments:

larrybabb commented 5 years ago

I'm not sure how, or whether, we need to specify that these statements can be, and commonly are, made with no specific condition when the call is benign, likely benign, and often uncertain significance. My understanding of why this is done is that the interpreter has evidence compelling enough that, if the variant is benign, it is benign for all Mendelian conditions, so specifying conditions would be problematic. It is acceptable, but not required, to state conditions in "uncertain significance" (VUS) calls. I've watched this evolve in ClinGen to the point where interpreters seem to prefer specifying the Mendelian condition only when the variant IS clinically significant (i.e. likely pathogenic or pathogenic), at which point it is critical to specify the precise condition that the variant causes.

gaberudy commented 5 years ago

Note that for tumors the term that seems to be used is "oncogenicity" vs "tumorigenicity" :)

mbrush commented 5 years ago

From Dec 19 and Jan 9 VA Calls:

Outcomes:

  1. Subject: We will constrain the scope of these statements to be about germline variants only.

  2. Descriptor: We will also constrain the scope of these statements to be about Mendelian Conditions with a single causal gene. See questions below about whether we should require a condition be provided.

  3. Predicate: indicates the nature of causality between the variant and the condition along a spectrum from definitively benign (does not cause) to definitively pathogenic (causes), and can include VUS if the evidence is not conclusive. Causality here means that the variant alone is sufficient to cause the condition (as opposed to it being one of many contributing, required factors). Initially we will not constrain the predicate value to a single code set - e.g. 5 ACMG codes. We will let any codes go in, and require reporting the guideline/method used (ideally in a standardized way) so consumers can look to this guideline and understand what the code means.

  4. Qualifier(s): We will constrain the variant subject to be of germline origin, but it is not practical to encode this constraint at the subject slot - because we will not define a 'germline variant' type. This is because variant origin is something that should be captured in the context of a particular statement made about the variant, not an intrinsic property of the variant itself (given that variants in our model are abstract notions of a variant, not physical ones). Accordingly, a variant's 'origin' should not be represented as an attribute of a variant (e.g. origin: "germline"), or a type of variant (e.g. type: "Germline Variant").

  5. The evidence and provenance model supporting these types of annotations will have to be rich - and be able to represent the interpretation and reasoning tasks captured in the ACMG Guidelines. We will start by assessing the SEPIO ClinGen ACMG profile.
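The four slots in outcomes 1-4 can be sketched as a small Python data structure. This is only an illustration under stated assumptions: the class name `PathogenicityStatement`, every field name, and the `ClinVar:17864` identifier format are hypothetical and do not come from any VA schema, and the example values reuse the APOE example from the questions below purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathogenicityStatement:
    """Hypothetical sketch; none of these names come from a VA schema."""

    variant_id: str                    # subject: an abstract variant, carrying no origin attribute
    classification: str                # predicate: any code, e.g. "likely pathogenic" or "VUS"
    method: str                        # guideline/method that gives the code its meaning
    condition: Optional[str] = None    # descriptor: a Mendelian condition, if one is given
    variant_origin: str = "germline"   # qualifier: lives on the statement, not the variant

    def __post_init__(self) -> None:
        # The predicate is not limited to one code set, so the method is mandatory.
        if not self.method.strip():
            raise ValueError("the guideline/method used must be reported")


stmt = PathogenicityStatement(
    variant_id="ClinVar:17864",
    classification="pathogenic",
    method="ACMG guidelines",
    condition="Alzheimer's Disease",
)
print(stmt.variant_origin)  # germline
```

Keeping `variant_origin` on the statement rather than on the variant mirrors outcome 4: origin is context for a particular statement, not an intrinsic property of the abstract variant.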

Questions:

  1. Is the affected gene/transcript an important qualifier for this Statement type? e.g. that the asserted pathogenicity of ClinVar variant 17864 results from its effect on the APOE gene (and not the TOMM40 gene that it also affects)? Consider how inclusion of the qualifier would refine the meaning of the statement made in the annotation, from "ClinVar variant 17864 is pathogenic for Alzheimer's Disease" to "ClinVar variant 17864 is pathogenic for Alzheimer's Disease through its effect on the APOE gene." Is the latter something that creators of these annotations intend to state explicitly? And if so, would they perhaps use the protein-level variant as the subject of the annotation?

  2. Particularly when the outcome is 'benign', it is common for no condition to be specified. Do we allow this field to be left blank? Or require some default generic 'Disease' term as the descriptor? Or an 'unspecified disease' term?

Action Item: @larrybabb will consult with various efforts and see what scenarios result in a blank or unspecified condition, and consider what it could mean to find a blank field here. Then we can decide how to define the modeling constraint here.

  3. If non-Mendelian genetic conditions (e.g. common disease, multifactorial disease) are out of scope, do we need to have a VA type that accommodates these? Or can these be out of scope for now?

  4. Modeling 'Mendelian Condition': this is a domain entity that fills the descriptor slot. TO DO: make ticket.

larrybabb commented 5 years ago

  1. We will constrain the scope of these statements to be about germline variants only.

ClinVar does require the submitters to capture the "allele origin" of the variant...

Allele Origin (from ClinVar submission form) Required. The genetic origin of the variant for individuals in each aggregate observation. Allowed values: germline, de novo, somatic, maternal, paternal, inherited, unknown, uniparental, biparental. Note that biparental and uniparental are intended for the context of uniparental disomy. If you'd like to indicate zygosity, please report counts of homozygotes and heterozygotes in columns BV-BY. For de novo variants, please indicate "de novo", not the origin of the chromosome.

It would be preferable if these terms were based in an ontology (I think).

Also, do we need to qualify the organism too? ClinVar presumes all variants to be human. Or do we infer that from the reference sequences that are the basis of the variant call?

These types of variant qualifiers (and other structural and organizational ones) fall outside the VMC spec, but are needed to define the genomic variant concepts that serve as the subjects of these annotations.

larrybabb commented 5 years ago

Questions:

  1. Is the affected gene/transcript an important qualifier for this Statement type? e.g. that the asserted pathogenicity of ClinVar variant 17864 results from its effect on the APOE gene (and not the TOMM40 gene that it also affects)? Consider how inclusion of the qualifier would refine the meaning of the statement made in the annotation, from "ClinVar variant 17864 is pathogenic for Alzheimer's Disease" to "ClinVar variant 17864 is pathogenic for Alzheimer's Disease through its effect on the APOE gene." Is the latter something that creators of these annotations intend to state explicitly? And if so, would they perhaps use the protein-level variant as the subject of the annotation?

I've been pushing on this question with the domain experts that create these interpretations. I hear very consistent feedback that they are assessing the genomic DNA change or variant that is the root cause of any derived transcript and/or amino acid change.

The subject of these interpretations is most notably the genomic DNA change, not the RNA or AA change. While those downstream impact(s) are directly related to the cause, the interpretation is an assessment of the genomic DNA change.

If we need to bring in some experts to verify this on the call please let me know.

larrybabb commented 5 years ago

2. Particularly when the outcome is 'benign', it is common for no condition to be specified. Do we allow this field to be left blank? Or require some default generic 'Disease' term as the descriptor? Or an 'unspecified disease' term?

Action Item: @larrybabb will consult with various efforts and see what scenarios result in a blank or unspecified condition, and consider what it could mean to find a blank field here. Then we can decide how to define the modeling constraint here.

Please let me know when you want to have this discussion. I will try to invite the appropriate folks for it. This has been a longstanding struggle for the ClinVar submitters and the experts that form the guidance. They have bounced around from trying to create generic disease terms or "no specific disease" terms to going with blank for any Mendelian disease. The issue that ends up presenting itself is that by specifying a disease for Benign/Likely Benign/VUS you would need to assess every disease that a given gene has an association to when a variant possibly impacts its function. The list of gene-to-phenotype associations changes over time, which makes this a challenge. However, there is some movement now to be precise and specify a disease on every interpretation. But there will be many legacy and non-compliant folks going forward that create interpretations that don't specify a disease; how do we deal with that? And the disease and phenotype ontologies are not standardized enough to allow folks to effectively assert on nodes at different points in the ontology, to deal in a standardized way with disease groups or sets of closely related diseases for more generalized assessment. These are just some of the challenges of making this perfect.

mbrush commented 5 years ago

Final Outcomes/Decisions (from Jan 16 call):

  1. Variant origin qualifier - limit values allowed here to 'germline' only. Other allele origin terms (e.g. maternal, paternal, de novo, etc.) may be relevant for describing provenance information (e.g. the origin of a variant in an individual patient in which it was observed, where it provides evidence for the final interpretation). See #26 to review utility/accuracy of GENO allele origin terms for this purpose.
  2. Capturing taxon of variant - agreed that this should be a VR thing, as it is tied to the variant itself. It is an intrinsic and definitional feature of a variant, which could be explicitly described in a variant model, or implicit in the taxon of the reference sequence the variant is defined on.
  3. The affected gene/transcript is not a part of the statement in this VA type.
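
Decision 1 can be illustrated with a short routing function. This is a sketch under assumptions: the function name `allele_origin_slot` and the slot labels `'qualifier'` and `'provenance'` are hypothetical, and sending every non-germline term to provenance is a simplification of "may be relevant for describing provenance information". The term list is the one quoted from the ClinVar submission form earlier in this thread.

```python
# Allowed values copied from the ClinVar submission form quoted earlier in this thread.
CLINVAR_ALLELE_ORIGINS = frozenset({
    "germline", "de novo", "somatic", "maternal", "paternal",
    "inherited", "unknown", "uniparental", "biparental",
})


def allele_origin_slot(term: str) -> str:
    """Return the slot a term may fill: 'qualifier' or 'provenance' (hypothetical labels)."""
    normalized = term.strip().lower()
    if normalized not in CLINVAR_ALLELE_ORIGINS:
        raise ValueError(f"unrecognized allele origin: {term!r}")
    # Decision 1: only 'germline' is allowed as the statement's origin qualifier.
    return "qualifier" if normalized == "germline" else "provenance"


print(allele_origin_slot("germline"))  # qualifier
print(allele_origin_slot("de novo"))   # provenance
```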

At this point we have settled enough to wrap iteration 1 of the core variant pathogenicity statement model. We will triage the following unresolved issues into separate tickets, for further consideration when we return to formalize the schema for this VA type.

Related Tickets:

  1. How to capture unspecified conditions in Variant Pathogenicity Interpretations - see #25.
  2. Modeling 'Mendelian Conditions' as relevant domain entity - see #24.
  3. Review 'allele origin' hierarchy in GENO for its accuracy/utility for our use case(s) - see #26.

mbrush commented 5 years ago

Consider if pathogenic mechanism could be optional qualifier on a VPI as well . . . not often collected, but could be

mbrush commented 5 years ago

Update - we are exploring the idea of collapsing VPI and VOI annotations into a single VA type - see proposal here.

mbrush commented 5 years ago

Note that Table 2 in the ENIGMA vocab paper has nice comparison between ACMG and IARC categories.

mbrush commented 4 years ago

Re:

Consider if pathogenic mechanism could be optional qualifier on a VPI as well . . . not often collected, but could be.

This was a proposed requirement for Oncogenicity annotations that may now apply to Variant Pathogenicity, as we expanded the scope of this VA type to cover oncogenicity of cancer-causing variants. It is not clear this is needed, and even less clear what the possible values would be (things like 'loss-of-function', 'gain-of-function', or 'dominant-negative'). Propose for v0 to leave the 'pathogenicMechanismQualifier' on the VPI statement type, but not provide a value set. If it is used during testing, we can see what types of values are entered here. It may be, however, that this info would be captured in a separate FunctionalImpact annotation on the variant that could be packaged together with the VPI annotation.
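
One way to "see what types of values are entered" during testing is a simple tally of the free-text qualifier values. The sketch below assumes the qualifier arrives as an optional string; the helper name `tally_mechanisms` and the sample entries are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def tally_mechanisms(values: Iterable[Optional[str]]) -> Counter:
    """Count free-text mechanism values, ignoring blanks and differences in case."""
    cleaned = (v.strip().lower() for v in values if v and v.strip())
    return Counter(cleaned)


# Made-up test entries; None stands for a statement that left the qualifier unset.
observed = ["loss-of-function", "Gain-of-function", None, "loss-of-function ", ""]
counts = tally_mechanisms(observed)
print(counts.most_common())  # [('loss-of-function', 2), ('gain-of-function', 1)]
```

A tally like this would show whether a small, stable set of terms emerges before any value set is fixed.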

ahwagner commented 4 years ago

Here is a list of attributes used by CIViC for functional "clinical significance":

Importantly, this is a separate, uncoupled statement type from "predisposing" evidence, which contains the familiar values for pathogenicity.

OncoKB has a variant oncogenicity model, which includes a "mutation effect" attribute, with the following values:

mbrush commented 4 years ago

Thanks Alex. I think the idea behind the pathogenicMechanismQualifier is to support your second OncoKB use case - i.e. give people the option to go beyond asserting that "Variant X is pathogenic/oncogenic for Disease Y", and to say that "Variant X is pathogenic/oncogenic for Disease Y through a gain-of-function mutation effect".

@ahwagner if this seems consistent with what OncoKB oncogenicity annotations assert as true, we should explore if/how our proposed model might support this. Other than that, I am not aware of other use cases for the pathogenicMechanismQualifier.

ahwagner commented 4 years ago

This is not quite the assertion made by OncoKB, as OncoKB treats oncogenicity as a subtype-agnostic statement, so instead you have:

"Variant X is pathogenic/oncogenic ~for Disease Y~ through a gain-of-function mutation effect".

Whereas in CIViC you have:

"Variant X is ~pathogenic/oncogenic~ for Disease Y ~through~ a gain-of-function mutation effect".

However, it is suggested in documentation that there will be CIViC Assertions that contain CIViC Evidence of both functional effect and pathogenicity (predisposition), and thus would rely on the pathogenicMechanismQualifier for assertions of this type (mirroring use cases in ClinVar, BRCA Exchange, and similar). I would therefore advocate that we use the set of CIViC terms already described above, as this is a superset of example terms provided in an earlier comment.

mbrush commented 4 years ago

Thanks Alex. Can you clarify what the CIViC assertion is . . . not making sense to me when I read the words that are not crossed out.

ahwagner commented 4 years ago

No worries, let me reframe.

CIViC has the notion of Assertion objects, which are clinical significance classifications based upon professional society guidelines. CIViC Assertions are based upon a collection of CIViC Evidence objects, each of which represents a single statement drawn from a single reference.

Both Assertions and Evidence relate to the clinical impact of a variant in a particular cancer type, and can be of one of five types:

The Assertions of Predisposing type are based on the ACMG/AMP guidelines for the interpretation of sequence variants in Mendelian disorders, the same guidelines used for the pathogenicity interpretations used by ClinGen assertions in ClinVar and elsewhere. Assertions typically contain several Evidence items, which are not restricted to a matching type. Consequently, you may have Assertions containing some Evidence of Predisposing type, and some Evidence of Functional type. Under the VA model proposed here, such a CIViC Assertion would use the pathogenicMechanismQualifier to hold the Clinical Significance attribute from the supporting Evidence of Functional type. The Clinical Significance attribute can be one of the following:

Since the values for this field have been exhaustively debated in CIViC and are a superset of the terms indicated earlier in this thread, I propose we adopt this limited set of terms as the value set for pathogenicMechanismQualifier.

mbrush commented 4 years ago

Thanks Alex. This is wonderful - confirms what I thought I knew, and extends it with some things I did not.

My specific question above was regarding the assertion you said CIViC is making:

"Variant X is ~pathogenic/oncogenic~ for Disease Y ~through~ a gain-of-function mutation effect".

Removing the crossed out words, I read: "Variant X is for Disease Y a gain-of-function mutation effect". . . suggesting that the functional impact assertions are tied to a specific disease. Is this the case?

I had thought that these functional impact statements are independent of disease context, and would read more simply as something like "Variant X is a gain-of-function mutation effect".

ahwagner commented 4 years ago

Removing the crossed out words, I read: "Variant X is for Disease Y a gain-of-function mutation effect". . . suggesting that the functional impact assertions are tied to a specific disease. Is this the case?

Yes, functional impact Evidence in CIViC are tied to disease. FWIW, I disagree with that and think statements of this type should be as you described: "Variant X is a gain-of-function mutation effect".

I think that this VA issue is an opening to revisit this part of the CIViC model, and I have opened a ticket to review and discuss this week. If there is a different meaning or intent of functional evidence that necessitates this component (as well as the variant origin component, which likewise–in my opinion–should not matter), we will state that rationale in documentation and I will bring that back to this thread. If you could hold off on finalizing this until that discussion happens (as early as Thursday), that would be helpful.

mbrush commented 4 years ago

Sure Alex, and thanks for bringing this back to CIViC for discussion. I think that the functional assertions in CIViC will be modeled using the VA 'Experimental Functional Impact Statement' type (#34) - which is not included in our v0 release. So we have time to get this sorted out.

Re:

you may have Assertions containing some Evidence of Predisposing type, and some Evidence of Functional type. Under the VA model proposed here, such a CIViC Assertion would use the pathogenicMechanismQualifier to hold the Clinical Significance attribute from the supporting Evidence of Functional type.

This is interesting and new to me. This means that if a CIViC predisposition assertion for Variant X toward Disease Y has functional evidence saying that the Variant X is a gain-of-function in Disease Y, then the predisposition assertion would state that "Variant X is pathogenic for Disease Y through a gain-of-function effect".

This is slightly different than what OncoKB does, where the pathogenicity assertion is not tied to a specific disease (e.g. "Variant X is pathogenic through a gain-of-function effect") . . . the question here is whether we would use the Variant Pathogenicity Statement type to represent this information (I think yes, because we earlier agreed that these pathogenicity statements can be made agnostic to a specific disease/condition).
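
The two readings compared above differ only in whether the condition slot is filled. A minimal sketch, assuming both the condition and the mechanism qualifier are optional; the function `render_statement` and its parameter names are hypothetical and exist only to make the comparison concrete.

```python
from typing import Optional


def render_statement(variant: str, classification: str,
                     condition: Optional[str] = None,
                     mechanism: Optional[str] = None) -> str:
    """Build a plain-English reading of a statement; both optional parts may be omitted."""
    text = f"{variant} is {classification}"
    if condition:
        text += f" for {condition}"
    if mechanism:
        text += f" through a {mechanism} effect"
    return text + "."


# Disease-specific reading (the CIViC predisposition case described above).
print(render_statement("Variant X", "pathogenic", "Disease Y", "gain-of-function"))
# Variant X is pathogenic for Disease Y through a gain-of-function effect.

# Disease-agnostic reading (the OncoKB case described above).
print(render_statement("Variant X", "pathogenic", mechanism="gain-of-function"))
# Variant X is pathogenic through a gain-of-function effect.
```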

mbrush commented 4 years ago

Realizing also that we never came to a consensus about whether a descriptor indicating the Genetic Condition is required for this Statement type - and if so, what term(s) to use when this slot is blank or 'not specified'.

The record of where we left off is in the meeting minutes here . . . with a proposal for a single 'not specified' code when the Condition in a data set is blank or explicitly indicated to be 'not specified', 'not provided', or 'unknown'. Documentation can describe how to interpret a 'not specified' value in VA data. There seemed to be broad but not universal agreement that this was a good place to start for v0 (although some people still felt it better to have a blank slot here).
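
The proposal amounts to a small normalization step at data import. The sketch below assumes the single code is the literal string `"not specified"`, which the proposal does not fix; the names `NOT_SPECIFIED` and `normalize_condition` are hypothetical.

```python
from typing import Optional

NOT_SPECIFIED = "not specified"  # hypothetical code; its exact form is an assumption
_UNSPECIFIED_INPUTS = {"", "not specified", "not provided", "unknown"}


def normalize_condition(raw: Optional[str]) -> str:
    """Map blank or explicitly unspecified conditions to one code; pass others through."""
    if raw is None or raw.strip().lower() in _UNSPECIFIED_INPUTS:
        return NOT_SPECIFIED
    return raw.strip()


print(normalize_condition(None))                   # not specified
print(normalize_condition(" Not Provided "))       # not specified
print(normalize_condition("Alzheimer's Disease"))  # Alzheimer's Disease
```

Collapsing every source spelling into one code loses the distinction between a blank field and an explicit 'unknown', which is one reason a blank slot might still be preferred.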

ahwagner commented 4 years ago

Thanks for clarifying Matt. You're right, CIViC Functional Evidence belongs in https://github.com/ga4gh/va-spec/issues/34, and I've added a comment there re: above discussion. CIViC Predisposing Assertions do belong here, and (sometimes) will contain CIViC Functional Evidence.

I have no strong preference for the value(s) for describing unspecified Genetic Condition. I think a specified condition should be required only for assertions of pathogenicity, and optional otherwise.
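
A minimal sketch of that rule, assuming "assertions of pathogenicity" means the classifications 'pathogenic' and 'likely pathogenic' (an assumption, since the thread does not define the set). The names `PATHOGENIC_CLASSIFICATIONS` and `condition_is_valid` are hypothetical.

```python
from typing import Optional

# Assumption: "assertions of pathogenicity" means these two classifications.
PATHOGENIC_CLASSIFICATIONS = {"pathogenic", "likely pathogenic"}


def condition_is_valid(classification: str, condition: Optional[str]) -> bool:
    """A condition is required for pathogenic calls and optional for all others."""
    has_condition = bool(condition and condition.strip())
    if classification.strip().lower() in PATHOGENIC_CLASSIFICATIONS:
        return has_condition
    return True


print(condition_is_valid("pathogenic", None))  # False
print(condition_is_valid("benign", None))      # True
```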