ga4gh / va-spec

An information model for representing variant annotations.
Apache License 2.0

Variant origin terms #26

Open mbrush opened 5 years ago

mbrush commented 5 years ago

As discussed in issue #22 (Variant Pathogenicity Interpretation Definition and Scope), we decided that variant origin will be captured as a qualifier, and that this value should be constrained to 'germline' for this VA type. Similarly, the value will be constrained to 'somatic' for the Variant Oncogenicity VA type (#23). A number of other variant origin terms may be relevant in other contexts. For example, ClinVar provides a large set of terms used to describe the origin of a variant in particular observations made in patients:

germline, de novo, somatic, maternal, paternal, inherited, unknown, uniparental, biparental.
Note that biparental and uniparental are intended for the context of uniparental disomy.

In this context, the intent is not to 'qualify' or constrain the meaning of the assertion made in a variant interpretation, but rather to capture information about the provenance of the variants for which data were collected as evidence for the assertion. For example, the variant may have been de novo or maternal in a patient, but the final assertion isn't meant to state that only de novo or maternal variants are pathogenic for the condition.

For both of these use cases, we will want an ontology/terminology to provide allele/variant origin terms for our model. ClinGen and Monarch use the GENO ontology for this. We should evaluate its 'allele origin' hierarchy for its general 'correctness', and more specifically for its utility for our use cases. GENO is open to refining or extending its terms if we have specific feedback or requests for them.
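To make the two use cases above concrete, here is a minimal Python sketch. It is only an illustration, not part of the spec: the VA type names (`VariantPathogenicity`, `VariantOncogenicity`), the enum, and the helper function are all hypothetical, and the term values are the plain ClinVar labels listed above rather than GENO identifiers, which a real model would presumably use instead.

```python
from enum import Enum
from typing import Optional


class VariantOrigin(Enum):
    """ClinVar-style origin labels from the list above (not GENO term IDs)."""
    GERMLINE = "germline"
    DE_NOVO = "de novo"
    SOMATIC = "somatic"
    MATERNAL = "maternal"
    PATERNAL = "paternal"
    INHERITED = "inherited"
    UNKNOWN = "unknown"
    UNIPARENTAL = "uniparental"
    BIPARENTAL = "biparental"


# Hypothetical VA type names; each maps to the single origin value
# that its qualifier is constrained to.
REQUIRED_QUALIFIER_ORIGIN = {
    "VariantPathogenicity": VariantOrigin.GERMLINE,
    "VariantOncogenicity": VariantOrigin.SOMATIC,
}


def check_origin_qualifier(va_type: str, origin: VariantOrigin) -> bool:
    """Return True if `origin` is allowed as the qualifier for `va_type`."""
    required: Optional[VariantOrigin] = REQUIRED_QUALIFIER_ORIGIN.get(va_type)
    # Assumption of this sketch: a VA type with no registered constraint
    # accepts any origin term.
    return required is None or origin is required
```

In this sketch only the qualifier is constrained. Origin values recorded on evidence (for example, a de novo observation in a patient) would use the full `VariantOrigin` set and would not pass through `check_origin_qualifier`.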

mbaudis commented 5 years ago

This (GENO term use for allele origin) seems like one of those cross-cutting standard + format documentation topics we've started schemablocks.org for. I.e., coordinate between GKS and CP, then write this up as a recommendation and document it in SchemaBlocks.

mbrush commented 5 years ago

The variant origin terms in the CIViC curation interface include somatic mutation, germline mutation, and germline polymorphism. What is the distinction/significance of germline mutation vs. germline polymorphism? Do we need to accommodate this in our model?

@ahwagner @arpaddanos please comment if you can clarify this.

ahwagner commented 5 years ago

In general, a polymorphism is distinguished from a mutation by being an established allele in a population. In CIViC, we set the threshold at 1%: germline variants observed at or over this threshold in the study-relevant population are recorded as polymorphisms, and otherwise as mutations.
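The threshold rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not CIViC's implementation: the function name, the constant, and the assumption that the frequency arrives as a fraction between 0 and 1 are all hypothetical.

```python
# 1% threshold from the description above, expressed as a fraction.
POLYMORPHISM_THRESHOLD = 0.01


def classify_germline_variant(population_frequency: float) -> str:
    """Label a germline variant by its frequency in the study-relevant population.

    `population_frequency` is assumed to be a fraction in [0, 1].
    """
    if not 0.0 <= population_frequency <= 1.0:
        raise ValueError("population_frequency must be between 0 and 1")
    # "At or over" the threshold counts as a polymorphism.
    if population_frequency >= POLYMORPHISM_THRESHOLD:
        return "germline polymorphism"
    return "germline mutation"
```

For example, `classify_germline_variant(0.2)` yields the polymorphism label and `classify_germline_variant(0.005)` the mutation label.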

As to whether or not to model this, I think the designation of polymorphism vs mutation is a useful curated annotation that should be preserved in the variant record somewhere, though it doesn't need to be modeled differently.