ga4gh / va-spec

An information model for representing variant annotations.
Apache License 2.0

Review new terms for Allele Origin value set #64

Open mbrush opened 4 years ago

mbrush commented 4 years ago

@larrybabb, @cbizon, @ahwagner please review the proposed terms to add to GENO in support of the ClinVar allele origin terms: https://github.com/monarch-initiative/GENO-ontology/issues/43#issuecomment-611115928

mbrush commented 4 years ago

Responding to Larry's last comment in the thread in the GENO repository (see https://github.com/monarch-initiative/GENO-ontology/issues/43# for context):

That works. To be clear, I'm fine with a shorter list for the value set. The list doesn't have to be ClinVar's values. I was simply offering examples of what folks are sharing in practice.

I think it makes sense to include all of these in the VA value set. Nearly all were already in GENO, and ClinVar is a community authority that (I assume) has proven the utility of these terms. So I will include them all unless we think some will cause confusion (e.g. the biparental/uniparental terms, or inherited vs. germline). I will update the GENO hierarchy and term labels slightly to address these concerns, and present them on the next call for a quick (5 minutes or less) resolution.

mbrush commented 4 years ago

Update from the April 29 VA call:

Proposal:

larrybabb commented 4 years ago

To be clear, there's a difference between allele origin and classification context, even though the terms overlap. When I asked Heidi Rehm to define "classification context", she replied with:

a field that allows the user to define distinct contexts for a classification that are commonly used in clinical testing and have materially different clinical scenarios (e.g. constitutional testing, tumor testing). The major contexts currently used are germline and somatic.

When I asked whether "Synthetic" should be included in the list, the reply was that "synthetic" classifications don't exist in clinical scenarios; they would more likely be evidence statements (e.g. functional impact) that serve as lower-level data points supporting a clinical classification.

From my perspective, we are trying to determine the general context or source of the variant that is the subject of the classification/assertion. If we are defining a value set strictly for use within a VarPath Assertion, then germline, somatic and unknown (always useful) should be the constrained list. If we expect to use this value set for variant context annotations in other statements, then we could consider adding "synthetic", with a note that it would never be used in clinical scenarios for variant pathogenicity assertions.
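A minimal sketch of the two-tier idea above, assuming a single full value set with a constrained subset enforced only for VarPath assertions. The enum, `VARPATH_ALLOWED`, `validate_allele_origin`, and the statement-type strings are hypothetical illustrations, not part of GENO or the VA spec.

```python
from enum import Enum


# Hypothetical full value set; labels come from this thread, not from GENO or VA.
class AlleleOrigin(Enum):
    GERMLINE = "germline"
    SOMATIC = "somatic"
    UNKNOWN = "unknown"
    SYNTHETIC = "synthetic"  # only meaningful outside clinical classifications


# Hypothetical constrained subset proposed for VarPath (variant pathogenicity) assertions.
VARPATH_ALLOWED = frozenset(
    {AlleleOrigin.GERMLINE, AlleleOrigin.SOMATIC, AlleleOrigin.UNKNOWN}
)


def validate_allele_origin(value: str, statement_type: str) -> AlleleOrigin:
    """Parse a label and check it against the value set for the statement type."""
    origin = AlleleOrigin(value)  # raises ValueError for labels outside the full set
    if statement_type == "VarPath" and origin not in VARPATH_ALLOWED:
        raise ValueError(f"{value!r} is not allowed in a VarPath assertion")
    return origin
```

With this layout, `validate_allele_origin("somatic", "VarPath")` is accepted, `validate_allele_origin("synthetic", "VarPath")` raises `ValueError`, and "synthetic" is still accepted for a non-clinical statement type (e.g. a hypothetical "FunctionalImpact" evidence statement). Keeping one enum with a per-statement subset avoids maintaining two divergent term lists.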