clingen-data-model / clingen-interpretation

Allele (variant) interpretation model and API for ClinGen

allele vs variant #1

Closed: cbizon closed 3 years ago

cbizon commented 7 years ago

We have an allele model and an allele registry, but in the interpretation model everything is called "variant": VariantInterpretation, all the attributes say variant, etc., even though the thing that they all refer to is an allele.

To remind everyone, the reason we preferred allele over variant is that variant implies a difference from a reference, while an allele is more general - we might want to make statements (such as interpretations or frequencies) about the reference allele, which we might not consider a "variant".

The tension of course is that pretty much everybody else in the world calls it a variant.

I see three options for the way forward:
1) Do what we are doing now where the interpretation model says variant, references alleles, leave the allele model the same, and have two names for the same thing.
2) Admit defeat, rename allele as variant, probably the next time we go back to the variant nee allele model.
3) Stay the course, bring the interpretation model in line with the allele model and call everything allele.

I'm ok with whatever we come up with (though I have my own opinions), but I want the group to make an explicit decision on this, and I'd like to do it before we generate even more documentation that will have to be updated one way or another. The next meeting at which we could discuss it would be Wednesday, and that's kind of far off, so if people are willing to have the discussion via email (or an impromptu call), that would be better, IMO.

cbizon commented 7 years ago

Comment from @bpow

I would argue for 2 or 3, from the standpoint that we should reserve splitting of terms for situations when those terms will really mean two different things. I had been persuaded by the arguments of using allele as having a connotation of being more inclusive of the possibility of representing the reference allele. However, I find it helpful, when providing terms, to look into what their definition is in the more generic context of the English language.

en.oxforddictionaries.com defines 'variant' as:

A form or version of something that differs in some respect from other forms of the same thing or from a standard.

So, while there is some connotation of differing 'from a standard', any version that differs in some respect from other forms of the same thing (the reference sequence differs from the alternative sequence) can be properly termed a 'variant' form... It is not incorrect, by this definition, to refer to the 'reference variant'.

Furthermore, the historical context of 'allele' is such that it is frequently defined (and I think originally defined) as one of two or more variant forms of a gene, such that describing intergenic 'alleles' is not necessarily accurate.

So I guess I would be OK with option #2. But I don't feel super-strongly about it.

cbizon commented 7 years ago

I agree with @bpow that variant is not indefensible, but the case it still leaves kind of funny is, e.g., the base at a site where no variation has ever been observed. There's no reason we couldn't make statements about such a base (like this base is strongly conserved), but it's not 'variant'. I think you could philosophically defend it by saying that there is at least notional variation: we could imagine that at the site there is the possibility of variation, even if we have not observed it.

larrybabb commented 7 years ago

The VMC (variant modeling collaboration) group also deals with this. The group has Variant in the name, but the primary (core) class is Allele, which can be used to create a Haplotype, which can be used to create a Genotype/Diplotype. We discussed these same concerns with no significant points to add and determined that they are all referred to as Variants. Of course, some wanted to call them Sequences, but that got beat down (even though HL7/FHIR seems sort of stuck on that).

Not adding much sway in any direction. I think #1 will hold the course for the next few months at least. I'd guess that we may change it all to Variant, particularly when we get into demonstrating the registration of Genotypes/Diplotypes. If we somehow figure out how to have a single registry (and namespace) for alleles and genotypes, then Variant would be a reasonable overarching term.

rrfreimuth commented 7 years ago

As Larry said, this has been debated extensively within the VMC. I'd like to try to adopt VMC's definitions/nomenclature when it is finalized, and before that occurs groups like ours will have an opportunity to provide feedback.

Both terms - allele and variant - can be ambiguous. Allele can mean a "small" region within a larger locus (e.g., gene), but it can also be used in classical genetics to refer to the whole inherited unit (e.g., genetic locus). Variant can be squishy because it can refer to a variant site (location) or variant form (specific change relative to another, a la en.oxforddictionaries.com). Personally, I like the idea of being able to refer to an arbitrary chunk of sequence as an allele, regardless of whether or not an alternative sequence form has been identified (yet).

I'm not sure that I'm adding anything to the discussion, either, except that I think we need to come back to this and make a call after the VMC work is a bit more baked.

larrybabb commented 7 years ago

We have a "concern" related to how to handle ContextualAllele data. Since we do not have a registry to point to, and since not everyone will be "required" to use Baylor's Allele Registry, we must supply the essential information for representing the reference sequence, start-end position and sequence that occurs at that position (ALT). Like @rrfreimuth I think we could use the VMC structure (it really is pretty basic anyway and jibes with our essential representation from Allele). It should not be too difficult and will show some good will towards the VMC effort.
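To make the "essential information" above concrete, here is a minimal sketch of what a self-contained allele record might carry: a reference sequence identifier, a start-end interval, and the sequence stated at that interval. The class name, field names, and the 0-based half-open coordinate convention are all assumptions made for illustration; this is not the VMC schema or the ClinGen allele model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextualAlleleSketch:
    """Hypothetical minimal record; names are illustrative, not a real schema."""
    reference_sequence: str  # identifier of the reference sequence (placeholder value below)
    start: int               # assumed 0-based, inclusive
    end: int                 # assumed 0-based, exclusive
    allele: str              # sequence stated at start..end (the ALT, or the reference itself)

    def __post_init__(self):
        # Reject intervals that cannot describe a location.
        if self.start < 0 or self.end < self.start:
            raise ValueError("expected 0 <= start <= end")

    def matches_reference(self, reference_bases: str) -> bool:
        """True if the stated sequence equals the reference over the interval."""
        return reference_bases[self.start:self.end] == self.allele


# Toy reference string, not real genomic data.
toy_reference = "ACGTACGT"
ref_allele = ContextualAlleleSketch("toy_ref", 2, 3, "G")  # same base as the reference
alt_allele = ContextualAlleleSketch("toy_ref", 2, 3, "T")  # differs from the reference
```

Both records have the same shape, so a statement can be attached to either one. That is the case the allele-versus-variant discussion turns on: `ref_allele` is a perfectly good allele even though it is not a difference from the reference.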