ga4gh / vrs

Extensible specification for representing and uniquely identifying biological sequence variation
https://vrs.ga4gh.org
Apache License 2.0

Categorical variation #59

Closed by ahwagner 7 months ago

ahwagner commented 5 years ago

Some variants are described in terms of exclusionary criteria. These will need to be considered in our model.

See https://civicdb.org/events/genes/5/summary/variants/2408/summary#variant for an example.

rrfreimuth commented 4 years ago

Would it be more accurate to say "some sets or groups of variants are defined, at least in part, in terms of exclusionary criteria"?

That said, the example is essentially "lack of a variant at a given position", which when stated in the positive, is "presence of reference sequence at a given position". If we can express the reference sequence as an Allele, then a set of variants could be defined using that as an inclusion criterion rather than the negative form as an exclusionary criterion.
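
A minimal sketch of that idea, assuming a simplified, hypothetical `Allele` class (not the VRS schema itself), a placeholder sequence identifier, and a toy reference sequence. It shows "presence of reference at a position" stated as a positive assertion that can be checked directly:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    """Hypothetical, simplified allele: a state asserted over a residue interval."""
    sequence_id: str  # placeholder identifier, not a real accession
    start: int        # 0-based, inclusive
    end: int          # 0-based, exclusive
    state: str        # residues asserted over [start, end)


def is_reference(allele: Allele, reference: str) -> bool:
    """True when the asserted state equals the reference residues in the interval."""
    return reference[allele.start:allele.end] == allele.state


# Toy reference (an assumption for illustration): 599 filler residues,
# then V at 1-based position 600 and K at position 601.
toy_reference = "A" * 599 + "VK"

ref_at_600 = Allele("example:protein", 599, 600, "V")  # V600V, reference asserted
alt_at_600 = Allele("example:protein", 599, 600, "E")  # V600E, non-reference

print(is_reference(ref_at_600, toy_reference))  # True
print(is_reference(alt_at_600, toy_reference))  # False
```

With this shape, "no variant at p.600" becomes an ordinary inclusion criterion: the set contains only cases where the reference-asserting allele holds.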

larrybabb commented 4 years ago

Wouldn't Non-V600 be all variants at V600 that are not the same as reference? If so, isn't that what those nasty ambiguity codes in the IUPAC list are for? I didn't look, but I presume they have ambiguity codes for all the amino acid residue combos, like they do for nucleotides. If not, then we would have to do something special to support this.

Likely it will get thrown into the categorical variation bucket. (Maybe?)

reece commented 4 years ago

I think there are two separable issues here.

1) I agree with @rrfreimuth that asserting reference is approximately the same thing as negating the existence of wildcard variation. Asserting reference is preferable.

2) @larrybabb: This issue is not about other AA at p.600. Instead, it's about variation at other locations in the context of a reference V600V (i.e., ref). For example, the statement we'd like is something like V600V and K601E.

So, IMO, this is just another flavor of co-occurring variation.

ahwagner commented 4 years ago

@rrfreimuth and @reece, you've got it. This variant is about variation occurring anywhere other than V600, effectively the notion of a non-reference presentation of the BRAF gene (in its entirety). From the first evidence item description, it is clear that the additional condition of reference at p.600 (V600V) is included in the definition of this variant.

The challenge isn't with the co-occurring variation component (though I agree it's a component, as we're asserting reference at p.600 and an altered state elsewhere). Instead, the challenge is how we describe (a) a "non-reference" / negative state for the full protein sequence, plus (b) a co-occurring reference state at V600.

This issue should start with how we resolve (a).

@larrybabb the ambiguity codes apply only to nucleotides. Given the size of the amino acid alphabet, it is infeasible to specify one-character ambiguity codes for every combination of residues.
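
To make the alphabet-size argument concrete, here is a small sketch that lists the IUPAC nucleotide codes and counts how many single-character codes full coverage would need. The helper name `nonempty_subsets` is made up for illustration:

```python
# IUPAC nucleotide codes: each letter stands for a non-empty set of bases.
IUPAC_NUCLEOTIDE_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def nonempty_subsets(alphabet_size: int) -> int:
    """Number of distinct non-empty subsets of an alphabet of the given size."""
    return 2 ** alphabet_size - 1


print(len(IUPAC_NUCLEOTIDE_CODES))  # 15
print(nonempty_subsets(4))          # 15
print(nonempty_subsets(20))         # 1048575
```

Four bases give 15 non-empty subsets, so every combination gets its own letter. Twenty amino acids give 1,048,575 subsets, far more than any single-character alphabet can cover. A few amino acid ambiguity letters do exist (B for D or N, Z for E or Q, X for any residue), but they cover only a handful of combinations and none expresses "any residue except V".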

github-actions[bot] commented 4 years ago

This issue was marked stale due to inactivity.

reece commented 4 years ago

"Negative Variants" is opaque. Can we call this issue something else? How about "Non-specific variation"?

ahwagner commented 4 years ago

Okay. I gave it some thought, looked through the set of most common biomarkers in CIViC, and have decided that this issue is primarily about a form of categorical variation. I prefer describing these as aggregative or categorical vs non-specific, since the criteria are well-defined and exact. "Non-specific" variants can mean many things, including variants with fuzzy intervals, or insertions / deletions of approximate size and/or unknown sequence.

Above we discussed the V600V + non-V600 alteration scenario (position exclusionary), but I'd also like us to consider here the position-bound non-reference variants, such as BRAF V600, PIK3CA E545, and DNMT3A R882.
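
A rough sketch of the two categories as membership predicates, to show that both criteria are exact. Everything here is hypothetical (the `ProteinSubstitution` class and both function names), and it assumes that a position with no observed call is treated as reference:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProteinSubstitution:
    """Hypothetical observed residue state at a 1-based protein position."""
    position: int
    ref: str
    alt: str

    @property
    def is_reference(self) -> bool:
        return self.ref == self.alt


def matches_position_bound(observed: Iterable[ProteinSubstitution], position: int) -> bool:
    """Position-bound category (e.g. 'BRAF V600'): any non-reference state at the position."""
    return any(v.position == position and not v.is_reference for v in observed)


def matches_position_exclusionary(observed: Iterable[ProteinSubstitution], position: int) -> bool:
    """Position-exclusionary category (e.g. 'non-V600'): reference at the
    position plus at least one alteration elsewhere in the protein."""
    calls = list(observed)
    altered_here = any(v.position == position and not v.is_reference for v in calls)
    altered_elsewhere = any(v.position != position and not v.is_reference for v in calls)
    return not altered_here and altered_elsewhere


v600e = [ProteinSubstitution(600, "V", "E")]
k601e = [ProteinSubstitution(600, "V", "V"), ProteinSubstitution(601, "K", "E")]

print(matches_position_bound(v600e, 600))         # True
print(matches_position_exclusionary(v600e, 600))  # False
print(matches_position_bound(k601e, 600))         # False
print(matches_position_exclusionary(k601e, 600))  # True
```

Both predicates are deterministic given the observed calls, which is the sense in which these categories are well-defined rather than non-specific.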

ahwagner commented 7 months ago

To be handled by Cat-VRS.