ga4gh / vrs

Extensible specification for representing and uniquely identifying biological sequence variation
https://vrs.ga4gh.org
Apache License 2.0

Policy on using Adjacencies or Allele representations #453

Closed ahwagner closed 7 months ago

ahwagner commented 10 months ago

In VRS, we use Allele objects for small variant representations, and Adjacencies for "structural" variation. But the distinction between the two has always been arbitrary. We should have a policy, or at least guidance, on when to use one structure or the other.

For assayed variants, it could be as simple as a short statement, e.g.: if the variant starts and ends on the same reference sequence and is fully spanned by the assay technology, use Alleles. If not, use Adjacencies.
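As a rough illustration, the short statement above could be sketched as a small Python function. The `AssayedVariant` type and its field names are hypothetical, made up for this sketch; they are not part of VRS or vrs-python.

```python
from dataclasses import dataclass


@dataclass
class AssayedVariant:
    # Hypothetical container for illustration; not a VRS or vrs-python class.
    start_sequence: str  # reference sequence on which the variant starts
    end_sequence: str    # reference sequence on which the variant ends
    fully_spanned: bool  # True if the assay observed the variant end to end


def choose_representation(variant: AssayedVariant) -> str:
    """Return "Allele" or "Adjacency" following the proposed guidance."""
    same_sequence = variant.start_sequence == variant.end_sequence
    if same_sequence and variant.fully_spanned:
        return "Allele"
    return "Adjacency"
```

Under this sketch, a variant whose ends fall on two different reference sequences, or one the assay only partially covers, falls through to the Adjacency form.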

We might also consider additional exposition to cover other cases.

rrfreimuth commented 10 months ago

If a choice is arbitrary, then policy and/or guidance would certainly help.

Ideally, the representations should be bidirectionally transformable. If only unidirectional transformation is possible, then the common representation could serve as the canonical, normalized form and the other could be a convenience.

ahwagner commented 10 months ago

Transformations are bidirectional for this. It is a longstanding issue in the field, addressing the question of "when does a variant become an SV?"

In most contexts, it is just an arbitrary length cutoff; <n is a small variant (Allele), >=n is a structural variant (Adjacencies). 50 is a typical value for n.
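A minimal sketch of that length cutoff, assuming "length" means the larger of the reference span and the alternate sequence length (that definition is an assumption of this sketch and varies between tools):

```python
def classify_by_length(ref_len: int, alt_len: int, n: int = 50) -> str:
    """Apply an arbitrary length cutoff: < n is small, >= n is structural.

    ref_len and alt_len are hypothetical inputs: the number of reference
    residues affected and the number of residues in the alternate state.
    """
    size = max(ref_len, alt_len)
    return "Allele" if size < n else "Adjacency"
```

The cutoff `n` is a parameter precisely because the value is a convention, not something derived from the data model.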

You can imagine that an SNV in VRS can be represented as an Allele located at start: x, end: y with state s. But it could just as easily be an Adjacency ending at x, beginning at y with a linker sequence of s. The first form is much more compact, but either is valid. So we need guidance on when implementers SHOULD use one form or the other.
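To make the equivalence concrete, here is a sketch of the two forms as plain Python dicts, with a conversion in each direction. The field names are loosely modeled on VRS but simplified and not schema-validated, and the function names are hypothetical; treat it as an illustration of the idea, not the vrs-python API.

```python
def snv_as_allele(sequence: str, x: int, y: int, s: str) -> dict:
    # Compact form: one location spanning x..y plus the observed state.
    return {
        "type": "Allele",
        "location": {"sequence": sequence, "start": x, "end": y},
        "state": s,
    }


def allele_to_adjacency(allele: dict) -> dict:
    # Same event as two sequence ends joined by a linker:
    # the left piece ends at x, the right piece begins at y.
    loc = allele["location"]
    return {
        "type": "Adjacency",
        "adjoinedSequences": [
            {"sequence": loc["sequence"], "end": loc["start"]},
            {"sequence": loc["sequence"], "start": loc["end"]},
        ],
        "linker": allele["state"],
    }


def adjacency_to_allele(adjacency: dict) -> dict:
    # Only possible when both ends lie on the same reference sequence.
    left, right = adjacency["adjoinedSequences"]
    if left["sequence"] != right["sequence"]:
        raise ValueError("ends on different sequences; no Allele form")
    return {
        "type": "Allele",
        "location": {
            "sequence": left["sequence"],
            "start": left["end"],
            "end": right["start"],
        },
        "state": adjacency["linker"],
    }
```

In this sketch, converting an Allele to an Adjacency and back returns the original dict, while an Adjacency whose ends sit on different sequences has no Allele counterpart.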

larrybabb commented 8 months ago

While I think we can and should define the issue clearly and state what is commonly done, I don't think it is our job to define a standard policy. We can define the rule we use in our own implementations, such as the vrs-python APIs. We could possibly go as far as recommending how it should be done, but groups like ACMG can work to define a credible policy that the global community might adopt. I don't want us to spend too much time trying to define a policy that is really a recommendation until an accredited agency can weigh in with more formal guidance.