ga4gh / vrs

Extensible specification for representing and uniquely identifying biological sequence variation
https://vrs.ga4gh.org
Apache License 2.0

Question about Adjacencies & Inversion #469

Closed: greg-sharpe closed this issue 7 months ago

greg-sharpe commented 7 months ago

I have a question about how to specify AdjoiningSequences in an Adjacency in the case of an inversion.

[Image: diagram of the inversion]

Is it possible to define AdjoiningSequence 2 the same way in each Adjacency, so that AdjoiningSequence 2 could be a single SequenceLocation object referenced by both Adjacencies?

The problem I'm seeing is that the definition of SequenceLocation requires End to be greater than or equal to Start, meaning that AdjoiningSequence 2 needs to be {Start: C, End: D}. Then it would be impossible to look at either Adjacency and tell whether it describes an inversion or an adjacency that is already present in the reference sequence.

(Edited per Larry's correction that End >= Start)
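The ambiguity in the question can be reproduced with a small sketch. `Interval` is a hypothetical stand-in that only enforces End >= Start; it is not the VRS SequenceLocation class, and the coordinates for C and D are arbitrary placeholders. The inverted segment cannot be written "from D back to C", and once it is written as {start: C, end: D} it is identical to the un-inverted reference segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a fully specified SequenceLocation (not the VRS class)."""
    start: int
    end: int

    def __post_init__(self) -> None:
        # Mirrors the End >= Start requirement discussed in this issue.
        if self.end < self.start:
            raise ValueError(f"end ({self.end}) must be >= start ({self.start})")


# Arbitrary placeholder coordinates for breakpoints C and D.
C, D = 300, 400

# Writing the inverted segment "from D back to C" is rejected...
try:
    Interval(start=D, end=C)
except ValueError as err:
    print("rejected:", err)

# ...so it must be written as {start: C, end: D}, which is exactly the same
# object as the un-inverted reference segment: orientation is lost.
inverted_segment = Interval(start=C, end=D)
reference_segment = Interval(start=C, end=D)
print(inverted_segment == reference_segment)  # True
```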

larrybabb commented 7 months ago

For your example above, I would refer you to two examples that we currently have images for in the repo: the Reverse complement adjacency example and the SV Haplotype example.

The reverse complement adjacency example illustrates the fact that you provide only the start or the end when specifying either of the SequenceLocations in adjoinedSequences (aka the breakpoint). When you provide a start, you are saying that that side of the sequence decreases as it moves away from the adjacency. Likewise, when you provide an end, you are saying that that side of the adjoined sequence increases as it moves away from the adjacency.

In the Reverse complement adjacency example, the 2nd SequenceLocation specifies the end, which readily identifies the second sequence as reverse complemented.

You could apply this to the SV Haplotype example by simply swapping the start and end attributes of the 2nd part of the first adjacency and of the 1st part of the second adjacency.
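The convention described above can be sketched with plain Python dicts standing in for the adjoined SequenceLocations. `PartialLocation`, `specified_field`, and `second_is_reverse_complemented` are hypothetical names, not part of the VRS schema or any VRS library, and the coordinates are placeholders. Only the rule stated for the Reverse complement adjacency example is encoded: an `end` on the 2nd location marks the second sequence as reverse complemented.

```python
from __future__ import annotations

from typing import TypedDict


class PartialLocation(TypedDict, total=False):
    """Hypothetical, simplified view of an adjoined SequenceLocation."""
    start: int
    end: int


def specified_field(loc: PartialLocation) -> str:
    """Return which single coordinate ('start' or 'end') the location gives."""
    has_start, has_end = "start" in loc, "end" in loc
    if has_start == has_end:
        raise ValueError("an adjoined location should give exactly one of start/end")
    return "start" if has_start else "end"


def second_is_reverse_complemented(adjoined: list[PartialLocation]) -> bool:
    """An `end` on the 2nd location signals a reverse-complemented 2nd sequence."""
    return specified_field(adjoined[1]) == "end"


# Placeholder coordinates, not taken from the repo's example.
print(second_is_reverse_complemented([{"end": 100}, {"end": 400}]))    # True
print(second_is_reverse_complemented([{"end": 100}, {"start": 300}]))  # False
```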

@ahwagner please verify my response...

larrybabb commented 7 months ago

I don't think we have "yet" changed our requirement that End >= Start when fully specifying a SequenceLocation. If this is true (I'll wait for @ahwagner's validation), then any inversion would require adjacencies to define it.

ahwagner commented 7 months ago

In the current model, the Adjacency only describes the sequence junction, nothing beyond. So the use of A and F is not allowed as described in the above model; instead, you would have 4 Adjacency objects, ordered in a containing structure (the current proposal is that this structure be a revised Haplotype, but there is an open discussion about that at #461). Per the current schema and examples, the information for the four Adjacency objects would look like:

Adjacency 1: [{start: A}] (if terminal)
Adjacency 2: [{end: B}, {end: D}]
Adjacency 3: [{start: C}, {start: E}]
Adjacency 4: [{end: F}] (if terminal)
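The four ordered adjacencies above can be written out as plain Python lists and dicts, with the outer list standing in for the containing structure (still under discussion in #461). Breakpoints stay symbolic (A to F). The `segments` helper is hypothetical: it reads the sequence between each pair of consecutive junctions back out, assuming the pattern this example suggests ("+" when the leaving side gives `start` and the arriving side gives `end`, "-" for the reverse-complemented case). That rule is inferred from this example, not stated by the schema.

```python
# Each adjacency is a list of adjoined locations; each location gives only
# "start" or "end". Breakpoint labels are kept symbolic, as in the comment.
adjacencies = [
    [{"start": "A"}],                  # Adjacency 1 (if terminal)
    [{"end": "B"}, {"end": "D"}],      # Adjacency 2
    [{"start": "C"}, {"start": "E"}],  # Adjacency 3
    [{"end": "F"}],                    # Adjacency 4 (if terminal)
]


def segments(path):
    """Yield (low, high, orientation) for the sequence between each pair of
    consecutive adjacencies (hypothetical helper, inferred convention)."""
    for left, right in zip(path, path[1:]):
        leaving = left[-1]   # location on the far side of the earlier junction
        arriving = right[0]  # location on the near side of the later junction
        if "start" in leaving and "end" in arriving:
            yield leaving["start"], arriving["end"], "+"
        elif "end" in leaving and "start" in arriving:
            yield arriving["start"], leaving["end"], "-"
        else:
            raise ValueError(f"inconsistent locations: {leaving}, {arriving}")


for seg in segments(adjacencies):
    print(seg)
# ('A', 'B', '+')
# ('C', 'D', '-')
# ('E', 'F', '+')
```

Every recovered segment keeps its bounds in low-to-high order, so the End >= Start requirement is never violated; the inversion of C to D is carried entirely by which field each adjacency specifies.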

Hope this helps.

greg-sharpe commented 7 months ago

Thanks for the clarification!

Mrinal-Thomas-Epic commented 7 months ago

@ahwagner @larrybabb Is the constraint that only one of start or end can be specified on a SequenceLocation within an Adjacency stated explicitly anywhere in the schema?

ahwagner commented 7 months ago

@Mrinal-Thomas-Epic No, there is no such constraint at the schema level; it's just the bare-bones work at the moment. I can't get to adding this constraint in the near term, but I think it would be useful to have. If you agree, please open a separate issue requesting this feature.
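For illustration, the check Mrinal asks about could look like the standalone function below. `check_adjacency` is hypothetical: per the reply above, nothing like it exists in the VRS schema at this point. It works on plain dicts and reports each adjoined location that specifies both or neither of start and end; coordinates in the usage lines are placeholders.

```python
from __future__ import annotations


def check_adjacency(adjoined_sequences: list[dict]) -> list[str]:
    """Hypothetical check: each adjoined SequenceLocation gives exactly one of
    start/end. Returns a list of problems (empty if the adjacency passes)."""
    problems = []
    for i, loc in enumerate(adjoined_sequences):
        # Treat an explicit None the same as an absent field.
        given = [key for key in ("start", "end") if loc.get(key) is not None]
        if len(given) == 2:
            problems.append(f"adjoinedSequences[{i}] specifies both start and end")
        elif not given:
            problems.append(f"adjoinedSequences[{i}] specifies neither start nor end")
    return problems


print(check_adjacency([{"end": 100}, {"end": 400}]))
# []
print(check_adjacency([{"start": 300, "end": 400}, {"start": 500}]))
# ['adjoinedSequences[0] specifies both start and end']
```

At the schema level, one plausible way to express the same rule is a JSON Schema `oneOf` over `{"required": ["start"]}` and `{"required": ["end"]}`, which rejects a location that has both fields or neither.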