Open korikuzma opened 10 months ago
Apparently it’s hard to follow because the previous model was not the best. There is now a better way to represent this using VRS 2.0. @ahwagner your time to shine ✨ I'd also like to throw out that example input/output would be nice 😉
I think the alignment part makes sense, but I have struggled to find a good way to represent what is needed, so I'm going to pause progress until we hear how to represent it using VRS 2.0. Initial progress is in this branch.
Linking the Cool Seq Tool documentation of this method for my own reference.
The idea of this method is to help us go from the ends of fusion transcript segments in exon representation, to where those ends exist on a genomic sequence. On occasion, fusion transcripts contain sequence that is intronic (exists on a genomic sequence but not the transcript), or have a junction that omits some sequence at the end of an exon. In the fusions model, we use offsets to describe this change. This concept of offset representation is also used in HGVS coding sequence representations.
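To make the offset idea above concrete, here is a minimal sketch, not the Cool-Seq-Tool implementation. The `Exon` class, the `segment_end_to_genomic` helper, the example coordinates, and the sign convention (a positive offset moves downstream in transcript direction, so it is subtracted on the negative strand) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record; genomic coordinates with g_start < g_end."""

    exon_ord: int
    g_start: int
    g_end: int


def segment_end_to_genomic(exon: Exon, strand: int, is_seg_start: bool, offset: int) -> int:
    """Map one end of a transcript segment (exon boundary + offset) to a genomic position.

    Assumed convention: a positive offset moves downstream in transcript
    direction, a negative offset moves upstream.
    """
    if strand == 1:
        # Transcript runs with increasing genomic coordinates.
        boundary = exon.g_start if is_seg_start else exon.g_end
        return boundary + offset
    if strand == -1:
        # Transcript runs with decreasing genomic coordinates,
        # so the exon's transcript start is its higher genomic coordinate.
        boundary = exon.g_end if is_seg_start else exon.g_start
        return boundary - offset
    raise ValueError("strand must be 1 or -1")


exon = Exon(exon_ord=2, g_start=100, g_end=200)

# Junction that omits 3 bases at the end of the exon (positive strand).
trimmed_end = segment_end_to_genomic(exon, strand=1, is_seg_start=False, offset=-3)

# Segment that begins 5 bases upstream of the exon, in the intron (positive strand).
intronic_start = segment_end_to_genomic(exon, strand=1, is_seg_start=True, offset=-5)
```

An offset of 0 lands exactly on the exon boundary; nonzero offsets cover the two cases described above: intronic sequence included in the fusion transcript, or exon sequence omitted at the junction.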
I think we should postpone any refactor of this method until the VRS 2.0 beta1 release for the Adjacency class, as there is a fundamental shift from a segment-based model to a junction (adjacency) model. Linking this thread for progress on that release. Once that is completed, we should use the beta model to revise how this is represented in FUSOR.
In Slack, @ahwagner asked: I think this is okay to resume now that we have the Adjacency class, right?
@ahwagner In your earlier comment (quoted below) you said to wait for VRS 2.0 beta1, but I don't think we're at beta yet. Did you still want us to proceed?
> I think we should postpone any refactor of this method until the VRS 2.0 beta1 release for the Adjacency class, as there is a fundamental shift from a segment-based model to a junction (adjacency) model.
Tagging @jsstevenson and @katiestahl so they can follow. Slack doesn't allow a thread within a thread.
We did away with Alpha/Beta/RC for maturity levels, which may have created some confusion here. I am comfortable with the draft model for Adjacency from the latest VRS pre-release, and we already have it in VRS-Python, so I think we can move forward on implementing it for Cool-Seq-Tool.
@ahwagner ah okay. That's my bad for not remembering the maturity model changes.
No fault here. With those changes this issue needed to be clarified. I also don't think anything has been lost by clarifying this now instead of earlier. However, with the recent https://github.com/cancervariants/fusion-curation/issues/277 issue it is a good time to revisit.
Going to add Alex's requested changes here:
genomic_to_tx_segment and tx_segment_to_genomic
genomic_to_tx_segment
genomic_to_tx_segment, rename start and end to genomic_start and genomic_end
The new structure (Aligned Segment) will look as follows:
{
  "gene": "WEE1",
  "alt_ac": "NC_000011.10",
  "seg_start": {
    "exon_ord": 1,
    "offset": 0,
    "genomic_location": {
      "type": "SequenceLocation",
      "sequenceReference": {
        "type": "SequenceReference",
        "refgetAccession": "SQ.2NkFm8HK88MqeNkCgj78KidCAXgnsfV1"
      },
      "start": 9575887
    }
  },
  "seg_end": {
    "exon_ord": 10,
    "offset": 0,
    "genomic_location": {
      "type": "SequenceLocation",
      "sequenceReference": {
        "type": "SequenceReference",
        "refgetAccession": "SQ.2NkFm8HK88MqeNkCgj78KidCAXgnsfV1"
      },
      "end": 9589767
    }
  },
  "tx_ac": "NM_003390.3"
}
@jarbesfeld am I missing anything?
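For anyone following along, here is a rough sketch of how the Aligned Segment payload above could be read into typed Python objects. It uses only the standard library; the class names (`SegmentPos`, `AlignedSegment`) and the helper `parse_aligned_segment` are hypothetical and are not taken from Cool-Seq-Tool or VRS-Python. The field names mirror the JSON keys shown above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SegmentPos:
    """One end of the aligned segment (hypothetical class for illustration)."""

    exon_ord: int
    offset: int
    refget_accession: str
    start: Optional[int] = None  # only set on seg_start in the example
    end: Optional[int] = None    # only set on seg_end in the example


@dataclass(frozen=True)
class AlignedSegment:
    gene: str
    alt_ac: str
    tx_ac: str
    seg_start: SegmentPos
    seg_end: SegmentPos


def _parse_pos(raw: dict) -> SegmentPos:
    loc = raw["genomic_location"]
    return SegmentPos(
        exon_ord=raw["exon_ord"],
        offset=raw["offset"],
        refget_accession=loc["sequenceReference"]["refgetAccession"],
        start=loc.get("start"),
        end=loc.get("end"),
    )


def parse_aligned_segment(payload: str) -> AlignedSegment:
    raw = json.loads(payload)
    return AlignedSegment(
        gene=raw["gene"],
        alt_ac=raw["alt_ac"],
        tx_ac=raw["tx_ac"],
        seg_start=_parse_pos(raw["seg_start"]),
        seg_end=_parse_pos(raw["seg_end"]),
    )


_SEQ_REF = {"type": "SequenceReference",
            "refgetAccession": "SQ.2NkFm8HK88MqeNkCgj78KidCAXgnsfV1"}
EXAMPLE = json.dumps({
    "gene": "WEE1",
    "alt_ac": "NC_000011.10",
    "seg_start": {"exon_ord": 1, "offset": 0,
                  "genomic_location": {"type": "SequenceLocation",
                                       "sequenceReference": _SEQ_REF,
                                       "start": 9575887}},
    "seg_end": {"exon_ord": 10, "offset": 0,
                "genomic_location": {"type": "SequenceLocation",
                                     "sequenceReference": _SEQ_REF,
                                     "end": 9589767}},
    "tx_ac": "NM_003390.3",
})

segment = parse_aligned_segment(EXAMPLE)
```

Keeping `start` and `end` optional reflects that each segment end in the example carries only one genomic coordinate.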
@jarbesfeld and I made a Lucid to help us with positions + offsets and @ahwagner reviewed. Tagging @jsstevenson @katiestahl so it can help them too. We may want to consider cleaning this up and adding it to the Cool-Seq-Tool documentation.
Fixing some labels
I think all that's left to do in this epic is DRY + smaller methods
A lot of this was written years ago. It's hard to follow what's happening. We should refactor this class