Clean up ExonGenomicCoordsMapper

korikuzma commented 10 months ago

A lot of this was written years ago, and it's hard to follow what's happening. We should refactor this class.

korikuzma commented 9 months ago

Apparently it’s hard to follow because the previous model was not the best. There is now a better way to represent this using VRS 2.0. @ahwagner your time to shine ✨ I'd also like to throw out that example input/output would be nice 😉

korikuzma commented 9 months ago

I think the alignment part makes sense, but I have struggled to find a good way to represent what is needed. So I'm going to pause progress until we hear how to represent this using VRS 2.0. Initial progress is in this branch.

ahwagner commented 9 months ago

Linking the Cool Seq Tool documentation of this method for my own reference.

The idea of this method is to help us go from the ends of fusion transcript segments in exon representation, to where those ends exist on a genomic sequence. On occasion, fusion transcripts contain sequence that is intronic (exists on a genomic sequence but not the transcript), or have a junction that omits some sequence at the end of an exon. In the fusions model, we use offsets to describe this change. This concept of offset representation is also used in HGVS coding sequence representations.
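To make the offset idea concrete, here is a minimal Python sketch. It assumes offsets are counted in the direction of the transcript, so a positive offset moves to a larger genomic coordinate on the positive strand and to a smaller one on the negative strand. The `Strand` enum and `offset_to_genomic` function are hypothetical names used for illustration only, not Cool-Seq-Tool's API.

```python
from enum import IntEnum


class Strand(IntEnum):
    """Strand orientation of the transcript on the genomic sequence (hypothetical)."""

    POSITIVE = 1
    NEGATIVE = -1


def offset_to_genomic(exon_boundary: int, offset: int, strand: Strand) -> int:
    """Shift a genomic exon boundary by an offset given in transcript direction.

    Assumption: the offset follows the transcript, so its sign is flipped
    on the negative strand when applied to a genomic coordinate.
    """
    return exon_boundary + int(strand) * offset


# A junction 5 bases past the exon boundary (into the intron), on each strand.
plus_pos = offset_to_genomic(1000, 5, Strand.POSITIVE)
minus_pos = offset_to_genomic(1000, 5, Strand.NEGATIVE)

# A junction that omits the last 3 bases of an exon on the positive strand.
trimmed_pos = offset_to_genomic(1000, -3, Strand.POSITIVE)
```

An offset of 0 leaves the position at the exon boundary itself, which is the case shown for both segment ends in the WEE1 example further down the thread.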

I think we should postpone any refactor of this method until the VRS 2.0 beta1 release for the Adjacency class, as there is a fundamental shift from a segment-based model to a junction (adjacency) model. Linking this thread for progress on that release. Once that is completed, we should use the beta model to revise how this is represented in FUSOR.

korikuzma commented 3 months ago

In Slack, @ahwagner asked: "I think this is okay to resume now that we have the Adjacency class, right?"

@ahwagner In your earlier comment you tied this to the VRS 2.0 beta1 release, but I don't think we're at beta yet:

"I think we should postpone any refactor of this method until the VRS 2.0 beta1 release for the Adjacency class, as there is a fundamental shift from a segment-based model to a junction (adjacency) model."

Did you want us to proceed still?

korikuzma commented 3 months ago

Tagging @jsstevenson and @katiestahl so they can follow. Slack doesn't allow a thread within a thread.

ahwagner commented 3 months ago

We did away with Alpha/Beta/RC for maturity levels, which may have created some confusion here. I am comfortable with the draft model for Adjacency from the latest VRS pre-release, and we already have it in VRS-Python, so I think we can move forward on implementing it for Cool-Seq-Tool.

korikuzma commented 3 months ago

@ahwagner ah okay. That's my bad for not remembering the maturity model changes.

ahwagner commented 3 months ago

No fault here. With those changes this issue needed to be clarified. I also don't think anything has been lost by clarifying this now instead of earlier. However, with the recent issue https://github.com/cancervariants/fusion-curation/issues/277, it is a good time to revisit.

korikuzma commented 2 months ago

Going to add Alex's requested changes here:

[x] Rename methods to genomic_to_tx_segment and tx_segment_to_genomic
[x] Strand should not be provided for genomic_to_tx_segment
[x] In genomic_to_tx_segment, rename start and end to genomic_start and genomic_end
[x] Assume that coordinates are inter-residue
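
As a reminder of what the inter-residue assumption means in practice, this sketch converts a 1-based, inclusive residue span to inter-residue coordinates. `residue_to_inter_residue` is a hypothetical helper written for this example, not something that exists in Cool-Seq-Tool.

```python
from typing import Tuple


def residue_to_inter_residue(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, inclusive residue span to inter-residue coordinates.

    Inter-residue positions sit between bases, so the start shifts down by
    one and the end is unchanged.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end for residue coordinates")
    return start - 1, end


# The first three bases of a sequence: residues 1..3.
ir_start, ir_end = residue_to_inter_residue(1, 3)

# With inter-residue coordinates the span length is a plain difference.
span_length = ir_end - ir_start
```

One practical benefit is that the length of a span is `end - start` with no off-by-one adjustment.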

The new structure (Aligned Segment) will look as follows:

{
      "gene": "WEE1",
      "alt_ac": "NC_000011.10",
      "seg_start": {
            "exon_ord": 1,
            "offset": 0,
            "genomic_location": {
                  "type": "SequenceLocation",
                  "sequenceReference": {
                        "type": "SequenceReference",
                        "refgetAccession": "SQ.2NkFm8HK88MqeNkCgj78KidCAXgnsfV1"
                  },
                  "start": 9575887
            }
      },
      "seg_end": {
            "exon_ord": 10,
            "offset": 0,
            "genomic_location": {
                  "type": "SequenceLocation",
                  "sequenceReference": {
                        "type": "SequenceReference",
                        "refgetAccession": "SQ.2NkFm8HK88MqeNkCgj78KidCAXgnsfV1"
                  },
                  "end": 9589767
            }
      },
      "tx_ac": "NM_003390.3"
}
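
For anyone consuming this structure, here is a rough Python sketch of reading the two segment ends. It only relies on the field names shown in the JSON above; the `SegmentEnd` dataclass and `parse_segment_end` function are hypothetical and are not the actual Cool-Seq-Tool models. Note that `seg_start` carries only a `start` and `seg_end` carries only an `end`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentEnd:
    """One end of an aligned segment (hypothetical class, for illustration)."""

    exon_ord: int
    offset: int
    refget_accession: str
    start: Optional[int] = None  # present on seg_start in the example
    end: Optional[int] = None  # present on seg_end in the example


def parse_segment_end(raw: dict) -> SegmentEnd:
    """Flatten one seg_start / seg_end object into a SegmentEnd."""
    location = raw["genomic_location"]
    return SegmentEnd(
        exon_ord=raw["exon_ord"],
        offset=raw["offset"],
        refget_accession=location["sequenceReference"]["refgetAccession"],
        start=location.get("start"),
        end=location.get("end"),
    )


# Trimmed copy of the example above, keeping only the two segment ends.
sequence_reference = {
    "type": "SequenceReference",
    "refgetAccession": "SQ.2NkFm8HK88MqeNkCgj78KidCAXgnsfV1",
}
aligned_segment = {
    "seg_start": {
        "exon_ord": 1,
        "offset": 0,
        "genomic_location": {
            "type": "SequenceLocation",
            "sequenceReference": sequence_reference,
            "start": 9575887,
        },
    },
    "seg_end": {
        "exon_ord": 10,
        "offset": 0,
        "genomic_location": {
            "type": "SequenceLocation",
            "sequenceReference": sequence_reference,
            "end": 9589767,
        },
    },
}

seg_start = parse_segment_end(aligned_segment["seg_start"])
seg_end = parse_segment_end(aligned_segment["seg_end"])
```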

@jarbesfeld am I missing anything?

korikuzma commented 2 months ago

@jarbesfeld and I made a Lucid to help us with positions + offsets and @ahwagner reviewed. Tagging @jsstevenson @katiestahl so it can help them too. We may want to consider cleaning this up and adding it to the Cool-Seq-Tool documentation.

[Image: Cool-Seq-Tool Mapper]

korikuzma commented 2 months ago

Fixing some labels:

[Image: Cool-Seq-Tool Mapper(1)]

korikuzma commented 1 month ago

I think all that's left to do in this epic is DRY + smaller methods

GenomicMedLab / cool-seq-tool

Clean up ExonGenomicCoordsMapper #224