GenomicMedLab / cool-seq-tool

https://coolseqtool.readthedocs.io
MIT License
4 stars 0 forks source link

"Start chr does not match end chr" when calling `genomic_to_exon_coords` with one position in intronic space #329

Closed jsstevenson closed 2 weeks ago

jsstevenson commented 1 month ago

Describe the bug

When calling genomic_to_exon_coords with a start position on an exon and an ending position in intronic space (or vice versa), an older accession, and nearest_transcript_junction=True, different procedures (liftover etc) are performed on the two positions. This results in an error about the chromosome accessions of the start and end coordinates being out of joint.

Steps to reproduce

from cool_seq_tool.app import CoolSeqTool
import asyncio

from cool_seq_tool.schemas import Strand

async def do_thing():
    cst = CoolSeqTool()
    result = await cst.ex_g_coords_mapper.genomic_to_transcript_exon_coordinates(
        alt_ac="NC_000007.13",  # not latest AC for chr 7
        start=73442503,
        end=73457929,  # not on an exon
        strand=Strand.POSITIVE,
        gene="ELN",
        get_nearest_transcript_junction=True
    )
    print(result)

asyncio.run(do_thing())

Expected behavior

n/a

Current behavior

Gives a warning (really an error):

"Start `chr`, NC_000007.14, does not match End `chr`, NC_000007.13"

Possible reason(s)

In the private method _genomic_to_transcript_exon_coordinate, there's a logic branch for get_nearest_transcript_junction, that ends with an if check for whether the position is on an exon. If it's not, then it's handled there and returned, but if it is, it leaves this section of the code entirely and has additional processing performed in the rest of the method. This appears to result in one accession being swapped in for a newer one. Not sure exactly where the bug is or what the correct response is yet.

if not self._is_exonic_breakpoint(pos, tx_genomic_coords):
    exon = self._get_adjacent_exon(
        tx_exons_genomic_coords=tx_genomic_coords,
        strand=strand,
        start=pos if is_start else None,
        end=pos if not is_start else None,
    )

    params["exon"] = exon
    params["transcript"] = transcript
    params["gene"] = gene
    params["pos"] = pos
    params["chr"] = alt_ac

    self._set_exon_offset(
        params=params,
        start=tx_genomic_coords[exon - 1][3],  # Start exon coordinate
        end=tx_genomic_coords[exon - 1][4],  # End exon coordinate
        pos=pos,
        is_start=is_start,
        strand=strand,
    )
    params["strand"] = strand.value
    resp.transcript_exon_data = TranscriptExonData(**params)
    return resp
# if position is on an exon, doesn't return here, but continues on for the rest of the method

Suggested fix

No response

Branch, commit, and/or version

main

Screenshots

No response

Environment details

n/a

Additional details

No response

Contribution

Yes, I can create a PR for this fix.

korikuzma commented 1 month ago

Haven't tested, but adding alt_ac = mane_transcripts[0]["GRCh38_chr"] here returns a response.

korikuzma commented 2 weeks ago

We need to update the genomic_pos to GRCh38

korikuzma commented 2 weeks ago

@jarbesfeld if GRCh38 assembly fails, do we need to revert to GRCh37? I realized that I've been having the mapper fail if we're unable to liftover to GRCh38.

jarbesfeld commented 2 weeks ago

@korikuzma I think it makes sense to leave as is and have the mapper fail if liftover to GRCh38 is unsuccessful.

korikuzma commented 2 weeks ago

@jarbesfeld okay, we can always add a new issue if it’s needed in the future. I’ll update the docs later

github-actions[bot] commented 2 weeks ago

Closed by #356.