Retrieving nucleotide coding sequence and genomic origin for peptides

frankligy / SNAF

Splicing Neo Antigen Finder (SNAF) is an easy-to-use Python package that identifies splicing-derived tumor neoantigens from RNA sequencing data. It further leverages both deep learning and hierarchical Bayesian models to prioritize candidates for experimental validation.

MIT License

Hi @spvensko,

I believe I have a solution for this. See the example below:

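import snaf

# Load the query object serialized after the prediction step
jcmq = snaf.JunctionCountMatrixQuery.deserialize('result/after_prediction.p')
uid = 'ENSG00000065609:E45.1-E47.1'  # junction of interest
nj_list = jcmq.results[0]  # per-junction objects; entries can be None
for nj in nj_list:
    if nj is not None and nj.uid == uid:
        print(nj.junction)  # comma-delimited junction sequence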
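The lookup loop above can be wrapped into a small reusable function. This is a minimal sketch: `find_junction_sequence` is a hypothetical helper (not part of SNAF), and it is exercised here with `SimpleNamespace` stand-ins that only mimic the `.uid` and `.junction` attributes used above. In a real session you would pass `jcmq.results[0]` instead of the demo list.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def find_junction_sequence(nj_list: Iterable, uid: str) -> Optional[str]:
    """Return the junction sequence of the first entry whose uid matches, else None.

    Mirrors the loop above: entries may be None, and every non-None entry is
    expected to expose `.uid` and `.junction` attributes.
    """
    for nj in nj_list:
        if nj is not None and nj.uid == uid:
            return nj.junction
    return None


# Stand-in objects for illustration only (hypothetical uids, not real SNAF output).
demo = [
    None,
    SimpleNamespace(uid="GENE_A:E1.1-E2.1", junction="ACGT,TTGA"),
    SimpleNamespace(uid="GENE_B:E3.1-E5.1", junction="GGCC,AATT"),
]
print(find_junction_sequence(demo, "GENE_B:E3.1-E5.1"))  # GGCC,AATT
```

Returning `None` instead of printing lets the caller decide what to do when a uid is absent from a given sample's results.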

You will get the junction sequence below, with a comma delimiting the two ends of the splicing junction:

CCTCCTGCTGGGACAGGCATGCCCATGATGCCTCAGCAGCCGGTCATGTTTGCACAGCCCATGATGAGGCCCCCCTTTGGAGCTGCCGCTGTACCTGGCACGCAG,CTGCAATATTTGTGACTGAATAGGAAAATAAATGAGTTTGGAGACTTCAAATAAGATTGATGCTGAGTTTC
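Before running BLAT, the sequence has to be split at the comma. A minimal sketch with hypothetical helper names (`split_junction`, `to_fasta`) that formats each half as its own FASTA record, which can then be pasted into the BLAT form; the demo sequence is synthetic, not the junction above.

```python
from typing import Tuple


def split_junction(junction: str) -> Tuple[str, str]:
    """Split a comma-delimited junction sequence into its two halves."""
    upstream, sep, downstream = junction.partition(",")
    if not sep:
        raise ValueError("expected a comma marking the splice junction")
    return upstream, downstream


def to_fasta(junction: str, name: str = "junction") -> str:
    """Format each half of the junction as a separate FASTA record."""
    upstream, downstream = split_junction(junction)
    return f">{name}_first_half\n{upstream}\n>{name}_second_half\n{downstream}\n"


print(to_fasta("ATGGCC,AAATGA", name="demo"))
```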

Let's BLAT the first and second halves in the UCSC Genome Browser:

First half: [screenshot of the UCSC BLAT result, captured 2024-06-14]

Second half: [screenshot of the UCSC BLAT result, captured 2024-06-14]
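Once BLAT reports where each half lands, an offset into the joined sequence can be converted to genomic coordinates. This is a minimal sketch under stated assumptions: a plus-strand gene, a gapless alignment of each half, and 1-based coordinates for the first base of each half read off the BLAT result. The function name and the coordinates in the demo are hypothetical. A minus-strand gene would instead need offsets subtracted from the BLAT end coordinate of each half.

```python
from typing import List, Tuple


def junction_span_to_genomic(
    nt_start: int,
    nt_end: int,
    junction_pos: int,
    chrom: str,
    first_half_start: int,
    second_half_start: int,
) -> List[Tuple[str, int, int]]:
    """Map a half-open span [nt_start, nt_end) of the joined junction sequence
    (comma removed) to 1-based inclusive genomic intervals.

    junction_pos is the length of the first half (the comma's index).
    Assumes a plus-strand gene and gapless alignment of each half.
    """
    intervals = []
    if nt_start < junction_pos:  # part of the span lies in the first half
        s, e = nt_start, min(nt_end, junction_pos)
        intervals.append((chrom, first_half_start + s, first_half_start + e - 1))
    if nt_end > junction_pos:  # part of the span lies in the second half
        s, e = max(nt_start, junction_pos) - junction_pos, nt_end - junction_pos
        intervals.append((chrom, second_half_start + s, second_half_start + e - 1))
    return intervals


# Hypothetical coordinates for illustration only.
print(junction_span_to_genomic(3, 9, junction_pos=6, chrom="chr1",
                               first_half_start=1000, second_half_start=5000))
# [('chr1', 1003, 1005), ('chr1', 5000, 5002)]
```

A span that crosses the junction yields two intervals, one per exon, which is what you would expect for a junction-spanning peptide.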

You can also derive this with other code; I shared one solution in this issue (https://github.com/frankligy/SNAF/issues/31).
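For the peptide side of the question, one way to recover the nucleotide coding sequence is to translate the joined junction sequence in all three forward frames and search for the peptide. This is a minimal sketch, not SNAF's own method or the solution from issue #31: `translate` and `find_peptide_cds` are hypothetical helpers, the codon table is the standard genetic code, and it assumes the printed junction sequence is already in coding (transcript) orientation.

```python
from typing import List, Tuple

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE = dict(zip(_CODONS, _AMINO))


def translate(seq: str) -> str:
    """Translate complete codons; codons not in the table (e.g. with N) become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_peptide_cds(junction: str, peptide: str) -> List[Tuple[int, int, str, bool]]:
    """Locate every in-frame occurrence of `peptide` in the joined junction sequence.

    Returns (frame, nt_start, coding_sequence, spans_junction) tuples, where
    nt_start is a 0-based offset into the sequence with the comma removed.
    Only the forward strand of the sequence as given is searched.
    """
    junction_pos = junction.index(",")  # length of the first half
    seq = junction.replace(",", "").upper()
    hits = []
    for frame in range(3):
        protein = translate(seq[frame:])
        p = protein.find(peptide)
        while p != -1:
            nt_start = frame + 3 * p
            nt_end = nt_start + 3 * len(peptide)
            spans = nt_start < junction_pos < nt_end  # codons on both sides
            hits.append((frame, nt_start, seq[nt_start:nt_end], spans))
            p = protein.find(peptide, p + 1)
    return hits


print(find_peptide_cds("ATGGCC,AAATGA", "AK"))  # [(0, 3, 'GCCAAA', True)]
```

The `spans_junction` flag distinguishes peptides that are genuinely created by the splice event from those encoded entirely within one exon.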

Hopefully this helps a bit, Frank

Retrieving nucleotide coding sequence and genomic origin for peptides #45