Open spvensko opened 5 months ago
Hi @spvensko,
I believe I have a solution for this. See the example below:
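import snaf

# Load the serialized query object saved after prediction
jcmq = snaf.JunctionCountMatrixQuery.deserialize('result/after_prediction.p')
uid = 'ENSG00000065609:E45.1-E47.1'
nj_list = jcmq.results[0]
# Print the junction sequence of the entry whose uid matches
for nj in nj_list:
    if nj is not None and nj.uid == uid:
        print(nj.junction)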
You will get the junction sequence below, with a comma (,) delimiting the two ends of the splicing junction:
CCTCCTGCTGGGACAGGCATGCCCATGATGCCTCAGCAGCCGGTCATGTTTGCACAGCCCATGATGAGGCCCCCCTTTGGAGCTGCCGCTGTACCTGGCACGCAG,CTGCAATATTTGTGACTGAATAGGAAAATAAATGAGTTTGGAGACTTCAAATAAGATTGATGCTGAGTTTC
Let's BLAT the first and second halves in the UCSC Genome Browser:
First
Second
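To prepare the two halves for BLAT, the junction string can be split at the comma and written out as two FASTA records. This is only a minimal sketch: junction_to_fasta is a hypothetical helper, not part of SNAF, and the record names are arbitrary.

```python
def junction_to_fasta(uid, junction):
    """Split a 'first,second' junction string into two FASTA records.

    Hypothetical helper (not a SNAF function); assumes exactly one comma
    separates the two ends of the junction.
    """
    first, second = junction.split(",")
    return f">{uid}|first\n{first}\n>{uid}|second\n{second}\n"


if __name__ == "__main__":
    uid = "ENSG00000065609:E45.1-E47.1"
    junction = (
        "CCTCCTGCTGGGACAGGCATGCCCATGATGCCTCAGCAGCCGGTCATGTTTGCACAGCCCATGATG"
        "AGGCCCCCCTTTGGAGCTGCCGCTGTACCTGGCACGCAG,"
        "CTGCAATATTTGTGACTGAATAGGAAAATAAATGAGTTTGGAGACTTCAAATAAGATTGATGCTGAGTTTC"
    )
    # Paste the printed records into the BLAT search box
    print(junction_to_fasta(uid, junction))
```

Each half should then come back as its own BLAT hit, one on either side of the splice junction.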
You can also derive this with other code; I shared one solution in this issue (https://github.com/frankligy/SNAF/issues/31).
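For the coding sequence of a specific peptide, one possible approach (a sketch only, and not necessarily what SNAF or the linked issue does) is to translate the joined junction sequence in the three forward frames and look up where the peptide occurs. The helper names translate and locate_peptide are hypothetical, and the sketch assumes the junction string is already in the orientation that gets translated.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def locate_peptide(junction, peptide):
    """Find where `peptide` is encoded within a 'first,second' junction string.

    Returns a list of (frame, start, end, coding_seq, spans_junction) tuples,
    with start/end as 0-based offsets into the joined sequence.
    """
    first, second = junction.split(",")
    seq = first + second
    hits = []
    for frame in range(3):
        protein = translate(seq[frame:])
        pos = protein.find(peptide)
        while pos != -1:
            start = frame + 3 * pos
            end = start + 3 * len(peptide)
            # True when the coding sequence crosses the splice site
            spans = start < len(first) < end
            hits.append((frame, start, end, seq[start:end], spans))
            pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    # Toy input: ATGGCC | GATTAA translates to M A D * in frame 0
    print(locate_peptide("ATGGCC,GATTAA", "AD"))
```

The offsets are relative to the junction sequence, so turning them into genomic coordinates would still require the BLAT position of each half, including its strand.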
Hopefully this helps a bit, Frank
Hello,
My understanding is that the coord column contains the coordinates of the splicing event (e.g. the coordinates may be that of a skipped exon) and not the genomic coordinates of the sequence encoding the peptide. With that in mind, is it possible to retrieve the coding sequence and genomic origin of each peptide (as in, the actual coordinates where that peptide is encoded) out of the current outputs? It appears the coding sequence is utilized at https://github.com/spvensko/SNAF/blob/v0.7.0/snaf/snaf.py#L1190, but I wanted to check with you before I try to develop my own solution.
Thanks, Steven V.