spvensko opened this issue 9 months ago
Hi @spvensko,
Glad the tool ran smoothly on your end. The chromosome coordinates are the 5' splice site and the 3' splice site, so the junction jumps from 20658676 to 20659066 instead of including the sequence in between, if that makes sense.
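Because the coordinate pair marks a junction rather than a contiguous region, the peptide-encoding sequence comes from joining the bases that end at the first coordinate to the bases that start at the second. Below is a minimal Python sketch of that idea. It assumes 1-based coordinates on the + strand, with 20658676 being the last base of the upstream exon and 20659066 the first base of the downstream exon; `junction_sequence` and `translate` are hypothetical helpers, not part of the tool.

```python
# Standard genetic code; codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def junction_sequence(chrom_seq: str, donor: int, acceptor: int, flank: int) -> str:
    """Join `flank` bases ending at `donor` to `flank` bases starting at `acceptor`.

    Assumption: `donor` is the last base of the upstream exon and `acceptor` the
    first base of the downstream exon, both 1-based, + strand only. The bases in
    between (the spliced-out intron) are skipped.
    """
    upstream = chrom_seq[max(donor - flank, 0):donor]
    downstream = chrom_seq[acceptor - 1:acceptor - 1 + flank]
    return (upstream + downstream).upper()


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons from `frame`; unknown codons (e.g. with N) become X."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Toy chromosome: exon (1-6), intron (7-14), exon (15-20).
toy = "ATGGCC" + "GTAAGTAG" + "AAATGA"
print(translate(junction_sequence(toy, 6, 15, 6)))
```

Translating the region between the two coordinates, as one would do by pulling chr7:20658676-20659066 directly, mostly yields intronic sequence, which is why the peptide does not show up that way. For a real 9-mer, a flank of roughly 27 nt on each side and all three reading frames would need to be checked, and minus-strand junctions would additionally need the reverse complement; this sketch handles neither.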
Please see the screenshot below for the peptide generated from this junction:
Best, Frank
Thank you for the excellent explanation!
Another oddity I noticed:
AADVSGLPL,ENSG00000110427:E1.1-I1.1 ['HugoLo_IPRES_2016-Pt01-ar-279.Aligned.sortedByCoord.out.bed'] 1 KIAA1549L chr11:33542146-33542147(+) 0.03866191580891609 0.9999941348374569
In this case, the 5' and 3' splice sites are neighboring bases, correct?
I checked the SJ.out.tab file but wasn't able to find any evidence for this junction:
chr11 33531115 33533544 2 2 1 1 0 43
chr11 33542147 33542920 1 1 1 2 1 22
chr11 33542147 33544766 1 1 1 11 0 43
chr11 33544337 33544766 1 1 1 18 0 47
Is this a false positive, or is there a different explanation? Also, can you explain the EX1.Y1-EX2.Y2 and EX1.Y1-IX2.Y2 nomenclature (e.g., E1.1-I1.1)?
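For inspecting rows like the one above in bulk, here is a minimal Python row-parser sketch. The field names come from the output file name (uid, gene symbol, coord, mean, mle) plus the sample list and count visible in the example row; the column order and the tab delimiter are assumptions, `parse_row` and `NeoantigenRow` are hypothetical names, and reading E/I as exon/intron with an ID.segment suffix is an assumption based on the exon ID / segment ID description in this thread.

```python
import ast
from dataclasses import dataclass


@dataclass
class NeoantigenRow:
    peptide: str             # e.g. "AADVSGLPL"
    gene_id: str             # Ensembl gene ID
    upstream_segment: str    # e.g. "E1.1" (assumed: exon 1, segment 1)
    downstream_segment: str  # e.g. "I1.1" (assumed: intron 1, segment 1)
    samples: list
    frequency: int
    symbol: str
    chrom: str
    start: int
    end: int
    strand: str
    mean: float
    mle: float


def parse_row(line: str) -> NeoantigenRow:
    # Assumed layout: seven tab-separated columns in the order of the example row.
    uid, samples, freq, symbol, coord, mean, mle = line.rstrip("\n").split("\t")
    peptide, junction = uid.split(",", 1)        # "PEPTIDE,GENE:SEG-SEG"
    gene_id, segments = junction.split(":", 1)
    upstream, downstream = segments.split("-", 1)
    locus, strand = coord[:-3], coord[-2]        # strip the trailing "(+)" or "(-)"
    chrom, span = locus.split(":", 1)
    start, end = (int(x) for x in span.split("-", 1))
    return NeoantigenRow(
        peptide, gene_id, upstream, downstream,
        ast.literal_eval(samples),               # Python-style list of sample files
        int(freq), symbol, chrom, start, end, strand,
        float(mean), float(mle),
    )
```

The string splitting relies on peptides and gene IDs not containing ",", ":", or "-"; segment IDs with extra suffixes would end up unsplit in the segment fields.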
Hi @spvensko,
Thanks for bringing this up. This is an intron retention event (intron 1): the whole of intron 1 is not excised but retained in the transcript, resulting in a read-through. That's why the two coordinates differ by only one base.
It won't be reported in STAR's SJ.out.tab, which, as far as I understand, only reports splice junctions, not intron retention. In Supplementary Figure 1, we illustrate how we define the exon ID and segment ID (the nomenclature you asked about); I have also pasted it below in the hope that it clears up some of the confusion:
In Supplementary Figure 2, we show a benchmark, previously conducted by my lab mate, comparing our intron retention prediction against other tools on simulated data.
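To cross-check a reported event against STAR's output by hand, the coordinate can be converted into SJ.out.tab's convention, where column 1 is the chromosome and columns 2 and 3 are the first and last intronic bases (1-based). Here is a minimal Python sketch. It assumes the tool's coordinate pair is the last base of the upstream segment and the first base of the downstream segment, consistent with the one-base difference described above but not confirmed in this thread; `to_star_intron` and `find_in_sj_tab` are hypothetical names.

```python
import re

# Matches coordinates such as "chr11:33542146-33542147(+)".
COORD = re.compile(r"^([^:]+):(\d+)-(\d+)\(([+-])\)$")


def to_star_intron(coord):
    """Convert a junction coordinate to SJ.out.tab's (chrom, first intron base, last intron base).

    Assumption: the two numbers are the last base of the upstream segment and the
    first base of the downstream segment (1-based). Returns None when they are
    adjacent, i.e. nothing is spliced out (intron retention / read-through).
    """
    m = COORD.match(coord)
    if m is None:
        raise ValueError(f"unrecognized coordinate: {coord!r}")
    chrom = m.group(1)
    # Sort so the sketch does not depend on how minus-strand coordinates are ordered.
    lo, hi = sorted((int(m.group(2)), int(m.group(3))))
    if hi - lo == 1:
        return None
    return chrom, lo + 1, hi - 1


def find_in_sj_tab(sj_lines, chrom, intron_start, intron_end):
    """Yield SJ.out.tab rows (split into fields) whose columns 1-3 match the intron."""
    for line in sj_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == chrom and int(fields[1]) == intron_start and int(fields[2]) == intron_end:
            yield fields
```

Under this assumption, chr11:33542146-33542147(+) maps to no intron at all, matching the explanation that SJ.out.tab does not list intron retention. The SJ.out.tab rows starting at 33542147 are separate spliced junctions that share the same donor base, 33542146.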
Let me know if I can help answer any other questions!
Best, Frank
Hello! I was able to get the tool to work and have analyzed two patients from Hugo et al., 2016. I am reviewing the outputs in frequency_stage3_verbosity1_uid_gene_symbol_coord_mean_mle.txt and have a quick question about what each column describes.
For reference, I am using GATK's Homo_sapiens.assembly38.fasta reference FASTA for alignment and GENCODE's v37, v43, and v45 GTFs to determine exonic coordinates.
With that in mind, here is a line from the file:
The peptide of interest is EKDTPRYSF, which is from gene ENSG00000004846. The coordinate chr7:20658676-20659066(+) should be the genomic region containing the peptide of interest. If I pull chr7:20658676-20659066 from Homo_sapiens.assembly38.fa and then translate it through ExPASy's web service, the peptide of interest, EKDTPRYSF, doesn't appear to be present:
Can you please help me understand these columns so I can better understand my results?
Thanks, Steven P. Vensko II