Alternative splicing generates thousands of isoforms to multiply the diversity of the proteome and non-coding RNA transcripts. Current high-throughput sequencing technologies are capable of capturing this post-transcriptional processes with great sensitvity but only yield rather short read lengths. It thus becomes very challenging to predict the entire exon composition of RNA molecules. To qualitatively and quantitatively assess alternative splicing on a genome-wide level without prior knowledge, we developed this pipeline that quantifies the the usage of unknown or known splice sites as well as the expression level of exons to calculate a splicing index called Percent Spliced In (PSI) from RNA-seq and Ribo-seq data. Ranging from 0% for non-included exons to 100% for constitutive exons, it indicates how efficiently an exon sequence is spliced into the pool of transcripts transcribed from a genetic locus. This technology is capable of identifying and quantifying novel and complex exon skipping, alternative 5 and 3 splice sites and mutually exclusive splicing events.
For detail information please refer to our publication: Please cite our paper, if you work with our script.
Runing environment
For python psi script
For shell psi script
For Visulization
About the input file.
For Shell PSI script:
./psi_shell/PSI.sh StartPSIStrictFilter flattened_gff_file.gtf readlen alignment_file.bam junctions.bed baseName
Output
ENSRNOG00000033734:001 38 46548 0 1
ENSRNOG00000033734:002 11 47993 0 1
ENSRNOG00000033734:003 12 7772 32484 0.175863
ENSRNOG00000033734:004 30 54112 2843 0.935926
ENSRNOG00000033734:005 87 113977 0 1
Field description
For python PSI script:
Counting inclusion count
./psi_python/dexseq_count.py -p no -s no -r pos -f bam flattened_gff_file.gtf alignment_file.bam basename.inclusion
Counting exclusion count
./psi_python/exclusion_count.py flattened_gff_file.gtf junctions.bed basename.exclusion
Calculate the psi
./psi_python/psi_calculation.py -l 100 flattened_gff_file.gtf basename.inclusion basename.exclusion basename
NOTICE: step 1 and step 2 can be run parallel.
Output
ENSRNOG00000033734:001 1.0 1 44487 0 13 48872971 48873009 + 38
ENSRNOG00000033734:002 1.0 1 45933 0 13 48873964 48873975 + 11
ENSRNOG00000033734:003 0.170570528295 1 7490 32484 13 48874085 48874097 + 12
ENSRNOG00000033734:004 0.933449251441 1 51960 2843 13 48876463 48876493 + 30
Field description
If the set --detail yes, then only showing first three fields like following.
ENSRNOG00000033734:001 1.0 1
ENSRNOG00000033734:002 1.0 1
ENSRNOG00000033734:003 0.170570528295 1
ENSRNOG00000033734:004 0.933449251441 1
We also provide R and ggplot2 based visulization script for psi result visulization, located on psi_visulization directory.
Configuration and running.
For simple visulization, user only need to run psi.plot.r. Before running, some options need to be changed to fit local environment.
Source psi_plot_vis.r script
source('./psi_plot_vis.r') ## Here should be the correct path to the psi_plot_vis.r script
Provide sample experiment table like DESeq style
sampleTablePath = './sampleTable.csv' ## Here should be the path to sampleTable, user can refer to the testing data to create own one.
Provide psi result folder
psiFolder = './psi_files/'
Provide the path to exonic part matrix which can be created by dexseq_prepare_annotation.py script from DEXSeq packages.
exonicPartMatrixPath ="./homo_sapiens.GRCh38.78_exonic.gtf"
gene id
gene_id = "ENSG00000099810+ENSG00000264545+ENSG00000274055+ENSG00000264801+ENSG00000240498"
Optional if aggregate the PSI by experiment group.
as.group = TRUE
For ploting a single genes, refering following code.
psiTable <- importPSI(sampleTablePath, psiFolder)
exonicPartMatrix <- import(exonicPartMatrixPath)
geneAnnot <- getGenesAnot(gene_id, exonicPartMatrix)
genePsi <- getGenesPsi(gene_id = gene_id,
psiTable = psiTable,
sampleTablePath = sampleTablePath,
as.group = TRUE)
## ploting the psi
plot(psiPlot(geneAnnot, gene_id, genePsi))
For batch ploting a gene list, refering following code.
## Gene list
gene_list =readLines("./gene_list.txt")
psiTable <- importPSI(sampleTablePath, psiFolder)
for(sample in gene_list ){
geneAnnot <- getGenesAnot(sample, exonicPartMatrix)
genePsi <- getGenesPsi(gene_id = sample,
psiTable = psiTable,
sampleTablePath = sampleTablePath,
as.group = TRUE)
path.out <- file.path("./plots/")
if(!file.exists(path.out)){dir.create(path.out)}
pdf(file=paste("./plots/",sample,'.pdf',sep = ''),width=40,height=30 )
plot(psiPlot(geneAnnot, sample, genePsi))
dev.off()