UofLBioinformatics / seekCRIT

seek for circular RNA in transcriptome (tool to identify deferentially expressed circRNAs between two samples)
MIT License
7 stars 3 forks source link

seekCRIT

seek for circular RNA in transcriptome (identifies deferentially expressed circRNAs between two samples)

Version: v1.0.0.b

Last Modified: 2019-04-25

Authors: Bioinformatics Lab, University of Louisville, Kentucky Biomedical Research Infrastructure Network (KBRIN)

A schematic flow shows the pipeline

pipeline

Different Junction Counts (linear junction counts and circular junction counts) used in the circular RNA expression level estimation.

DifferentJunctionCounts

Percent Back-spliced In (PBI) calculation (circular junction counts with respect to all junction counts)

DifferentJunctionCounts

Example PBI calculation for a differentially expressed circular RNA.

circular RNA derived from exon 2 of Cdp gene, Rat, Hippocampus, Somata and Neuropil. This circular RNA is highly expressed in Neuropil (31.9%) than Somata (11.9%). DifferentJunctionCounts

Prerequisites

Software / Package

Others

Installation

1 Download seekCRIT

git clone https://github.com/UofLBioinformatics/seekCRIT.git

cd seekCRIT

2 Install required packages (Some prerequisite packages might require admin access rights. Please contact your system admin to install such packages.)

pip3 install -r Prerequisites.txt

3 Install seekCRIT

python setup.py install

4 testing seekCRIT with testrun.sh

In order to run testrun.sh:

Usage

usage: seekCRIT.py [-h] -s1 S1 -s2 S2 -gtf GTF -o OUTDIR -t {SE,PE} 
               --genomeIndex GENOMEINDEX -fa FASTA -ref REFSEQ
               [--threadNumber numThreads]
               [--aligner aligner]
               [--deltaPSI DELTAPSI] [--highConfidence HIGHCONFIDENCE]
               [--libType {fr-unstranded,fr-firststrand,fr-secondstrand}]
               [--keepTemp {Y,N}]

Identifying and Characterizing Differentially Spliced circular RNAs between
two samples

Required arguments:
=================== 
  -s1 S1, --sample1 S1  fastq files for sample_1. Replicates are separated by
                        comma. Paired-end reads are separated by colon.
                        e.g.,s1-1.fastq,s1-2.fastq for single-end read. s1-1.R
                        1.fastq:s1-1.R2.fastq,s1-2.R1.fastq:s1-2.R2.fastq for
                        single-end read

  -s2 S2, --sample2 S2  fastq files for sample_2. Replicates are separated by
                        comma. Paired-end reads are separated by colon.
                        e.g.,s2-1.fastq,s2-2.fastq for single-end read. s2-1.R
                        1.fastq:s2-1.R2.fastq,s2-2.R1.fastq:s2-2.R2.fastq for
                        single-end read

  -gtf GTF, --gtf GTF   The gtf annotation file. e.g., hg38.gtf

  -o OUTDIR, --output OUTDIR
                        Output directory

  -t {SE,PE}, --readType {SE,PE}
                        Read type. SE for Single-end read, PE for Paired-end read

  --genomeIndex GENOMEINDEX
                        Genome indexes for the aligner

  -fa FASTA, --fasta FASTA
                        Genome sequence. e.g., hg38.fa

  -ref REFSEQ, --refseq REFSEQ
                        Transcriptome in refseq format. e.g., hg38.ref.txt

 optional arguments:
====================

   -h, --help            show this help message and exit
   --threadNumber numberOfThreadsk
                        Number of threads for multi-threading feature [default = 4]
  --aligner aligner     aligner to use(for now it supports only STAR but we are working on it to support more aligners)

  --deltaPSI DELTAPSI   Delta PSI cutoff. i.e., significant event must show
                        bigger deltaPSI than this cutoff [default = 0.05]

  --highConfidence HIGHCONFIDENCE
                        Minimum number of circular junction counts required [default = 1]

  --libType  {fr-unstranded,fr-firststrand,fr-secondstrand}
                        library type used by Tophat aligner [default ='fr-unstranded']

  --keepTemp {Y,N}      Keep temp files or not  [default='Y']

Example

Paired-end reads

python3 seekCRIT.py -o PEtest -t PE -fa fa/hg19.fa -ref ref/hg19.ref.txt --genomeIndex /media/bio/data/STARIndex/hg19 -s1 testData/231ESRP.25K.rep-1.R1.fastq:testData/231ESRP.25K.rep-1.R2.fastq,testData/231ESRP.25K.rep-2.R1.fastq:testData/231ESRP.25K.rep-2.R2.fastq -s2 testData/231EV.25K.rep-1.R1.fastq:testData/231EV.25K.rep-1.R2.fastq,testData/231EV.25K.rep-2.R1.fastq:testData/231EV.25K.rep-2.R2.fastq -gtf testData/test.gtf --threadNumber 12 

Single-end reads

python3 seekCRIT.py -o SEtest -t SE -fa fa/hg19.fa -ref ref/hg19.ref.txt --genomeIndex /media/bio/data/STARIndex/hg19 -s1 testData/231ESRP.25K.rep-1.R1.fastq,testData/231ESRP.25K.rep-1.R2.fastq,testData/231ESRP.25K.rep-2.R1.fastq,testData/231ESRP.25K.rep-2.R2.fastq -s2 testData/231EV.25K.rep-1.R1.fastq,testData/231EV.25K.rep-1.R2.fastq,testData/231EV.25K.rep-2.R1.fastq,testData/231EV.25K.rep-2.R2.fastq -gtf testData/test.gtf --threadNumber 12 

Note

Field Description
geneName Name of gene
isoform_name name of isoform
chrom chromosme
strand strand (+/-)
txStart Transcription start position
txEnd Transcription end position
cdsStart Coding region end
exonCount Number of exons
exonStarts Exon start positions
exonEnds Exon end positions

Output

See details in the example file

Field Description
chrom chromosome
circRNA_start circular RNA 5' end position
circRNA_end circular RNA 3' end position
strand DNA strand (+/-)
exonCount number of exons included in the circular RNA transcript
exonSizes size of exons included in the circular RNA transcript
exonOffsets offsets of exons included in the circular RNA transcript
circType circRNA, ciRNA, ccRNA
geneName name of gene
isoformName name of isoform
exonIndexOrIntronIndex Index (start from 1) of exon (for circRNA) or intron (for ciRNA) in given isoform
FlankingIntrons Left intron/Right intron
CircularJunctionCount_Sample_1 read count of the circular junction in sample # 1
LinearJunctionCount_Sample_1 read count of the linear junction in sample # 1
CircularJunctionCount_Sample_2 read count of the circular junction in sample # 2
LinearJunctionCount_Sample_2 read count of the linear junction in sample # 2
PBI_Sample_1 Percent Backsplicing Index for sample # 1
PBI_Sample_2 Percent Backsplicing Index for sample # 2
deltaPBI(PBI_1-PBI_2) difference between PBI values of two samples
pValue pValue
FDR FDR

To calculate the significane of differentially expressed circular RNAs, use the criteria:

License

Copyright (C) 2017 . See the LICENSE file for license rights and limitations (MIT).