# rTea

rTea (RNA Transposable Element Analyzer) is a computational method to detect transposon-fusion RNA.

We developed rTea to detect TE-fusion transcripts from short-read RNA-seq data. We use multiple features of the aligned reads, such as the base quality of clipped sequences, the percentage of multi-mapped reads, and the matching score of reads to TE sequences, to filter out false positives caused by nonspecifically mapped reads.

Users can try rTea on a demo data set and check the output at https://gitlab.aleelab.net/junseokpark/rTea-results

rTea runs on a Linux-based operating system with certain prerequisite software. Here is a list of the software you should install before you start using rTea.

System software for Ubuntu 18.04 LTS:

```bash
apt-get update && apt-get install -y \
    cmake \
    libxml2-dev \
    libcurl4-openssl-dev \
    libboost-dev \
    gawk \
    libssl-dev \
    pigz \
    htop \
    iputils-ping
```

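Before moving on, it can help to confirm that the command-line tools are actually on `PATH`. The following is a minimal sketch and not part of rTea: the helper name `missing_tools` is hypothetical, and the tool list is only an assumption drawn from the packages named above (extend it to match your setup).

```bash
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check a few of the executables installed by the packages above.
missing=$(missing_tools cmake gawk pigz)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "All checked prerequisites were found."
fi
```

Library packages such as `libxml2-dev` do not install an executable, so a check of this kind covers only the tools that can be looked up by name.
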
Before installing rTea, you'll also need to set up the prerequisite software and environment variables (ENV).

R (version 3.6.2) and the required R packages should be installed:

```bash
R -e "install.packages('XML', repos = 'http://www.omegahat.net/R')"
R -e "install.packages(c( \
    'magrittr', \
    'data.table', \
    'stringr', \
    'optparse', \
    'Rcpp', \
    'BiocManager' \
))"
R -e "BiocManager::install(c( \
    'GenomicAlignments', \
    'BSgenome.Hsapiens.UCSC.hg19', \
    'BSgenome.Hsapiens.UCSC.hg38', \
    'EnsDb.Hsapiens.v75', \
    'EnsDb.Hsapiens.v86' \
))"
```

* Download GRCh38 [genome_snp_tran](https://genome-idx.s3.amazonaws.com/hisat/grch38_snptran.tar.gz)
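
Once the archive is downloaded and unpacked, the directory later passed as `GENOME_SNP_TRAN_DIR` should contain the index files. A small sanity check is sketched below. It assumes the archive holds a HISAT2 index whose files end in `.ht2` (suggested by the `hisat` path in the URL, but not stated in this README). The helper name `has_hisat2_index` is hypothetical, and the download commands are left commented out because the archive is large.

```bash
# Return success if DIR contains at least one HISAT2 index file (*.ht2).
has_hisat2_index() {
    local dir="$1"
    local f
    for f in "$dir"/*.ht2; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Assumed workflow (uncomment to run; the extracted path is a placeholder):
# wget https://genome-idx.s3.amazonaws.com/hisat/grch38_snptran.tar.gz
# tar -xzf grch38_snptran.tar.gz
# export GENOME_SNP_TRAN_DIR=/path/to/extracted/index

if has_hisat2_index "${GENOME_SNP_TRAN_DIR:-.}"; then
    echo "HISAT2 index files found."
else
    echo "No *.ht2 files found; check GENOME_SNP_TRAN_DIR."
fi
```
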
## Use Docker for Installation
Build a Docker image and run ``rTea`` in a Docker container.
```bash
DOCKER_BUILDKIT=1 docker build -t rtea .
```

After creating a Docker image for rTea, convert it to a Singularity image if you plan to use Singularity.

```bash
docker save -o rTea.tar rtea:latest
singularity build rTea.simg docker-archive://rTea.tar
```

If you are using Docker as your runtime environment, run the Docker image to execute rTea.

```bash
docker run -it -v ${GENOME_SNP_TRAN_DIR}:/app/rTea/hg38/genome_snp_tran rtea bash
```

If the runtime environment is Singularity, execute the Singularity image to run rTea.

```bash
singularity shell -B ${GENOME_SNP_TRAN_DIR}:/app/rTea/hg38/genome_snp_tran \
    rTea.simg
```

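If you switch between the two runtimes, a small helper can assemble the matching command line. This is only a sketch: the function name `rtea_shell_cmd` is hypothetical, the image names `rtea` and `rTea.simg` and the mount target are taken from the commands in this README, and the Docker branch uses `docker run` because volume mounts are set when a container is started.

```bash
# Print the interactive-shell command for the chosen runtime.
# $1: "docker" or "singularity"; $2: host directory holding the index.
rtea_shell_cmd() {
    local runtime="$1"
    local index_dir="$2"
    local mount="${index_dir}:/app/rTea/hg38/genome_snp_tran"
    case "$runtime" in
        docker)      echo "docker run -it -v ${mount} rtea bash" ;;
        singularity) echo "singularity shell -B ${mount} rTea.simg" ;;
        *)           echo "unknown runtime: $runtime" >&2; return 1 ;;
    esac
}

# Show the command without running it (the fallback path is a placeholder).
rtea_shell_cmd docker "${GENOME_SNP_TRAN_DIR:-/path/to/index}"
```

Printing the command instead of executing it makes it easy to inspect the bind mount before starting a container.
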
rTea supports paired-end FASTQ files and a BAM file as input.

For FASTQ file input, use the following command:
```bash
rTea.sh \
    ${R1}.fq.gz \
    ${R2}.fq.gz \
    $SAMPLE_NAME \
    $GENOME_SNP_TRAN_DIR \
    $NUMBER_OF_CORES \
    $OUT_DIR \
    hg38 \
    resume
```

For BAM file input, please use the following command:
```bash
rnatea_pipeline_from_bam \
    ${BAM} + \
    $SAMPLE_NAME \
    $GENOME_SNP_TRAN_DIR \
    $NUMBER_OF_CORES \
    $OUT_DIR \
    hg38
```

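Because a run can take a long time, it may be worth validating the arguments before launching the pipeline. The sketch below is not part of rTea: `check_rtea_inputs` is a hypothetical helper that only checks that both FASTQ files exist and that the core count is a positive integer.

```bash
# Validate FASTQ-mode inputs before calling rTea.sh.
# $1, $2: paired FASTQ files; $3: number of cores.
# Prints a message to stderr and returns 1 on the first problem found.
check_rtea_inputs() {
    local r1="$1"
    local r2="$2"
    local cores="$3"
    local f
    for f in "$r1" "$r2"; do
        if [ ! -f "$f" ]; then
            echo "input file not found: $f" >&2
            return 1
        fi
    done
    case "$cores" in
        ''|*[!0-9]*|0)
            echo "cores must be a positive integer: $cores" >&2
            return 1
            ;;
    esac
    return 0
}

# Example call (file names are placeholders):
# check_rtea_inputs sample_R1.fq.gz sample_R2.fq.gz 8 && echo "inputs look fine"
```
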
After running rTea, the user can find a results table with the following columns:

| Column | Description |
|---|---|
| chr | Chromosome name |
| pos | Fusion breakpoint position on the chromosome |
| ori | Fusion direction on the chromosome (f, TE\|gene; r, gene\|TE) |
| class | TE class |
| seq | Proximal portion of fusion sequence |
| isPolyA | Whether it is a fusion with polyA sequence |
| posRepFamily | Repeat masked repeat family on the breakpoint position |
| posRep | Repeat masked repeat element on the breakpoint position |
| TEfamily | TE family with highest alignment score when fusion sequence is aligned with consensus TE sequence |
| TEscore | Alignment score of fusion sequence with the consensus TE sequence |
| TEside | Fusion direction on the consensus TE sequence (5, TE\|gene; 3, gene\|TE) |
| TEbreak | Fusion breakpoint position on the consensus TE sequence |
| depth | Number of RNA-seq reads on the breakpoint position |
| matchCnt | Number of fusion-supporting RNA-seq reads |
| polyAcnt | Number of polyA reads |
| baseQual | Median base quality of supporting reads |
| lowMapQual | Number of supporting reads that have low mapping quality |
| mateDist | Minimum distance of mate reads |
| overhang | Distance of breakpoint from splice site |
| gap | Length of nearby intron |
| secondary | Proportion of supporting reads that are from secondary alignment |
| nonspecificTE | Mean alignment score of supporting reads to consensus TE sequence |
| r1pstrand | Proportion of supporting reads that are from positive strand of chromosome |
| fusion_tx_id | Transcript ID of the fusion transcript |
| tx_support_exon | Number of read fragments spanning exonic region of the fusion transcript ID |
| tx_support_intron | Number of read gaps matching the fusion transcript ID |
| strand | Strand of fusion transcript |
| pos_type | Genomic region of breakpoint |
| polyTE | Known non-reference TE on the breakpoint position |
| hardstart | Start position of nearby reference genome where fusion sequence came from |
| hardend | End position of nearby reference genome where fusion sequence came from |
| hardTE | Repeat masked TE subfamily of nearby reference genome where fusion sequence came from |
| hardDist | Distance from fusion breakpoint to nearby reference genome where fusion sequence came from |
| fusion_type | Type of TE fusion |
| fusion_tx_biotype | Biotype of fusion transcript |
| fusion_gene_id | Gene ID of fusion transcript |
| fusion_gene_name | Gene symbol of fusion transcript |
| Filter | Filter reason of low confidence fusion |
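
The columns above lend themselves to simple command-line filtering. The sketch below keeps the rows whose value in a named column meets a minimum, for example a minimum `matchCnt`. It assumes the output is a tab-separated text file whose first line holds the column names, which this README does not state. The function name `filter_min_col`, the threshold of 5, and the tiny demo table are made up for illustration.

```bash
# Print the header plus every row whose value in column NAME is >= MIN.
# Usage: filter_min_col NAME MIN FILE
filter_min_col() {
    awk -F'\t' -v name="$1" -v min="$2" '
        NR == 1 {
            # Locate the requested column by its header name.
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) { print "column not found: " name | "cat 1>&2"; exit 1 }
            print
            next
        }
        $col + 0 >= min + 0
    ' "$3"
}

# Demo on a made-up three-column table (not real rTea output).
demo=$(mktemp)
printf 'chr\tpos\tmatchCnt\nchr1\t100\t2\nchr2\t200\t7\n' > "$demo"
filter_min_col matchCnt 5 "$demo"
rm -f "$demo"
```

Looking the column up by name rather than by position keeps the filter working even if the column order differs between versions.
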