`rTea` (RNA Transposable Element Analyzer)

rTea is a computational method to detect transposon-fusion RNA.

Citation: Pan-cancer analysis reveals multifaceted roles of retrotransposon-fusion RNAs

Overview

We developed rTea to detect TE-fusion transcripts from short-read RNA-seq data. rTea uses multiple features of the aligned reads, such as the base quality of clipped sequences, the percentage of multi-mapped reads, and the matching score of reads to TE sequences, to filter out false positives caused by nonspecifically mapped reads.

Demo and result files

Users can try rTea on a demo data set and can check the output at https://gitlab.aleelab.net/junseokpark/rTea-results

Installation

rTea runs on a Linux-based operating system with certain prerequisite software. Here is a list of the software you should install before you start using rTea.

System software for Ubuntu 18.04 LTS

apt-get update && apt-get install -y \
cmake \
libxml2-dev \
libcurl4-openssl-dev \
libboost-dev \
gawk \
libssl-dev \
pigz \
htop \
iputils-ping

Before installing rTea, you'll also need to set up the prerequisite software and environment variables (ENV).
- fastp
- HISAT2 (>= v2.1.0)
- samtools (>= v1.9)
- HTSlib (>= v1.9)
- Scallop (>= v0.10.4)
- bamtools (>= v2.5.1)
```
# Bamtools environment
# BAMTOOL_HOME is the directory where bamtools is installed
PKG_CXXFLAGS="-I$BAMTOOL_HOME/include/bamtools"
PKG_LIBS="-L$BAMTOOL_HOME/lib -lbamtools"
```
- bwa (>=0.7.17)
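
Before going further, it can help to confirm that the prerequisite tools are actually reachable from the shell. The following is a minimal sketch, not part of rTea itself: `check_tools` is a hypothetical helper, and the executable names passed to it are the names these packages usually install (an assumption; adjust them if your installation differs). It does not check versions.

```bash
# Hypothetical helper (not shipped with rTea): report which executables
# are missing from PATH. Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names for the prerequisites listed above.
check_tools fastp hisat2 samtools scallop bamtools bwa R \
    || echo "Install the tools reported above before running rTea."
```

HTSlib is a library rather than a single command, so it is left out of this check.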

R (== 3.6.2) and the required R packages should be installed.


R -e "install.packages('XML', repos = 'http://www.omegahat.net/R')"
R -e "install.packages(c( \
   'magrittr', \
   'data.table', \
   'stringr', \
   'optparse', \
   'Rcpp', \
   'BiocManager' \
 ))"

R -e "BiocManager::install(c( \
   'GenomicAlignments', \
   'BSgenome.Hsapiens.UCSC.hg19', \
   'BSgenome.Hsapiens.UCSC.hg38', \
   'EnsDb.Hsapiens.v75', \
   'EnsDb.Hsapiens.v86' \
 ))"

* Download GRCh38 [genome_snp_tran](https://genome-idx.s3.amazonaws.com/hisat/grch38_snptran.tar.gz)

## Use Docker for Installation
Build the Docker image and run `rTea` in a Docker container.
```bash
DOCKER_BUILDKIT=1 docker build -t rtea .
```

Use Singularity for Installation

After creating a Docker image for rTea, convert it to Singularity.

docker save -o rTea.tar rtea:latest
singularity build rTea.simg docker-archive://rTea.tar

Running `rTea`

If you are using Docker as your runtime environment, run the Docker image to execute rTea.

docker run -it -v ${GENOME_SNP_TRAN_DIR}:/app/rTea/hg38/genome_snp_tran rtea bash

If the runtime environment is Singularity, execute the Singularity image to run rTea.

singularity shell -B ${GENOME_SNP_TRAN_DIR}:/app/rTea/hg38/genome_snp_tran \
    rTea.simg

rTea supports paired-end FASTQ files and a BAM file as input. For FASTQ file input, use the following command:

rTea.sh \
        $R1_FASTQ_GZ \
        $R2_FASTQ_GZ \
        $SAMPLE_NAME \
        $GENOME_SNP_TRAN_DIR \
        $NUMBER_OF_CORES \
        $OUT_DIR \
        hg38 \
        resume
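
When several samples need to be processed, the command above can be generated per sample. The following is a minimal sketch under stated assumptions: `build_rtea_cmd` is a hypothetical helper, the FASTQ files are assumed to be named `<sample>_R1.fq.gz` and `<sample>_R2.fq.gz`, and each sample is given its own output subdirectory. It only prints the commands (a dry run) so they can be reviewed before being executed; the argument order mirrors the FASTQ command above.

```bash
# Hypothetical helper (not shipped with rTea): print one rTea.sh command
# for a sample. Assumes FASTQ files named <sample>_R1.fq.gz / <sample>_R2.fq.gz.
build_rtea_cmd() {
    sample=$1      # sample name
    fastq_dir=$2   # directory holding the FASTQ files
    index_dir=$3   # genome_snp_tran directory
    cores=$4       # number of cores
    out_dir=$5     # parent output directory
    printf 'rTea.sh %s %s %s %s %s %s hg38 resume\n' \
        "$fastq_dir/${sample}_R1.fq.gz" \
        "$fastq_dir/${sample}_R2.fq.gz" \
        "$sample" "$index_dir" "$cores" "$out_dir/$sample"
}

# Dry run for two placeholder sample names and placeholder paths.
for s in sampleA sampleB; do
    build_rtea_cmd "$s" /data/fastq /data/genome_snp_tran 8 /data/out
done
```

Once the printed commands look right, they can be redirected to a file and run with `sh`.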

For BAM file input, please use the following command:

rnatea_pipeline_from_bam \
        ${BAM} + \
        $SAMPLE_NAME \
        $GENOME_SNP_TRAN_DIR \
        $NUMBER_OF_CORES \
        $OUT_DIR \
        hg38

Output file

After running `rTea`, the user can find a `.rTea.txt` file in the rTea directory, which contains information about TEs and other supporting data.

| Column | Description |
| --- | --- |
| chr | Chromosome name |
| pos | Fusion breakpoint position on the chromosome |
| ori | Fusion direction on the chromosome (f, TE\|gene; r, gene\|TE) |
| class | TE class |
| seq | Proximal portion of fusion sequence |
| isPolyA | Whether it is a fusion with polyA sequence |
| posRepFamily | Repeat masked repeat family on the breakpoint position |
| posRep | Repeat masked repeat element on the breakpoint position |
| TEfamily | TE family with highest alignment score when fusion sequence is aligned with consensus TE sequence |
| TEscore | Alignment score of fusion sequence with the consensus TE sequence |
| TEside | Fusion direction on the consensus TE sequence (5, TE\|gene; 3, gene\|TE) |
| TEbreak | Fusion breakpoint position on the consensus TE sequence |
| depth | Number of RNA-seq reads on the breakpoint position |
| matchCnt | Number of fusion-supporting RNA-seq reads |
| polyAcnt | Number of polyA reads |
| baseQual | Median base quality of supporting reads |
| lowMapQual | Number of supporting reads that have low mapping quality |
| mateDist | Minimum distance of mate reads |
| overhang | Distance of breakpoint from splice site |
| gap | Length of nearby intron |
| secondary | Proportion of supporting reads that are from secondary alignment |
| nonspecificTE | Mean alignment score of supporting reads to consensus TE sequence |
| r1pstrand | Proportion of supporting reads that are from positive strand of chromosome |
| fusion_tx_id | Transcript ID of the fusion transcript |
| tx_support_exon | Number of read fragments spanning exonic region of the fusion transcript ID |
| tx_support_intron | Number of read gaps matching the fusion transcript ID |
| strand | Strand of fusion transcript |
| pos_type | Genomic region of breakpoint |
| polyTE | Known non-reference TE on the breakpoint position |
| hardstart | Start position of nearby reference genome where fusion sequence came from |
| hardend | End position of nearby reference genome where fusion sequence came from |
| hardTE | Repeat masked TE subfamily of nearby reference genome where fusion sequence came from |
| hardDist | Distance from fusion breakpoint to nearby reference genome where fusion sequence came from |
| fusion_type | Type of TE fusion |
| fusion_tx_biotype | Biotype of fusion transcript |
| fusion_gene_id | Gene ID of fusion transcript |
| fusion_gene_name | Gene symbol of fusion transcript |
| Filter | Filter reason of low confidence fusion |
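
The columns above can be used to subset the result file from the command line. The following is a minimal sketch, assuming the `.rTea.txt` file is tab-delimited with a header row (an assumption; check your own output first). `filter_by_match_count` is a hypothetical helper, not part of rTea: it looks up the `matchCnt` column by name and keeps the header plus the rows with at least a given number of fusion-supporting reads. The threshold in the usage line is only an illustration, not a recommended cutoff.

```bash
# Hypothetical helper (not shipped with rTea).
# Usage: filter_by_match_count FILE MIN_SUPPORTING_READS
# Assumes FILE is tab-delimited and its first line is the header.
filter_by_match_count() {
    awk -F '\t' -v min="$2" '
        NR == 1 {
            # Locate the matchCnt column by its header name.
            for (i = 1; i <= NF; i++) if ($i == "matchCnt") col = i
            if (!col) { print "matchCnt column not found" > "/dev/stderr"; exit 1 }
            print
            next
        }
        # Keep rows whose supporting-read count reaches the threshold.
        ($col + 0) >= (min + 0) { print }
    ' "$1"
}

# Example: filter_by_match_count sample.rTea.txt 5 > sample.filtered.txt
```

Because the column is found by name rather than by position, the same approach works for other numeric columns such as `depth` or `TEscore` by changing the header name in the lookup.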

Licenses

Contacts

- Junseok Park
- Boram Lee
