TeXP is a pipeline to evaluate the transcription level of transposable elements in short read RNA-seq data
TeXP is a pipeline for quantifying abundances of Transposable Elements transcripts from RNA-Seq data. TeXP is based on the assumption that RNA-seq reads overlapping Transposable Elements is a composition of pervasive transcription signal and autonomous transcription of Transposable Elements.
https://www.biorxiv.org/content/10.1101/648667v1.full
docker run -it fnavarro/texp:latest /bin/bash
Download a fastq file from a RNA-seq experiment, for example, MCF-7 from the ENCODE project
wget -c -t0 "https://www.encodeproject.org/files/ENCFF000HFF/@@download/ENCFF000HFF.fastq.gz" -O file.fastq.gz
Run TeXP
./TeXP.sh -f file.fastq.gz -t 1 -o process/example/ -n quick_texp_run
The output files will be generated at:
ls process/example/quick_texp_run
*.L1HS_hg38.count (Naive counts)
*.L1HS_hg38.count.corrected (Corrected counts)
*.L1HS_hg38.count.rpkm (Naive RPKM)
*.L1HS_hg38.count.rpkm.corrected (Corrected RPKM)
*.L1HS_hg38.count.signal_proportions
TIPS: If fastq files are stored locally you can use
docker run -it -v ~/Desktop/:/texp fnavarro/texp:latest /bin/bash
To mount "~/Desktop" at your docker container
$> git clone https://github.com/gersteinlab/texp.git
Edit TeXP.sh and Update INSTALL_DIR variable to the path where TeXP was cloned
apt-get update
apt-get install -y \ bedtools=2.26.0+dfsg-3 \ bowtie2=2.3.0-2 \ fastx-toolkit=0.0.14-3 \ gawk=1:4.1.4+dfsg-1 \ git \ perl=5.24.1-3+deb9u1 \ python=2.7.13-2 \ r-base=3.3.3-1 \ r-base-dev=3.3.3-1 \ samtools=1.3.1-3 \ wget
mkdir -p /src; \ cd /src ; \ git clone https://github.com/lh3/wgsim.git; \ cd wgsim; \ gcc -g -O2 -Wall -o wgsim wgsim.c -lz -lm; \ mv wgsim /usr/bin/; \ cd /;
Fix path (/data/library) to the a proper location at your computation enviroment
mkdir -p /data/library/rep_annotation; \ cd /data/library/rep_annotation; \ wget -c -t0 "http://files2.gersteinlab.org/public-docs/2019/08.14/rep_annotation.hg38.tar.bz2" -O rep_annotation.hg38.tar.bz2; \ tar xjvf rep_annotation.hg38.tar.bz2; \ rm -Rf rep_annotation.hg38.tar.bz2
mkdir -p /data/library/bowtie2; \ cd /data/library/bowtie2; \ wget -c -t0 "http://files2.gersteinlab.org/public-docs/2019/08.14/bowtie2.hg38.tar.bz2" -O bowtie2.hg38.tar.bz2; \ tar xjvf bowtie2.hg38.tar.bz2; \ rm -Rf bowtie2.hg38.tar.bz2
echo 'install.packages(c("penalized"), repos="http://cloud.r-project.org", dependencies=TRUE)' > /tmp/packages.R \ && Rscript /tmp/packages.R
A few paramaters must be setup so TeXP can properly work outside a docker enviroment; Parameters are set on opts.mk and the user MUST properly set it up.
Alternatively, docker images containing all dependencies and libraries can be used. The TeXP docker image also is pre-configured to work outside the box. Check https://hub.docker.com/r/fnavarro/texp/ for futher instructions: docker pull fnavarro/texp
$> ./TeXP.sh -f [FILE_NAME] -t [INT] -o [OUTPUT_PATH] n [SAMPLE_ID]
-f: Input file (fastq,fastq.gz,sra)
-t: Number of threads
-o: Output path (i.e. ./ or ./processed)
-n: Sample name (i.e. SAMPLE01)
1) Does TeXP work for paired end data?
TeXP has been implemented to run one fastq file at a time. Overall, we empirically find that if the RNA-seq library is good, P1 and P2 should yield very similar estimates. Therefore, if using paired-end RNA-seq data, we recommend calculating the mean between both pairs.
2) Does TeXP work for unstranded data?
Yes!
3) Can I use other aligners
On (figure 15) [https://journals.plos.org/ploscompbiol/article/file?type=supplementary&id=info:doi/10.1371/journal.pcbi.1007293.s015] we show that aligners do not drastically change TeXP estimations, therefore, while you could use other aligners, we suggest using bowtie2 since all TeXP parameterization has been done on bowtie2