BMILAB / scAPAtrap

GNU Affero General Public License v3.0

scAPAtrap v0.2.0 (released on 2023/10/11)

Identification and quantification of alternative polyadenylation sites from single-cell RNA-seq data

Introduction

Alternative polyadenylation (APA) has been shown to play an important role in regulating mRNA stability, translation, and localization. Diverse scRNA-seq protocols that use 3' selection/enrichment in library construction, such as Drop-seq, CEL-seq, and 10x Genomics, provide opportunities to study APA at single-cell resolution. We propose scAPAtrap, a tool for identifying and quantifying APA sites in individual cells that leverages the resolution and abundance of scRNA-seq data generated by various 3' tag-based protocols. scAPAtrap combines peak identification with poly(A) read anchoring, which allows it to identify and pinpoint poly(A) sites even when read coverage is low. scAPAtrap can also quantify the expression levels of all identified APA sites while accounting for duplicates resulting from both in vitro transcription (IVT) and PCR cycles.
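To make the idea of duplicate-aware quantification concrete, here is a minimal shell sketch that counts unique UMIs per peak and cell. It is an illustration only, not scAPAtrap's implementation: the function name `count_umis` and the three-column input layout (cell barcode, UMI, peak ID) are assumptions made for this example, and it collapses exact UMI matches only, whereas dedicated tools such as umi_tools also account for sequencing errors in UMIs.

```shell
# Hypothetical helper: count unique UMIs per (peak, cell).
# Assumed input on stdin: tab-separated cell_barcode, umi, peak_id,
# one line per read. Reads sharing all three fields are duplicates.
count_umis() {
  LC_ALL=C sort -u |
    awk -F '\t' '
      { n[$3 "\t" $1]++ }                       # one increment per unique UMI
      END { for (k in n) print k "\t" n[k] }    # peak_id, cell_barcode, count
    ' |
    LC_ALL=C sort
}
```

For example, two reads with the same cell barcode, UMI, and peak contribute a count of one, not two.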

scAPAtrap mainly consists of six modules. (1) Raw scRNA-seq datasets from 3' tag-based protocols (e.g., 10x, CEL-seq) are preprocessed for mapping and UMI extraction. (2) Without using any genome annotation, potential peaks are detected across the whole genome, and wide peaks are iteratively split into smaller ones. (3) Identified peaks are quantified by counting the effective reads in each peak. (4) Reads with A/T stretches are extracted to determine the precise locations of poly(A) sites. (5) Poly(A) sites in both genic and intergenic regions are annotated with rich information according to the latest genome annotation. (6) Differentially expressed poly(A) sites and 3' UTR lengthening/shortening events are detected to profile APA dynamics among cell types.
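As a rough illustration of module (4), the sketch below selects reads from a FASTQ stream whose sequence ends in a run of A's or begins with a run of T's. It is not how scAPAtrap itself extracts these reads: the function name `polya_read_ids`, the default minimum run length of 10, and the choice to look only at the read ends are all assumptions made for this example.

```shell
# Hypothetical helper: print IDs of reads with a trailing A stretch or a
# leading T stretch of at least MIN bases (default 10, an assumed threshold).
# Input: uncompressed 4-line FASTQ records on stdin.
polya_read_ids() {
  min="${1:-10}"
  awk -v min="$min" '
    NR % 4 == 1 { id = substr($1, 2) }          # header line, drop leading "@"
    NR % 4 == 2 {
      n = length($0); a = 0; t = 0
      # count consecutive A bases from the 3-prime end of the read
      while (a < n && substr($0, n - a, 1) == "A") a++
      # count consecutive T bases from the 5-prime end of the read
      while (t < n && substr($0, t + 1, 1) == "T") t++
      if (a >= min || t >= min) print id
    }'
}
```

A call such as `polya_read_ids 12 < reads.fastq` would raise the minimum run length to 12; `reads.fastq` is a placeholder file name.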

Prerequisites

Tools

The following command-line tools are required and can be installed using conda: samtools, subread, umi_tools, and STAR.

conda install samtools -c bioconda
conda install subread -c bioconda
conda install umi_tools -c bioconda
conda install star -c bioconda
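After installation, it can be useful to confirm that the executables are actually on your PATH. The following is a small sketch; the function name `missing_tools` is hypothetical. Note that the subread package provides the `featureCounts` executable and the star package provides `STAR`.

```shell
# Hypothetical helper: print each required tool that is not found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   missing_tools samtools featureCounts umi_tools STAR
```

If the call prints nothing, all four executables were found.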

R packages

Please install the following R packages: GenomicRanges, GenomicAlignments, dplyr, derfinder, and regioneR.

Installing

install.packages('devtools')
devtools::install_github("BMILAB/scAPAtrap")

Or you may download the package and install locally.

devtools::install_local("your_path_of_scAPAtrap-master.zip", build_vignettes = TRUE)

Once the package is installed, you can browse the vignettes with the following command in the R console:

browseVignettes('scAPAtrap')

Application examples

Identification and quantification of poly(A) sites

The scAPAtrap tutorial (PDF, HTML) describes two ways of running scAPAtrap: one-step and step-by-step.

You can also open the vignette with the following command in the R console:

vignette("scAPAtrap_tutorial", package = "scAPAtrap")

Analyze APA results from scAPAtrap with the movAPA package

This documentation describes how to read an external file of poly(A) sites and analyze it with movAPA. The vignette was obtained from the movAPA package. Please refer to the vignette (PDF, HTML) for full details.

You can also open the vignette with the following command in the R console:

vignette("movAPA_on_scAPAtrap_results", package = "scAPAtrap")

Analysis in the BiB paper

Data

Three main datasets were used in this study:

Lists of poly(A) sites identified by scAPAtrap, with full genome annotation, are available in the Result folder.

Comparison with other tools

We used the mouse sperm scRNA-seq dataset to evaluate the performance of scAPAtrap and compared the results with two other tools, scAPA (Shulman et al., 2019) and Sierra (Patrick et al., 2020). Each of the three tools (scAPAtrap, scAPA, and Sierra) was used to identify poly(A) sites from the mouse sperm scRNA-seq data. The identified poly(A) sites, stored in Rdata files, can be downloaded here. Please refer to the vignette (scAPAtrap_compare.html) for full details.

Analysis of APA dynamics

We analyzed dynamic APA usage during sperm cell differentiation based on poly(A) sites identified by scAPAtrap. Please refer to the vignette (scAPAtrap_DE.html) for full details.

Citation

If you are using scAPAtrap, please cite: Xiaohui Wu*, Tao Liu, Congting Ye, Wenbin Ye, Guoli Ji: scAPAtrap: identification and quantification of alternative polyadenylation sites from single-cell RNA-seq data, Briefings in Bioinformatics, 2020.