TDEseq is implemented as an open-source R package for detecting genes with temporal dynamic expression patterns in time-series scRNA-seq transcriptomic studies. TDEseq builds primarily upon the linear additive mixed model (LAMM) framework, with a random effect term to account for correlated samples in time-resolved or time-course scRNA-seq studies. In this model, we typically introduce quadratic I-splines and cubic C-splines as basis functions, which facilitate the detection of four potential temporal gene expression patterns: growth, recession, peak, and trough. This vignette illustrates basic usage and visualization of TDEseq.
TDEseq is implemented as an R package, which can be installed from GitHub.
library(devtools)
install_github("fanyue322/TDEseq")
The main function is tdeseq. You can find the instructions and an example by running '?tdeseq.default'.
We demonstrate TDEseq on a simulated time-course scRNA-seq dataset that is included in the TDEseq package. This toy example is for testing purposes only:
library(TDEseq)
data(exampledata)
seurat
#> An object of class Seurat
#> 200 features across 1246 samples within 1 assay
#> Active assay: RNA (200 features, 0 variable features)
This section shows how to create a TDEseqObject from a count matrix and metadata. Although TDEseq provides a normalize function that log-normalizes raw counts, we recommend that users provide their own normalized scRNA-seq data. The time point information and the sample information (needed for the mixed model only) must be contained in meta.data.
counts<-Seurat::GetAssayData(seurat,'counts') ##raw counts data
norm.data<-Seurat::GetAssayData(seurat,'data') ##log normalized data
meta.data<-seurat@meta.data ##metadata
tde <- CreateTDEseqObject(counts = counts, data=norm.data, meta.data = meta.data)
Note: the time points and sample information must be contained in the meta data.
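Because a missing metadata column only surfaces once the model is fitted, it can help to check the metadata before creating the object. The following is a minimal base R sketch on toy data, not part of TDEseq: check_meta is a hypothetical helper, and the column names "stage" and "batch" are assumed to match the stage.var and sample.var values used later in this vignette.

```r
# Toy metadata standing in for seurat@meta.data (illustrative values only).
meta.data <- data.frame(
  stage = rep(c(0, 1, 2), each = 4),
  batch = rep(c("b1", "b2"), times = 6),
  row.names = paste0("cell", 1:12)
)

# Hypothetical helper (not a TDEseq function): stop early if a required
# column is missing, otherwise return the number of cells per group.
check_meta <- function(meta, stage.var, sample.var = NULL) {
  missing.cols <- setdiff(c(stage.var, sample.var), colnames(meta))
  if (length(missing.cols) > 0) {
    stop("meta.data is missing column(s): ",
         paste(missing.cols, collapse = ", "))
  }
  if (is.null(sample.var)) {
    # Cells per time point
    table(meta[[stage.var]])
  } else {
    # Cells per time point (rows) and sample (columns)
    table(meta[[stage.var]], meta[[sample.var]])
  }
}

cell.counts <- check_meta(meta.data, stage.var = "stage", sample.var = "batch")
print(cell.counts)
```

The returned table also shows at a glance whether any time point or sample contains very few cells.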
If only log-normalized data are available, users can also create a TDEseqObject as follows:
tde <- CreateTDEseqObject(counts = norm.data, data=norm.data, meta.data = meta.data)
Alternatively, a TDEseqObject can be created directly from a Seurat object:
tde <- CreateTDEseqObject(counts = seurat)
or from a SingleCellExperiment (sce) object:
## norm.data is the log-normalized matrix defined above
sce <- SingleCellExperiment::SingleCellExperiment(list(counts=counts,logcounts=norm.data),
    colData=data.frame(label=colnames(counts)),
    rowData=data.frame(length=rownames(counts)),
    metadata=list(study="GSE111111"))
tde <- CreateTDEseqObject(counts = sce)
Set the TDEseq parameters and run the analysis.
tde_method <- "cell"
tde_param <- list(sample.var = "batch",
                  stage.var = "stage",
                  fit.model = "lm",
                  tde.thr = 0.05)
tde <- tdeseq(object = tde, tde.method=tde_method, tde.param=tde_param, num.core=1)
Users need to specify which columns in meta.data hold the sample and time point information by setting sample.var and stage.var. Here we set fit.model="lm" to run the linear version of TDEseq. Users can run the mixed version of TDEseq by setting fit.model="lmm".
Users can set additional parameters to control preprocessing and model fitting:
tde_param <- list(sample.var = "batch",
                  stage.var = "stage",
                  fit.model = "lmm",
                  pct = 0.1,
                  tde.thr = 0.05,
                  lfc = 0.1,
                  max.gcells = Inf,
                  min.tcells = 3,
                  mod = 'FastLMM')
tde <- tdeseq(object = tde, tde.param=tde_param, num.core=1)
min.tcells: Here, time points with fewer than 3 cells are removed.
pct: Here, genes with more than 90% zero counts are filtered out.
lfc: Here, testing is limited to genes that show at least a 0.1-fold difference between any two time points.
max.gcells: If a sample contains more cells than max.gcells, its cells are down-sampled. Here, setting max.gcells=Inf disables down-sampling.
mod: We strongly recommend the FastLMM mod, which estimates random effects efficiently and accurately.

The results of the TDEseq analysis are stored in the TDEseqObject. Users can obtain the results by:
## Get the results of TDEseq analysis for each gene
result<-GetTDEseqAssayData(tde,slot='tde')
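To make the pct and min.tcells filters described above concrete, here is a small base R sketch on a toy count matrix. It only illustrates the filtering rules as stated in this vignette; it is not TDEseq's internal code, and the order in which TDEseq applies the filters is an assumption.

```r
# Toy count matrix: 5 genes x 20 cells (illustrative values only).
counts <- matrix(0, nrow = 5, ncol = 20,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:20)))
counts["gene1", ] <- 1       # expressed in 100% of cells
counts["gene2", 1:10] <- 2   # expressed in 50% of cells
counts["gene3", 1:3] <- 3    # expressed in 15% of cells (85% zeros)
counts["gene4", 1] <- 1      # expressed in 5% of cells (95% zeros)
# gene5 is zero everywhere

# Time point of each cell: the last time point has only 2 cells.
stage <- rep(c(0, 1, 2), times = c(9, 9, 2))

pct <- 0.1        # minimum fraction of cells expressing a gene
min.tcells <- 3   # minimum number of cells per time point

# pct rule: drop genes whose fraction of zero counts exceeds 1 - pct (90%).
zero.frac <- rowMeans(counts == 0)
keep.genes <- zero.frac <= 1 - pct

# min.tcells rule: drop time points with fewer than min.tcells cells.
cells.per.stage <- table(stage)
keep.stages <- names(cells.per.stage)[cells.per.stage >= min.tcells]
keep.cells <- as.character(stage) %in% keep.stages

filtered <- counts[keep.genes, keep.cells, drop = FALSE]
print(dim(filtered))
```

In this toy example, gene4 and gene5 fail the pct rule and the two cells at the last time point fail the min.tcells rule.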
When the number of cells is large, generating pseudo-cells by aggregating a predefined number of cells (default is 20) greatly reduces the computational burden. Users can run TDEseq in pseudocell mode by setting tde_method = "pseudocell":
tde_method <- "pseudocell"
tde_param <- list(sample.var = "batch",
                  stage.var = "stage",
                  fit.model = "lm",
                  tde.thr = 0.05)
tde <- tdeseq(object = tde, tde.method=tde_method, tde.param=tde_param)
Note: to run TDEseq in pseudocell mode, please install the Seurat package first.
A tutorial with the main example code for the mouse liver development analysis can be found here.
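The idea behind pseudo-cell aggregation can be sketched in base R. This is a simplified stand-in under stated assumptions, not TDEseq's own procedure: here cells are grouped in consecutive chunks of 20 within each time point and their expression is averaged, whereas TDEseq relies on Seurat and may group and aggregate cells differently.

```r
set.seed(42)

# Toy normalized expression matrix: 4 genes x 100 cells (illustrative only).
n.genes <- 4
n.cells <- 100
expr <- matrix(rexp(n.genes * n.cells), nrow = n.genes,
               dimnames = list(paste0("gene", 1:n.genes),
                               paste0("cell", 1:n.cells)))
stage <- rep(c(0, 1), times = c(60, 40))  # time point of each cell
group.size <- 20                          # cells per pseudo-cell

# Within each time point, assign cells to consecutive chunks of group.size.
chunk.id <- ave(seq_len(n.cells), stage,
                FUN = function(i) ceiling(seq_along(i) / group.size))
pseudo.id <- paste0("stage", stage, "_pc", chunk.id)

# Average expression over the cells of each pseudo-cell
# (averaging is an assumption; summing is another common choice).
pseudo.expr <- sapply(split(seq_len(n.cells), pseudo.id),
                      function(idx) rowMeans(expr[, idx, drop = FALSE]))

# Each pseudo-cell inherits the time point of its member cells.
pseudo.stage <- sapply(split(stage, pseudo.id), function(s) s[1])

print(dim(pseudo.expr))
print(pseudo.stage)
```

Here 100 cells collapse into 5 pseudo-cells, which is why this mode reduces the cost of fitting one model per gene.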