Vivianstats / scDesign

A statistical simulator for rational scRNA-seq experimental design
https://academic.oup.com/bioinformatics/article/35/14/i41/5529133
11 stars 4 forks source link
benchmarking experimental-design scrna-seq simulation

scDesign: a statistical simulator for rational scRNA-seq experimental design

Wei Vivian Li 2020-12-13

Latest News

2020/12/13: Version 1.1.0 released!

2019/03/18: Version 1.0.0 released!

Introduction

Any suggestions on the package are welcome! For technical problems, please report to Issues. For suggestions and comments on the method, please contact Dr. Vivian Li (vivian.li@rutgers.edu) or Dr. Jessica Li (jli@stat.ucla.edu).

Installation

The package is not on CRAN yet. For installation please use the following codes in R

install.packages("devtools")
library(devtools)

install_github("Vivianstats/scDesign")

Quick start

scDesign has three main functions:

For detailed usage, please refer to the package manual or vignette.

design_data

design_data simulates additional scRNA-seq data by estimating gene expression parameters from a real scRNA-seq dataset. When ngroup = 1, it each time simulates a single dataset based on user-specified total read number S and cell number ncell.

realcount1 = readRDS(system.file("extdata", "astrocytes.rds", package = "scDesign"))
simcount1 = design_data(realcount = realcount1, S = 1e7, ncell = 1000, ngroup = 1, ncores = 1)

realcount1[1:3, 1:3]
#>              GSM1657885 GSM1657932 GSM1657938
#> 1/2-SBSRNA4           0          0          0
#> A2M                   0          0         34
#> A2ML1                 0          0         25
simcount1[1:3, 1:3]
#>       cell1 cell2 cell3
#> gene1     0     0     0
#> gene2     0     0    68
#> gene3     0     0     1

When ngroup > 1, it simulates ngroup datasets following a specified differentiation path.

simdata = design_data(realcount = realcount1, S = rep(1e7,3), ncell = rep(100,3), ngroup = 3, 
                      pUp = 0.03, pDown = 0.03, fU = 3, fL = 1.5, ncores = 1)

# simdata is a list of three elements
names(simdata) 
#> [1] "count"     "genesUp"   "genesDown"

# count matrix of the cell state 2
simdata$count[[2]][1:3, 1:3] 
#>       C2_1 C2_2 C2_3
#> gene1  132    0    0
#> gene2    6    2    6
#> gene3    0    0    0

# up-regulated genes from state 1 to state 2
simdata$genesUp[[2]][1:3] 
#> [1] "gene1655" "gene614"  "gene6057"

# down-regulated genes from state 1 to state 2
simdata$genesDown[[2]][1:3] 
#> [1] "gene1958" "gene4631" "gene4888"

design_sep

design_sep assists experimental design by selecting the optimal cell numbers for the two cell states in scRNA-seq, so that the subsequent DE analysis becomes most accurate based on the user-specified criterion. It assumes that cells from the two states are prepared as two separate libraries and sequenced independently.

realcount1 = readRDS(system.file("extdata", "astrocytes.rds", package = "scDesign"))
realcount2 = readRDS(system.file("extdata", "oligodendrocytes.rds", package = "scDesign"))

# candidate cell numbers for experimental design
ncell = cbind(2^seq(6,11,1), 2^seq(6,11,1))
prlist = design_sep(realcount1, realcount2, ncell = ncell, de_method = "ttest", ncores = 10)

# returns a list of five elements
names(prlist)
#> precision  recall  TN  F1  F2
prlist$precision
#> p_thre 64vs64 128vs128 256vs256 512vs512 1024vs1024 2048vs2048
#> 0.01   0.332  0.272    0.178    0.121    0.084      0.056
#> 0.001  0.448  0.361    0.231    0.147    0.097      0.063
#> 1e-04  0.532  0.434    0.282    0.175    0.11       0.07
#> 1e-05  0.599  0.491    0.331    0.203    0.124      0.076
#> 1e-06  0.649  0.534    0.375    0.231    0.138      0.083

design_sep also saves the analysis results to a txt file and a set of power analysis plots.

design_joint

design_joint assists experimental design by selecting the optimal (total) cell number for a cell population that contains the two cell states of interest, so that the subsequent DE analysis becomes most accurate based on the user-specified criterion. It assumes that cells from the two states are prepared in the same library and sequenced together.

# candidate cell numbers for experimental design
ncell = round(2^seq(9,13,1))
prlist = design_joint(realcount1, realcount2, prop1 = 0.2, prop2 = 0.15,
                      ncell = ncell, de_method = "ttest", ncores = 10)

# returns a list of five elements
names(prlist)
#> precision  recall  TN  F1  F2
prlist$recall
#>       512   1024  2048  4096  8192
#> 0.01  0.315 0.33  0.259 0.176 0.111
#> 0.001 0.235 0.281 0.24  0.169 0.108
#> 1e-04 0.176 0.236 0.22  0.162 0.105
#> 1e-05 0.133 0.198 0.2   0.155 0.102
#> 1e-06 0.103 0.166 0.181 0.147 0.099

design_joint also saves the analysis results to a txt file and a set of power analysis plots.

Citation

Li, Wei Vivian, and Jingyi Jessica Li. "A statistical simulator scDesign for rational scRNA-seq experimental design." Bioinformatics 35, no. 14 (2019): i41-i50. Link