SONGDONGYUAN1994 / scDesign3

scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics
https://songdongyuan1994.github.io/scDesign3/docs/index.html
MIT License

scDesign3


The R package scDesign3 is an all-in-one single-cell data simulation tool. It uses reference datasets with different cell states (cell types, trajectories, or spatial coordinates), different modalities (gene expression, chromatin accessibility, protein abundance, DNA methylation, etc.), and complex experimental designs. The transparent parameters enable users to alter models as needed; the model evaluation metrics (AIC, BIC) and convenient visualization functions help users select models. Detailed tutorials that illustrate the functionalities of scDesign3 are available at this website. The following figure summarizes the usage of scDesign3:

To find out more about scDesign3, see our manuscript in Nature Biotechnology:

Song, D., Wang, Q., Yan, G. et al. scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics. Nat Biotechnol 42, 247–252 (2024).

The computational time is quadratic in the number of features used in copula modeling. Reducing this number will greatly speed up the calculation.
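One common way to reduce the feature count is to keep only the most variable genes before fitting. The sketch below uses base R on a toy count matrix; the helper `select_top_variable()`, the variance-of-log-counts criterion, and the choice of 200 genes are illustrative assumptions, not part of scDesign3. It also counts the distinct pairwise correlations among p features, p(p - 1)/2, which gives a feel for why the cost grows quadratically.

```r
# Toy genes-by-cells count matrix (hypothetical data, for illustration only).
set.seed(1)
n_genes <- 1000
n_cells <- 50
counts <- matrix(
  rnbinom(n_genes * n_cells, mu = 5, size = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Hypothetical helper: keep the n_top genes with the largest variance
# of log1p-transformed counts, preserving the original row order.
select_top_variable <- function(mat, n_top) {
  gene_var <- apply(log1p(mat), 1, var)
  keep <- order(gene_var, decreasing = TRUE)[seq_len(min(n_top, nrow(mat)))]
  mat[sort(keep), , drop = FALSE]
}

# Number of distinct pairwise correlations among p features.
n_pairs <- function(p) p * (p - 1) / 2

counts_small <- select_top_variable(counts, n_top = 200)
dim(counts_small)            # 200 genes, 50 cells
n_pairs(nrow(counts))        # 499500
n_pairs(nrow(counts_small))  # 19900
```

In this toy case, cutting the features from 1000 to 200 reduces the number of gene pairs by a factor of about 25.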

Please note that the parallel computing of scDesign3 is mainly designed for UNIX operating systems; be careful when you set n_cores. Balance n_cores against your available memory (RAM): simply increasing the number of cores without increasing memory will slow down or freeze your program. We recommend allocating at least 1 GB of memory per core.
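The rule of thumb above can be turned into a small helper for picking n_cores. This is a minimal sketch: `choose_n_cores()` and its arguments are hypothetical names, not part of scDesign3, and the amount of available memory must be supplied by the user because base R has no portable way to query it.

```r
# Hypothetical helper: cap the number of cores by available memory,
# following the rule of thumb of at least 1 GB of memory per core.
choose_n_cores <- function(available_mem_gb,
                           gb_per_core = 1,
                           max_cores = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; fall back to one core.
  if (is.na(max_cores)) max_cores <- 1L
  by_memory <- floor(available_mem_gb / gb_per_core)
  as.integer(max(1, min(by_memory, max_cores)))
}

choose_n_cores(available_mem_gb = 4, max_cores = 8)    # 4: memory is the limit
choose_n_cores(available_mem_gb = 64, max_cores = 8)   # 8: cores are the limit
choose_n_cores(available_mem_gb = 0.5, max_cores = 8)  # 1: never below one core
```

The result can then be passed as the n_cores argument of scdesign3().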

Table of contents

  1. Installation
  2. Quick Start
  3. Tutorials
  4. Contact
  5. Related Manuscripts

Installation

To install the development version from GitHub, please run:

if (!require("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("SONGDONGYUAN1994/scDesign3")

We are now working on submitting it to Bioconductor and will provide the link once online.

Quick Start

The following code is a quick example of running our simulator. The function scdesign3() takes in a SingleCellExperiment object with the cell covariates (such as cell types, pseudotime, or spatial coordinates) stored in the colData of the SingleCellExperiment object. For more details on the SingleCellExperiment object, please check its Bioconductor link.

example_simu <- scdesign3(
    sce = example_sce,
    assay_use = "counts",
    celltype = "cell_type",
    pseudotime = "pseudotime",
    spatial = NULL,
    other_covariates = NULL,
    mu_formula = "s(pseudotime, k = 10, bs = 'cr')",
    sigma_formula = "s(pseudotime, k = 5, bs = 'cr')",
    family_use = "nb",
    n_cores = 2,
    correlation_function = "default",
    usebam = FALSE,
    corr_formula = "1",
    copula = "gaussian",
    fastmvn = FALSE,
    DT = TRUE,
    pseudo_obs = FALSE,
    family_set = c("gauss", "indep"),
    important_feature = "all",
    nonnegative = TRUE,
    return_model = FALSE,
    nonzerovar = FALSE,
    parallelization = "mcmapply",
    BPPARAM = NULL,
    trace = FALSE
)

The parameters of scdesign3() are:

The output of scdesign3() is a list which includes:

For more details about specifying mu_formula and sigma_formula, please check online materials about the package mgcv. Technically speaking, you can try any formula that mgcv supports.
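To get a feel for what such a formula means, the smooth term from the quick-start example can be fitted directly with mgcv on toy data. This is only a sketch of the mgcv syntax under assumed data: the simulated gene, the variable names, and the comparison of k = 10 against k = 5 are illustrative, and scDesign3 handles the fitting internally in its own way.

```r
library(mgcv)

# Toy data (hypothetical): one gene whose mean expression changes along pseudotime.
set.seed(1)
n_cells <- 300
toy <- data.frame(pseudotime = runif(n_cells))
toy$count <- rnbinom(n_cells,
                     mu = exp(1 + sin(2 * pi * toy$pseudotime)),
                     size = 5)

# Cubic regression spline smooths of pseudotime with two basis dimensions,
# fitted with a negative binomial family.
fit_k10 <- gam(count ~ s(pseudotime, k = 10, bs = "cr"),
               family = nb(), data = toy, method = "REML")
fit_k5  <- gam(count ~ s(pseudotime, k = 5, bs = "cr"),
               family = nb(), data = toy, method = "REML")

# Compare the candidate formulas with AIC and BIC (lower is better).
AIC(fit_k10, fit_k5)
BIC(fit_k10, fit_k5)
```

Here `k` sets the basis dimension of the smooth and `bs = "cr"` selects a cubic regression spline; see the mgcv documentation for other smooth types.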

Tutorials

For all detailed tutorials, please check the website. The tutorials will demonstrate the applications of scDesign3 from the following four perspectives: data simulation, model parameters, model selection, and model alteration.

Contact

Any questions or suggestions on scDesign3 are welcome! Please report them on issues, or contact Dongyuan Song (dongyuansong@ucla.edu) or Qingyang Wang (qw802@g.ucla.edu).

Changelog

Related Manuscripts