bvieth / powsimR

Power analysis is essential to optimize the design of RNA-seq experiments and to assess and compare the power to detect differentially expressed genes. PowsimR is a flexible tool to simulate and evaluate differential expression from bulk and especially single-cell RNA-seq data making it suitable for a priori and posterior power analyses.
https://bvieth.github.io/powsimR/
Artistic License 2.0

The gene number simulated by powsimR is not equal to the custom setting #61

Open duohongrui opened 2 years ago

duohongrui commented 2 years ago

Hi, powsimR is an ideal tool for simulating single-cell RNA-seq data in which the DEGs between two groups can be specified in advance, and it is useful for my project. I set the number of genes (e.g. 43718) to match the real data and ran the simulation. However, the output count matrix contains only 43366 genes, slightly fewer than requested.

The code is shown below:

params <- estimateParam(countData = counts(ref_data),
                        RNAseq = 'singlecell',
                        Protocol = 'UMI',
                        Distribution = 'ZINB',
                        Normalisation = "scran",
                        verbose = TRUE)

# set up simulations
setupres <- Setup(ngenes = dim(ref_data)[1],
                  nsims = 1,
                  n1 = 30,
                  n2 = 30,
                  estParamRes = params,
                  setup.seed = seed,
                  verbose = TRUE)

## Running differential expression simulations
sim_data <- simulateDE(SetupRes = setupres,
                       Normalisation = 'scran',
                       DEmethod = "MAST",
                       verbose = TRUE,
                       Counts = TRUE)

sim_data <- sim_data[["Counts"]][[1]][[1]]

dim(ref_data)[1]
## 43718

dim(sim_data)[1]
## 43366

How can I solve this problem? Thanks very much!

bvieth commented 11 months ago

Hey,

This is a result of drawing from the count distribution with very small mean expression values: some genes will, by chance, have zero counts across all samples, and those genes are removed from the final output object.

HTH, Beate
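The effect described in the reply can be reproduced with a minimal sketch in base R. It does not call powsimR. It assumes a plain negative binomial count model, and the gene number, mean distribution, and dispersion (`n_genes`, `mu`, `size`) are made-up illustrative values rather than parameters estimated from any real data. The final filtering step only mimics the removal of all-zero genes that the reply describes.

```r
# Illustrative only: all parameter values are hypothetical,
# not estimates produced by powsimR.
set.seed(1)

n_genes   <- 5000  # hypothetical number of requested genes
n_samples <- 60    # n1 + n2 = 30 + 30, as in the question
size      <- 0.5   # hypothetical negative binomial size (dispersion) parameter

# Skewed mean expression: many genes have very small means.
mu <- rlnorm(n_genes, meanlog = -4, sdlog = 2)

# Draw a genes x samples count matrix. The matrix is filled column by
# column, so repeating mu once per sample gives each row its own mean.
counts <- matrix(
  rnbinom(n_genes * n_samples, size = size, mu = rep(mu, times = n_samples)),
  nrow = n_genes, ncol = n_samples
)

# Genes with zero counts in every sample, and the matrix without them.
all_zero <- rowSums(counts) == 0
filtered <- counts[!all_zero, , drop = FALSE]

# Under this model, P(count == 0) = (size / (size + mu))^size per sample,
# so the chance that a gene is zero in all samples is that value to the
# power n_samples. Summing over genes gives the expected number dropped.
expected_zero <- sum((size / (size + mu))^(size * n_samples))

cat("requested genes:", n_genes, "\n")
cat("genes kept:     ", nrow(filtered), "\n")
cat("all-zero genes: ", sum(all_zero),
    "(expected about", round(expected_zero), ")\n")
```

In this sketch the number of rows kept is smaller than the number requested, and the gap grows as the means shrink or the number of samples decreases. If the output has to match the reference dimensions exactly, one possible workaround (an assumption, not something stated in the thread) is to request somewhat more genes than needed and subset afterwards.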