bvieth / powsimR

Power analysis is essential to optimize the design of RNA-seq experiments and to assess and compare the power to detect differentially expressed genes. PowsimR is a flexible tool to simulate and evaluate differential expression from bulk and especially single-cell RNA-seq data making it suitable for a priori and posterior power analyses.
https://bvieth.github.io/powsimR/
Artistic License 2.0

What distribution do you assume for the mean and dispersion? #44

Closed ekernf01 closed 4 years ago

ekernf01 commented 4 years ago

Hi powsimR devs -- fantastic work on this tool! I looked through the documentation and the paper for a while, but I am unable to find an answer to two key questions about the simulation.

Thanks!

ekernf01 commented 4 years ago

Also, for differentially expressed genes, is the fold change applied to the mean before selecting the dispersion, such that the dispersion differs between the control and treatment groups? Or is the control group's dispersion estimate used for the treatment group as well?

bvieth commented 4 years ago

Hello,

thank you for your interest in powsimR. Please find my answers to your questions below:

  • When you write "We first draw the mean expression for each gene", how exactly is this done? Is it drawn literally from the empirical distribution in this sense, i.e. sampled with replacement?
    The mean is sampled from the observed mean gene expression values estimated with estimateParam(), which you can also visualize with plotParam(). Whether the sampling is with or without replacement depends on the relation between the numbers of estimated and simulated genes defined with Setup(). If you want to simulate more genes (e.g. 15000) than were estimated (e.g. 10000), the sampling is with replacement. The sampling is without replacement when there are more estimated mean expression values than genes to simulate. With the option verbose = TRUE, the functions should be "talkative", i.e. inform you about what is going on.

  • You write "to capture the variability of the observed dispersion estimates, a local variability prediction band (σ = 1.96) is applied to the fit". Do you sample the dispersion uniformly within this 95% prediction band?
    No, I actually estimate the mean and standard deviation of the dispersion value given the mean expression value and then draw from a truncated normal distribution.

  • Also, for differentially expressed genes, is the fold change applied to the mean before selecting the dispersion, such that the dispersion differs between the control and treatment groups? Or is the control group's dispersion estimate used for the treatment group as well?
    The dispersion does not differ between the two groups.
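
For readers who want to see the three answers above side by side, here is a minimal sketch in plain Python. It is not powsimR's R code and does not call any powsimR function. Several pieces are assumptions made only for illustration: the helper names, the toy mean-dispersion trend in `toy_trend`, working on the log2 scale, truncating the normal draw at the fitted mean ± 1.96 standard deviations, applying the fold change to the treatment mean only, and treating the case of equal numbers of estimated and simulated genes as sampling without replacement.

```python
import math
import random


def sample_means(estimated_means, n_genes, rng):
    """Draw n_genes values from the observed (estimated) gene means.

    Without replacement when enough estimated values are available,
    with replacement when more genes are simulated than were estimated.
    (The equal-size case is an assumption of this sketch.)
    """
    if n_genes <= len(estimated_means):
        return rng.sample(estimated_means, n_genes)
    return rng.choices(estimated_means, k=n_genes)


def truncated_normal(mu, sd, lower, upper, rng):
    """Rejection-sample a Normal(mu, sd) restricted to [lower, upper]."""
    while True:
        x = rng.gauss(mu, sd)
        if lower <= x <= upper:
            return x


def toy_trend(log2_mean):
    """Hypothetical mean-dispersion fit, invented for this sketch.

    Returns (mean, sd) of the log2 dispersion at a given log2 mean.
    """
    return 1.0 - 0.3 * log2_mean, 0.5


def simulate_gene_params(estimated_means, n_genes, log2_fold_changes, seed=0):
    """Return one dict per gene with group means and a shared dispersion."""
    if len(log2_fold_changes) != n_genes:
        raise ValueError("need exactly one log2 fold change per simulated gene")
    rng = random.Random(seed)
    means = sample_means(estimated_means, n_genes, rng)
    genes = []
    for mean, lfc in zip(means, log2_fold_changes):
        # The dispersion is drawn once, conditional on the sampled mean.
        mu, sd = toy_trend(math.log2(mean))
        log2_disp = truncated_normal(mu, sd, mu - 1.96 * sd, mu + 1.96 * sd, rng)
        genes.append({
            "mean_control": mean,
            "mean_treatment": mean * 2.0 ** lfc,  # assumed: fold change on treatment only
            "dispersion": 2.0 ** log2_disp,       # the same value is used for both groups
        })
    return genes


# Four estimated means but six simulated genes, so sampling is with replacement.
estimated = [5.0, 20.0, 80.0, 300.0]
params = simulate_gene_params(estimated, n_genes=6,
                              log2_fold_changes=[0, 0, 1, -1, 0, 2])
```

Each entry in `params` holds a single `dispersion` next to the two group means, which mirrors the statement that the dispersion does not differ between the groups.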

I hope I could answer your questions. Do not hesitate to contact me again if you have any further questions.

Kind regards, Beate

ekernf01 commented 4 years ago

That is exactly what I wanted to know. Thank you very much!