Closed carversh closed 1 year ago
Hi carversh,
yes, that is the best way to simulate data. For more details, take a look at this blog post: https://const-ae.name/post/2021-01-24-gamma-poisson-distribution/gamma-poisson-reference/
Best, Constantin
Hi Constantin,
When I simulate:
counts <- rnbinom(n = 10, mu = 5, size = 1/0.7)
# design = ~ 1 means that an intercept-only model is fit
fit <- glm_gp(counts, design = ~ 1)
The fit returns the following parameters: glmGamPoiFit object:
The data had 1 rows and 10 columns.
A model with 1 coefficient was fitted.
The design formula is: Y~1
Beta:
[,1]
Intercept 1.82
deviance: 10.7
overdispersion: 1.21
Shrunken quasi-likelihood overdispersion: 1
size_factors: FALSE
Mu:
Min 1st Qu. Median 3rd Qu. Max
6.2 6.2 6.2 6.2 6.2
Keep in mind that you are only sampling 10 values. Just by chance, you will sometimes see values which are a bit larger or a bit smaller than expected. Analogously, if you flip a coin 100 times, on average you expect 50 heads. However in reality you will end up with 43 or 52.
To check that glmGamPoi
does in deed produce the expected result on average, try the following example:
vals <- replicate(1000, {
counts <- rnbinom(n = 10, mu = 10, size = 1/0.7)
fit <- glmGamPoi::glm_gp(counts, design = ~ 1)
fit$Mu[1,1]
})
hist(vals); abline(v = mean(vals))
Created on 2023-08-14 with reprex v2.0.2
This is great -- totally makes sense. Thank you!
And just to confirm, the overdispersion will never equal the simulated 1/size because of the shrinking parameter? Setting the shrinkage to zero, should resolve this
It's just the same. On average, you recover the correct parameter. This isn't related to shrinkage.
vals <- replicate(5000, {
counts <- rnbinom(n = 10, mu = 10, size = 1/0.7)
fit <- glmGamPoi::glm_gp(counts, design = ~ 1)
fit$overdispersions
})
hist(vals, breaks = 30); abline(v = mean(vals)); abline(v = 0.7, col = "red")
Created on 2023-08-14 with reprex v2.0.2
The true overdispersion is 1/0.7 = 1.41, while the fitted overdispersion is 0.697.
Ah, I see. The misunderstanding comes from the definition of the overdispersion. I define overdispersion as 1/size. That means that the model converges to 0.7
Got it thanks!
I am fitting the model on a gene x cell matrix, and want to simulate from the fitted model. Do I convert the overdispersion and Mu parameters to fit the negative binomial parameters and use the nbinom function to simulate?