const-ae / glmGamPoi

Fit Gamma-Poisson Generalized Linear Models Reliably
105 stars 15 forks source link

Can you simulate from the fitted model or do you have recommendations for simulation? #53

Closed carversh closed 1 year ago

carversh commented 1 year ago

I am fitting the model on a gene x cell matrix, and want to simulate from the fitted model. Do I convert the overdispersion and Mu parameters to fit the negative binomial parameters and use the nbinom function to simulate?

const-ae commented 1 year ago

Hi carversh,

yes, that is the best way to simulate data. For more details, take a look at this blog post: https://const-ae.name/post/2021-01-24-gamma-poisson-distribution/gamma-poisson-reference/

Best, Constantin

carversh commented 1 year ago

Hi Constantin,

When I simulate:

counts <- rnbinom(n = 10, mu = 5, size = 1/0.7)

# design = ~ 1 means that an intercept-only model is fit
fit <- glm_gp(counts, design = ~ 1)

The fit returns the following parameters: glmGamPoiFit object:

The data had 1 rows and 10 columns.
A model with 1 coefficient was fitted.
The design formula is: Y~1

Beta:
          [,1]
Intercept 1.82

deviance: 10.7

overdispersion: 1.21

Shrunken quasi-likelihood overdispersion: 1

size_factors: FALSE

Mu:
 Min 1st Qu. Median 3rd Qu. Max
 6.2     6.2    6.2     6.2 6.2
const-ae commented 1 year ago

Keep in mind that you are only sampling 10 values. Just by chance, you will sometimes see values which are a bit larger or a bit smaller than expected. Analogously, if you flip a coin 100 times, on average you expect 50 heads. However in reality you will end up with 43 or 52.

To check that glmGamPoi does in deed produce the expected result on average, try the following example:

vals <- replicate(1000, {
  counts <- rnbinom(n = 10, mu = 10, size = 1/0.7)
  fit <- glmGamPoi::glm_gp(counts, design = ~ 1)
  fit$Mu[1,1]
})
hist(vals); abline(v = mean(vals))

Created on 2023-08-14 with reprex v2.0.2

carversh commented 1 year ago

This is great -- totally makes sense. Thank you!

carversh commented 1 year ago

And just to confirm, the overdispersion will never equal the simulated 1/size because of the shrinking parameter? Setting the shrinkage to zero, should resolve this

const-ae commented 1 year ago

It's just the same. On average, you recover the correct parameter. This isn't related to shrinkage.

vals <- replicate(5000, {
  counts <- rnbinom(n = 10, mu = 10, size = 1/0.7)
  fit <- glmGamPoi::glm_gp(counts, design = ~ 1)
  fit$overdispersions
})
hist(vals, breaks = 30); abline(v = mean(vals)); abline(v = 0.7, col = "red")

Created on 2023-08-14 with reprex v2.0.2

const-ae commented 1 year ago

The true overdispersion is 1/0.7 = 1.41, while the fitted overdispersion is 0.697.

Ah, I see. The misunderstanding comes from the definition of the overdispersion. I define overdispersion as 1/size. That means that the model converges to 0.7

carversh commented 1 year ago

Got it thanks!