Standard error of the overdispersion estimates

const-ae / glmGamPoi

Fit Gamma-Poisson Generalized Linear Models Reliably

105 stars 15 forks source link

Standard error of the overdispersion estimates #50

Open anglixue opened 1 year ago

anglixue commented 1 year ago

Dear Constantin,

I was wondering when estimating the overdispersion by glmGamPoi::glm_gp() function, can we extract the standard error of the estimates?

Also, when the algorithm shrinks the dispersion, would that impact the standard error?

Thank you!

Cheers, Angli

const-ae commented 1 year ago

I was wondering when estimating the overdispersion by glmGamPoi::glm_gp() function, can we extract the standard error of the estimates?

I currently have not implemented any way to directly extract the standard error of the overdispersion.

glmGamPoi is inspired by limma and there was some discussion how to estimate the standard error of the variance in limma, which might be helpful.

Also, when the algorithm shrinks the dispersion, would that impact the standard error?

Yes, applying empirical Bayesian shrinkage to the overdispersion would also change its standard error.

I am curious, what is the case for which you need the standard error of the variance?

anglixue commented 1 year ago

Thanks for your reply.

I noticed that even Cox-Reid adjusted MLE still has slightly inflated estimates for dispersion. The McCarthy et al. (2012) showed that CR-MLE is the least biased method in GLM but the simulating settings are specific to the data. I didn't find how many replicates they simulated per gene, but when the simulated mean and number of replicates are small, let's say dispersion=0.4, mean = 0.1, 100 replicates, the CR-MLE estimates could return very large estimates for dispersion (>100).

I want to test the lower limit where CR-MLE provides unbiased estimates. So knowing how to estimate the standard error will be essential. My gut feeling is that applying empirical Bayesian shrinkage won't help in this case as the gene-wise dispersion will not be consistent in the real data.

const-ae commented 1 year ago

Oh, interesting. It has been 3 years since I really deeply looked into this topic, so I am afraid I cannot give much useful input.

Regarding the question how to get the variance of the overdispersion estimate, I would approach this by using the conventional_deriv_score_function_fast to calculate the Hessian of the log(overdispersion) estimate (see for an example here). You just need the mean vector, the counts and model matrix.

anglixue commented 1 year ago

Thanks for your suggestion. I'll try that.