combayes implements Bayesian inference for COM-Poisson regression models using exact samplers. It also provides functions for sampling exactly from the COM-poisson distribution (using rejection sampling) and for evaluating exact bounds for the normalisation constant of the probability mass function of the COM-Poisson distribution. More information behind the techniques used can be found in the papers:
Both papers focus on the Bayesian implementation of the COM-Poisson regression model. The latter paper takes advantage of the exchange algorithm, an MCMC method applicable to situations where the sampling model (likelihood) can only be computed up to a normalisation constant. The algorithm requires to draw from the sampling model, which in the case of the COM-Poisson distribution can be done efficiently using rejection sampling.
If you want to get a really short intro to the COM-Poisson distribution enter the URL of the README.html document at http://htmlpreview.github.io/.
Are you still confused about the COM-Poisson distribution and the exchange algorithm? Have a look at these slides.
library(devtools)
install_github("cchanialidis/combayes")
All you need to do now, I hope, is
library(combayes)
# Set random seed for reproducibility
set.seed(84)
# Sample size
n <- 200
# Sampling from an underdispersed COM-Poisson distribution
comp_under <- rcmpois(mu=10,nu=2,n=n)
# Sampling from a COM-Poisson distribution where nu=1 (i.e. Poisson distribution)
comp_equi <- rcmpois(mu=10,nu=1,n=n)
# Sampling from an overdispersed COM-Poisson distribution
comp_over <- rcmpois(mu=10,nu=0.5,n=n)
# Save samples in a data frame
distributions <- data.frame(comp_under,comp_equi,comp_over)
apply(distributions,2,mean)# Similar means (close to the value of mu)
## comp_under comp_equi comp_over
## 9.855 10.105 10.480
apply(distributions,2,var)# Different variances (close to the value of mu/nu)
## comp_under comp_equi comp_over
## 4.757764 8.385905 19.647839
logzcmpois(mu=10,nu=2)
logzcmpois(mu=10,nu=1)# Any ideas on what we expect the answer to be for nu=1?
logzcmpois(mu=10,nu=0.5)
# Compare densities of COM-Poisson distribution with different nu
x <- 0:25
matplot(x, cbind(dcmpois(x, mu=10, nu=0.5),
dcmpois(x, mu=10, nu=1),
dcmpois(x, mu=10, nu=2)), type="o", col=2:4, pch=16, ylab="Probability mass function")
legend("topright", col=2:4, lty=1:3, c(expression(nu*"="*0.5),
expression(nu*"="*1),
expression(nu*"="*2)))
We illustrate the method and the benefits of using a Bayesian
COM-Poisson regression model, through two real-world data sets with
different levels of dispersion. If one wants to use the alternative
technique proposed in the earlier paper they have to specify that the
argument algorithm
in the cmpoisreg
is equal to "bounds"
(in its
default version is equal to "exchange"
). Bear in mind that the MCMC
algorithm will be significantly slower in that case.
# Load data from library Rchoice
library(Rchoice)
data(Articles)
# Focusing only on the students with at least one publication
phdpublish <- subset(Articles, art>0)
phdpublish <- transform(phdpublish, art=art-1)
# Standardise all non-binary covariates
phdpublish <- cbind(phdpublish[,c(1,2,3)],scale(phdpublish[,-c(1,2,3)],center=TRUE,scale=TRUE))
result <- cmpoisreg(y=phdpublish$art, X=phdpublish[,2:6], num_samples=1e4, burnin=1e3,prior_var_beta=diag(6),prior_var_delta=diag(6))
colMeans(result$posterior_beta)
colMeans(result$posterior_delta)
mcmc_beta <- mcmc(result$posterior_beta)
mcmc_delta <- mcmc(result$posterior_delta)
colnames(mcmc_beta) <- c("intercept","female","married","kids","phd","mentor")
colnames(mcmc_delta) <- colnames(mcmc_beta)
# Plot traceplots of regression coefficients
plot(mcmc_beta)
plot(mcmc_delta)
# Plot caterplots of regression coefficients
caterplot(mcmc_beta,style="plain",bty="n",collapse=FALSE)
abline(v=0,lty=2)
title("Regression coefficients for"~ mu)
caterplot(mcmc_delta,style="plain",bty="n",collapse=FALSE)
abline(v=0,lty=2)
title("Regression coefficients for"~ nu)
A talk by Galit Shmueli on the COM-Poisson distribution can be found here.
If you prefer reading papers instead, here are some starting points:
I have to add a lot of info on this subsection but until then; here is a really informative paper that outlines the theory and the implementation of regression models for count data in R.
Finally, Joseph M. Hilbe released the COUNT package which contains lots of data sets where the response variable is discrete.