cchanialidis / combayes

Bayesian COM-Poisson regression for count data
0 stars 0 forks source link

combayes

combayes implements Bayesian inference for COM-Poisson regression models using exact samplers. It also provides functions for sampling exactly from the COM-poisson distribution (using rejection sampling) and for evaluating exact bounds for the normalisation constant of the probability mass function of the COM-Poisson distribution. More information behind the techniques used can be found in the papers:

Both papers focus on the Bayesian implementation of the COM-Poisson regression model. The latter paper takes advantage of the exchange algorithm, an MCMC method applicable to situations where the sampling model (likelihood) can only be computed up to a normalisation constant. The algorithm requires to draw from the sampling model, which in the case of the COM-Poisson distribution can be done efficiently using rejection sampling.

If you want to get a really short intro to the COM-Poisson distribution enter the URL of the README.html document at http://htmlpreview.github.io/.

Are you still confused about the COM-Poisson distribution and the exchange algorithm? Have a look at these slides.

Installing the package in R

library(devtools)
install_github("cchanialidis/combayes")

All you need to do now, I hope, is

library(combayes)

Sampling from COM-Poisson distributions with different dispersion levels

# Set random seed for reproducibility
set.seed(84)
# Sample size
n <- 200 
# Sampling from an underdispersed COM-Poisson distribution
comp_under <- rcmpois(mu=10,nu=2,n=n)
# Sampling from a COM-Poisson distribution where nu=1 (i.e. Poisson distribution)
comp_equi <- rcmpois(mu=10,nu=1,n=n)
# Sampling from an overdispersed COM-Poisson distribution
comp_over <- rcmpois(mu=10,nu=0.5,n=n)
# Save samples in a data frame
distributions <- data.frame(comp_under,comp_equi,comp_over)
apply(distributions,2,mean)# Similar means (close to the value of mu)
## comp_under  comp_equi  comp_over 
##      9.855     10.105     10.480
apply(distributions,2,var)# Different variances (close to the value of mu/nu)
## comp_under  comp_equi  comp_over 
##   4.757764   8.385905  19.647839

Estimating the logarithm of the normalisation constant

logzcmpois(mu=10,nu=2)
logzcmpois(mu=10,nu=1)# Any ideas on what we expect the answer to be for nu=1?
logzcmpois(mu=10,nu=0.5)

Estimating the probability mass function

# Compare densities of COM-Poisson distribution with different nu
 x <- 0:25

matplot(x, cbind(dcmpois(x, mu=10, nu=0.5),
                 dcmpois(x, mu=10, nu=1),
                  dcmpois(x, mu=10, nu=2)), type="o", col=2:4, pch=16, ylab="Probability mass function") 

legend("topright", col=2:4, lty=1:3, c(expression(nu*"="*0.5),
                                        expression(nu*"="*1),
                                        expression(nu*"="*2)))

COM-Poisson regression model

We illustrate the method and the benefits of using a Bayesian COM-Poisson regression model, through two real-world data sets with different levels of dispersion. If one wants to use the alternative technique proposed in the earlier paper they have to specify that the argument algorithm in the cmpoisreg is equal to "bounds" (in its default version is equal to "exchange"). Bear in mind that the MCMC algorithm will be significantly slower in that case.

Application I: To be updated soon

Application II: PhD publications data

# Load data from library Rchoice
library(Rchoice)
data(Articles)
# Focusing only on the students with at least one publication
phdpublish <- subset(Articles, art>0)
phdpublish <- transform(phdpublish, art=art-1)
# Standardise all non-binary covariates
phdpublish <- cbind(phdpublish[,c(1,2,3)],scale(phdpublish[,-c(1,2,3)],center=TRUE,scale=TRUE))
result <- cmpoisreg(y=phdpublish$art, X=phdpublish[,2:6], num_samples=1e4, burnin=1e3,prior_var_beta=diag(6),prior_var_delta=diag(6))
colMeans(result$posterior_beta)
colMeans(result$posterior_delta)
mcmc_beta  <- mcmc(result$posterior_beta)
mcmc_delta <- mcmc(result$posterior_delta)
colnames(mcmc_beta) <- c("intercept","female","married","kids","phd","mentor")
colnames(mcmc_delta) <- colnames(mcmc_beta)
# Plot traceplots of regression coefficients
plot(mcmc_beta)
plot(mcmc_delta)
# Plot caterplots of regression coefficients
caterplot(mcmc_beta,style="plain",bty="n",collapse=FALSE)
abline(v=0,lty=2)
title("Regression coefficients for"~ mu)
caterplot(mcmc_delta,style="plain",bty="n",collapse=FALSE)
abline(v=0,lty=2)
title("Regression coefficients for"~ nu)

TMI

COM-Poisson

A talk by Galit Shmueli on the COM-Poisson distribution can be found here.

If you prefer reading papers instead, here are some starting points:

Regression models for count data in R

I have to add a lot of info on this subsection but until then; here is a really informative paper that outlines the theory and the implementation of regression models for count data in R.

Finally, Joseph M. Hilbe released the COUNT package which contains lots of data sets where the response variable is discrete.