GaryBAYLOR / R-code

A collection of algorithms I wrote for solving statistical problems

MH algorithm for polynomial regression #22

Open GaryBAYLOR opened 8 years ago

GaryBAYLOR commented 8 years ago

We will write a program in R to estimate the coefficients of a polynomial regression. The data set is cars, built into R; the response is dist and the predictor is speed. We will fit a polynomial regression of degree two.

> head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10

With the frequentist method, we can fit the model using lm in R; the fitted curve is shown in the graph below.

y <- cars$dist
x1 <- cars$speed
x2 <- (cars$speed)^2
mod <- lm(y ~ x1 + x2)
plot(x1, y, xlab = "speed", ylab = "dist")
# predict on an evenly spaced grid so the fitted curve is smooth
xseq <- seq(min(x1), max(x1), length = 100)
pred <- predict(mod, data.frame(x1 = xseq, x2 = xseq^2))
lines(xseq, pred, col = "red", lwd = 2)

[Figure: scatter plot of dist vs. speed with the fitted quadratic curve in red]

Now we use a Bayesian method. The procedure is as follows: choose a prior for each coefficient, use the MH algorithm to generate an MCMC chain, and estimate the posterior distributions from that chain. Here we use a normal prior for each coefficient and a uniform prior for sigma, the standard deviation of the error (the code passes param[4] directly as the sd in dnorm).
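In symbols, the model and priors encoded in the functions below are:

    dist_i = beta0 + beta1 * speed_i + beta2 * speed_i^2 + eps_i,   eps_i ~ N(0, sigma^2)
    beta0 ~ N(0, 10^2),   beta1 ~ N(0, 5^2),   beta2 ~ N(0, 1^2),   sigma ~ Unif(0, 1000)

The code for each function is like this.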

##  MH algorithm
#   design matrix with the first column all 1s (used as a global below)
X <- cbind(1, x1, x2)

#   define log-likelihood
likelihood <- function(param){
    betas <- matrix(param[-4])          # the three regression coefficients
    pred  <- as.vector(X %*% betas)     # fitted values
    # log-density of each observation; param[4] is the error sd
    singlelikelihoods <- dnorm(y, mean = pred, sd = param[4], log = TRUE)
    sum(singlelikelihoods)
}
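As a quick sanity check (my addition, assuming mod from the lm fit above is still in scope), the log-likelihood at the lm estimates should beat that of an arbitrary guess:

likelihood(c(coef(mod), summary(mod)$sigma))   # lm estimates plus residual sd
likelihood(c(0, 0, 0, 15))                     # arbitrary point, should be lower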

#   define log-prior: normal priors on the coefficients, uniform on the sd
prior <- function(param){
    b0 <- param[1]
    b1 <- param[2]
    b2 <- param[3]
    sd <- param[4]
    aprior  <- dnorm(b0, mean = 0, sd = 10, log = TRUE)
    bprior  <- dnorm(b1, mean = 0, sd = 5,  log = TRUE)
    cprior  <- dnorm(b2, mean = 0, sd = 1,  log = TRUE)
    sdprior <- dunif(sd, min = 0, max = 1000, log = TRUE)
    aprior + bprior + cprior + sdprior
}

# define (unnormalized) log-posterior
posterior <- function(param){
    likelihood(param) + prior(param)
}

# define proposal function: uniform random walk centered at the current
# value, with per-parameter half-widths theta
proposalfunction <- function(param, theta = c(0.5, 0.5, 0.3, 1)){
    runif(4, min = param - theta, max = param + theta)
}

##  MH sampler
MH <- function(startvalue, N = 1e5, burn = N/10){
    chain <- matrix(NA, nrow = N, ncol = 4)
    chain[1, ] <- startvalue
    for (i in 2:N){
        proposal <- proposalfunction(chain[i - 1, ])
        # acceptance probability, computed on the log scale
        rho <- exp(posterior(proposal) - posterior(chain[i - 1, ]))
        if (runif(1) < rho){
            chain[i, ] <- proposal
        } else {
            chain[i, ] <- chain[i - 1, ]
        }
    }
    chain <- chain[-(1:burn), ]    # drop the burn-in draws

    # (density() plots could be used here instead of histograms)
    par(mfrow = c(2, 2))
    hist(chain[, 1], freq = FALSE, breaks = 50, main = "beta0")
    hist(chain[, 2], freq = FALSE, breaks = 50, main = "beta1")
    hist(chain[, 3], freq = FALSE, breaks = 50, main = "beta2")
    hist(chain[, 4], freq = FALSE, breaks = 50, main = "sigma")

    invisible(chain)
}
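A typical call (my choice of starting values, using the lm estimates):

set.seed(1)
# run the sampler; MH() returns the post-burn-in chain invisibly
chain <- MH(startvalue = c(coef(mod), summary(mod)$sigma), N = 5e5)
colMeans(chain)                              # posterior means
apply(chain, 2, quantile, c(0.025, 0.975))   # 95% credible intervals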

Next we show the result of a run with 5e5 iterations. The posterior distribution of each parameter is shown in the following graph.

[Figure: posterior histograms of beta0, beta1, beta2, and sigma]

We can compare this with the model built with lm.

> mod <- lm(y ~ x1 + x2)
> summary(mod)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max 
-28.720  -9.184  -3.188   4.628  45.152 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.47014   14.81716   0.167    0.868
x1           0.91329    2.03422   0.449    0.656
x2           0.09996    0.06597   1.515    0.136

Residual standard error: 15.18 on 47 degrees of freedom
Multiple R-squared:  0.6673,    Adjusted R-squared:  0.6532 
F-statistic: 47.14 on 2 and 47 DF,  p-value: 5.852e-12
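
To line the two fits up numerically, one can compare posterior means with the lm estimates (a sketch, assuming chain from the run above):

# posterior means vs. least-squares estimates for the three coefficients
rbind(bayes = colMeans(chain)[1:3], lm = coef(mod))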

The posterior distribution of beta0 is pretty unstable (it mixes poorly), while the others are stable.
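Trace plots make this easy to see (my addition, not in the original code):

# one trace per parameter; poor mixing shows up as long flat stretches
par(mfrow = c(2, 2))
labels <- c("beta0", "beta1", "beta2", "sigma")
for (j in 1:4) plot(chain[, j], type = "l", main = labels[j], ylab = "")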

How does the MH algorithm perform? Check the following graph, which shows the Bayesian model's predictions for 100 parameter draws randomly selected from the last 1000 iterations, after removing duplicates.

[Figure: 100 posterior predictive curves overlaid on the dist vs. speed scatter plot]
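A sketch of how such a plot can be produced (my reconstruction, not the original plotting code):

# take the last 1000 rows, drop duplicate (rejected-step) rows,
# sample up to 100 parameter vectors, and overlay their fitted curves
last  <- unique(tail(chain, 1000))
draws <- last[sample(nrow(last), min(100, nrow(last))), , drop = FALSE]
plot(x1, y, xlab = "speed", ylab = "dist")
xseq <- seq(min(x1), max(x1), length = 100)
for (k in seq_len(nrow(draws))){
    lines(xseq, draws[k, 1] + draws[k, 2] * xseq + draws[k, 3] * xseq^2,
          col = rgb(1, 0, 0, 0.2))
}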

If we increase the number of iterations to 1e6, the result improves considerably.

[Figures: posterior histograms and posterior predictive curves for the 1e6-iteration run]

One remaining problem: why is the acceptance rate so low (about 0.03), and how can it be improved?
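A likely culprit is that the four parameters live on very different scales (the lm output suggests beta0 varies on the order of 10 while beta2 varies on the order of 0.1), so fixed proposal widths that suit one coordinate are far too large for another. A common remedy is to tune the per-coordinate widths, e.g. roughly proportional to the lm standard errors, until the acceptance rate lands in the usual 20-40% range. A sketch (MH2 is a hypothetical variant, not code from this repo):

# same sampler, but with tunable proposal widths and an acceptance counter
MH2 <- function(startvalue, N = 1e5, theta = c(0.5, 0.5, 0.3, 1)){
    chain <- matrix(NA, N, 4)
    chain[1, ] <- startvalue
    accepted <- 0
    for (i in 2:N){
        proposal <- runif(4, chain[i - 1, ] - theta, chain[i - 1, ] + theta)
        logr <- posterior(proposal) - posterior(chain[i - 1, ])
        # guard against NaN from a proposed negative sd
        if (is.finite(logr) && runif(1) < exp(logr)){
            chain[i, ] <- proposal
            accepted <- accepted + 1
        } else {
            chain[i, ] <- chain[i - 1, ]
        }
    }
    cat("acceptance rate:", accepted / (N - 1), "\n")
    invisible(chain)
}

# e.g. widths roughly a third of the lm standard errors (untested guess):
# chain2 <- MH2(c(coef(mod), summary(mod)$sigma), theta = c(5, 0.7, 0.02, 0.5))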

Overall, the MH algorithm performs reasonably well. The discussion above is just a draft of my ideas, without deliberate polishing.