GaryBAYLOR / R-code

A collection of algorithms written by myself for solving statistical problems
0 stars 0 forks source link

Performance of Wald confidence interval for relative risk #8

Open GaryBAYLOR opened 9 years ago

GaryBAYLOR commented 9 years ago

The performance of Wald confidence interval for relative risk

The relative risk is important in statistics in comparing the probabilities of binary events. It is also easier to interpret statistical knowledge to general public. For example, people will easily understand

The probability of getting lung cancer is ten times higher for people who smoke at least 1 cigarettes every day than those who don't smoke at all.

But it is confused if people are told

The odds ratio of people who smoke at least 1 cigarettes every day and those who don't smoke at all is 15.

When calculating the confidence intervals of relative risk, it turns out the log relative risk has better asymptotic properties than relative risk itself. The following R code is for testing the performance of Wald confidence interval for log relative risk.
##  This is to calculate the coverage of Wald C.I. of relative risk of 2 by 2 table
##  --------------------------------------------------------------
##                          Lung Cancer     No Lung Cancer
##  Smoke                       n11                             n12
##  Non-Smoke               n21                             n22
##  --------------------------------------------------------------

#       This function just return TRUE or FALSE to indicate whether the C.I. 
#       will cover the real relative risk or not
RRcover_or_not <- function(p1 = 0.007, p2 = 0.003, n1=5000, n2 = 10000) {
    n11 <- rbinom(1, size = n1, prob = p1)
    n21 <- rbinom(1, size = n2, prob = p2)
    n12 <- n1 - n11
    n22 <- n2 - n21

    p1hat <- n11/n1
    p2hat <- n21/n2
    logrr <- log(p1hat / p2hat)
    sd <- sqrt((1 - p1hat) / (p1hat * n1) + (1 - p2hat) / (p2hat * n2))

    lower <- logrr - sd * qnorm(1 - 0.05/2)
    upper <- logrr + sd * qnorm(1 - 0.05/2)

    real <- log(p1/p2)
    return (real > lower && real < upper)
}

#       This function calculate the percentage of coverage 
#       by running the above function repeatedly
RRcoverage <- function(n = 1000, ...) {
    temp <- rep(0, n)

    for (i in 1:n) {
        temp[i] <- RRcover_or_not(...)
    }
    res <- sum(temp) / n
    return(res)
}

#       This function calcuate coverage repeatedly and plot the density
#       It also shades the area which is above 95% so we can check how
#       good this method (Wald) is in computing C.I. of log relative risk
RRcoverageplot <- function(rep = 1000, ...) {
    temp <- rep(0, rep)
    for (i in 1:rep) {
        temp[i] <- RRcoverage(...)
    }
    den <- density(temp)
    plot(den, type = "l", col = "red", lwd = 2, main = "Coverage for Wald 95% CI for RR", xlab = "Coverage")
    abline(v = 0.95, lty = 2)

    index <- abs(den$x - 0.95) == min(abs(den$x - 0.95))
    pin <- which.max(index)

    x1 <- den$x[pin: length(den$x)]
    y1 <- rep(0, length(x1))
    x2 <- rev(x1)
    y2 <- rev(den$y[pin:length(den$y)])
    polygon(c(x1, x2), c(y1, y2), col = "skyblue")

    per_over_95 <- sum(temp > 0.95)/rep* 100
    text(0.955, 20, labels = paste(bquote(.(per_over_95)), "%", sep = ""), cex = 1.5)
}
The following is the result for the case that the even is rare (p1 = 0.0007, p2 = 0.003). The Wald CI turns to be very good. Because most of the time, 95% coverage actually is higher than 95% !

image

New R functions used are

  1. bquote( )
  2. polygon( )