The performance of Wald confidence interval for relative risk
The relative risk is important in statistics in comparing the probabilities of binary events. It is also easier to interpret statistical knowledge to general public. For example, people will easily understand
The probability of getting lung cancer is ten times higher for people who smoke at least 1 cigarettes every day than those who don't smoke at all.
But it is confused if people are told
The odds ratio of people who smoke at least 1 cigarettes every day and those who don't smoke at all is 15.
When calculating the confidence intervals of relative risk, it turns out the log relative risk has better asymptotic properties than relative risk itself. The following R code is for testing the performance of Wald confidence interval for log relative risk.
## This is to calculate the coverage of Wald C.I. of relative risk of 2 by 2 table
## --------------------------------------------------------------
## Lung Cancer No Lung Cancer
## Smoke n11 n12
## Non-Smoke n21 n22
## --------------------------------------------------------------
# This function just return TRUE or FALSE to indicate whether the C.I.
# will cover the real relative risk or not
RRcover_or_not <- function(p1 = 0.007, p2 = 0.003, n1=5000, n2 = 10000) {
n11 <- rbinom(1, size = n1, prob = p1)
n21 <- rbinom(1, size = n2, prob = p2)
n12 <- n1 - n11
n22 <- n2 - n21
p1hat <- n11/n1
p2hat <- n21/n2
logrr <- log(p1hat / p2hat)
sd <- sqrt((1 - p1hat) / (p1hat * n1) + (1 - p2hat) / (p2hat * n2))
lower <- logrr - sd * qnorm(1 - 0.05/2)
upper <- logrr + sd * qnorm(1 - 0.05/2)
real <- log(p1/p2)
return (real > lower && real < upper)
}
# This function calculate the percentage of coverage
# by running the above function repeatedly
RRcoverage <- function(n = 1000, ...) {
temp <- rep(0, n)
for (i in 1:n) {
temp[i] <- RRcover_or_not(...)
}
res <- sum(temp) / n
return(res)
}
# This function calcuate coverage repeatedly and plot the density
# It also shades the area which is above 95% so we can check how
# good this method (Wald) is in computing C.I. of log relative risk
RRcoverageplot <- function(rep = 1000, ...) {
temp <- rep(0, rep)
for (i in 1:rep) {
temp[i] <- RRcoverage(...)
}
den <- density(temp)
plot(den, type = "l", col = "red", lwd = 2, main = "Coverage for Wald 95% CI for RR", xlab = "Coverage")
abline(v = 0.95, lty = 2)
index <- abs(den$x - 0.95) == min(abs(den$x - 0.95))
pin <- which.max(index)
x1 <- den$x[pin: length(den$x)]
y1 <- rep(0, length(x1))
x2 <- rev(x1)
y2 <- rev(den$y[pin:length(den$y)])
polygon(c(x1, x2), c(y1, y2), col = "skyblue")
per_over_95 <- sum(temp > 0.95)/rep* 100
text(0.955, 20, labels = paste(bquote(.(per_over_95)), "%", sep = ""), cex = 1.5)
}
The following is the result for the case that the even is rare (p1 = 0.0007, p2 = 0.003). The Wald CI turns to be very good. Because most of the time, 95% coverage actually is higher than 95% !
The performance of Wald confidence interval for relative risk
The relative risk is important in statistics in comparing the probabilities of binary events. It is also easier to interpret statistical knowledge to general public. For example, people will easily understand
The probability of getting lung cancer is ten times higher for people who smoke at least 1 cigarettes every day than those who don't smoke at all.
But it is confused if people are told
The odds ratio of people who smoke at least 1 cigarettes every day and those who don't smoke at all is 15.
When calculating the confidence intervals of relative risk, it turns out the log relative risk has better asymptotic properties than relative risk itself. The following R code is for testing the performance of Wald confidence interval for log relative risk.
The following is the result for the case that the even is rare (p1 = 0.0007, p2 = 0.003). The Wald CI turns to be very good. Because most of the time, 95% coverage actually is higher than 95% !
New R functions used are