avehtari / ROS-Examples

Regression and other stories R examples
https://avehtari.github.io/ROS-Examples/
322 stars 257 forks source link

Figure 13.10 & 14.3 (a) potentially misleading; given values of arsenic used #69

Open shane-kercheval opened 3 years ago

shane-kercheval commented 3 years ago

Figures 13.10 and 14.3 (a) show the probability of switching wells as a function of distance and a given level of arsenic.

The values of arsenic used are 0.5 and 1.0.

The minimum value of arsenic in the data is 0.51, so perhaps 0.5 is close enough.

But the value of 1 is something like the 35% percentile in the data. I'm not sure this is a good representation, and seems arbitrary, but perhaps I'm missing something.

The median value of arsenic is 1.3.

Plotting the min/median/max values of arsenic (or even something like ~95% percent, which is ~3.79), I think provide better insight.

The values 0.5-1 of arsenic used in the graph is on the lower levels and so to say, in the book, "the interaction is small in the range of most of the data" seem misleading because we're only looking at lower levels of arsenic.

(graph b uses a distance of 0 to 50, which seems more reasonable because its covering the "closest" 65% of data (median dist is 36.8), and 50 meters is a nice conceptual value; although seems like a min/medium/max approach could benefit this graph as well)

Here's a screenshot at different levels.

Screen Shot 2020-12-23 at 12 24 38 PM

code below, along with the non-interaction model

library(rstan)
library(rstanarm)

set.seed(2)
model_interaction <- stan_glm(switch ~ dist100*arsenic, family=binomial(link="logit"), data=wells, refresh=0)
model_no_interaction <- stan_glm(switch ~ dist100+arsenic, family=binomial(link="logit"), data=wells, refresh=0)

jitter_binary <- function(a, jitt=.05){
  a + (1-2*a)*runif(length(a),0,jitt)
}
predict_switch <- function(.model, .dist100, .arsenic) {
    predict(.model,
            newdata = data.frame(dist100=.dist100, arsenic=.arsenic),
            type='response')
}

plot_prob_swiching_given_distance <- function(.model) {

    plot(wells$dist, jitter_binary(wells$switch), xlim=c(0, max(wells$dist)))
    for(arsenic_quantile in c(0, 0.5, 0.95, 1)) {
        arsenic_value <- quantile(wells$arsenic, arsenic_quantile)
        curve(predict_switch(.model, .dist100=x/100, .arsenic = arsenic_value), add=TRUE, col='red')

        text_y <- predict_switch(.model, .dist100=0, .arsenic = arsenic_value)
        text(.25, text_y, paste("if arsenic = ", arsenic_value), adj=0, cex=.8, col = 'red')
    }

}
plot_prob_swiching_given_distance(model_no_interaction)
plot_prob_swiching_given_distance(model_interaction)
shane-kercheval commented 3 years ago

Also, just wanted to say, this is a fantastic book; I'm getting a lot of practical insights out of it. Thank you.

andrewgelman commented 3 years ago

Hi, thanks for the kind words. Regarding the arsenic level: yes, I see your point. Something like the median and the 90th percentile, or the 25th and 75th percentile, would make more sense. I don't know if I have the energy to re-make all these graphs, but it would make sense to add a couple of sentences to make this point.