Closed by rhinopotamus 5 years ago
Good catch. In fact I'll do both. I'll change the module to instruct students specifically to use levels = c(1, 0) and labels = c("present", "absent"), and I'll also change the rubric to mention the possible discrepancy.
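For reference, here is a minimal base R sketch of that ordering. The vector ulcer_raw is a hypothetical stand-in for Melanoma$ulcer, built from the counts implied in this thread (90 of 205 with an ulcer). That prop() treats the first level as the "success" is taken from the discussion here, so the sketch computes the first-level proportion directly instead of calling it.

```r
# Hypothetical stand-in for Melanoma$ulcer: 90 ones (present), 115 zeros (absent).
ulcer_raw <- rep(c(1, 0), times = c(90, 115))

# Put the "success" level first, as the module will instruct.
ulcer <- factor(ulcer_raw, levels = c(1, 0), labels = c("present", "absent"))
levels(ulcer)

# Proportion of observations in the first level ("present").
# No 1 - ulcer_prop step is needed with this ordering.
ulcer_prop <- sum(ulcer == levels(ulcer)[1]) / length(ulcer)
ulcer_prop
```

With the levels in this order, the proportion is computed as 90/205 directly, so students never need to subtract from 1.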
Okay, so here's a funny thing:
In R module 9, let's say that students code their ulcer variable "backwards":
ulcer <- factor(Melanoma$ulcer, levels = c(0, 1), labels = c("absent", "present"))
Well, then if they do
ulcer_prop <- prop(ulcer, data = ulcer_df)
then they get 0.561ish, and so they'll want to use 1-ulcer_prop for most of their calculations. However, this gives you the wrong two-sided p-value:
P2 <- 2 * prop(sims2$prop <= 1-ulcer_prop)
results in 0.07.
The reason for this appears to be a floating-point rounding issue. Try doing
sims2 %>% filter(prop <= 1-ulcer_prop) %>% arrange(prop)
versus
sims2 %>% filter(prop <= 90/205) %>% arrange(prop)
and you'll see what I mean -- the former results in 35 rows where the latter results in 50 rows, including some with a proportion equivalent to 90/205.
I think the easiest fix is to just mention 0.07 as another possibility for the two-sided p-value in the rubric, or else to very specifically instruct students to put the "success" criterion first.
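The rounding behavior can be reproduced in base R without the simulation. This is only a sketch: the counts 115 and 90 are inferred from the 0.561ish and 90/205 figures above, sim_props is a small hypothetical stand-in for sims2$prop, and the tolerance-based comparison is one possible workaround, not something the module currently does.

```r
# Counts inferred from the thread: 90 of 205 "present", so 115 of 205 "absent".
p_absent  <- 115 / 205
p_present <- 90 / 205

# Mathematically equal, but not in double precision:
(1 - p_absent) == p_present   # FALSE
p_present <= 1 - p_absent     # FALSE: values exactly at the boundary are dropped
p_present - (1 - p_absent)    # tiny positive gap

# Hypothetical simulated proportions at and around the boundary.
sim_props <- c(88, 89, 90, 90, 91) / 205

# The naive comparison misses the two values equal to 90/205.
sum(sim_props <= 1 - p_absent)            # 2

# Adding a small tolerance (an assumption, chosen here as sqrt of machine epsilon)
# recovers them without admitting 91/205.
tol <- sqrt(.Machine$double.eps)
sum(sim_props <= (1 - p_absent) + tol)    # 4
```

Putting the "success" level first avoids the subtraction entirely, which is why that ordering sidesteps the discrepancy.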