VectorPosse / intro_stats

Introduction to Statistics: an integrated textbook and workbook using R
MIT License

Module 9 -- "incorrect" p-value from 1-ulcer_prop #23

Closed: rhinopotamus closed this issue 5 years ago

rhinopotamus commented 5 years ago

Okay, so here's a funny thing:

In R module 9, let's say that students code their ulcer variable "backwards":

ulcer <- factor(Melanoma$ulcer, levels = c(0, 1), labels = c("absent", "present"))

Well, then if they do ulcer_prop <- prop(ulcer, data = ulcer_df), they get about 0.561 (the proportion "absent", 115/205), so they'll want to use 1 - ulcer_prop for most of their calculations. However, this gives the wrong two-sided p-value:

P2 <- 2 * prop(sims2$prop <= 1 - ulcer_prop) gives 0.07.

The reason appears to be a floating-point rounding issue: 1 - ulcer_prop comes out very slightly smaller than 90/205, so simulated proportions exactly equal to 90/205 fail the <= test. Try sims2 %>% filter(prop <= 1 - ulcer_prop) %>% arrange(prop) versus sims2 %>% filter(prop <= 90/205) %>% arrange(prop) and you'll see what I mean: the former returns 35 rows, while the latter returns 50 rows, including some with a proportion equal to 90/205.
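The rounding behaviour can be reproduced in base R without running the simulation. The sketch below assumes only the counts from this issue (115 "absent" and 90 "present" out of 205) and enumerates every possible simulated proportion k/205 instead of using sims2. The tolerance `tol` is a hypothetical workaround, not something proposed in the thread; any value much smaller than 1/205 but larger than rounding error would behave the same way.

```r
# Counts from the issue: 115 "absent" and 90 "present" out of 205
n <- 205
ulcer_prop <- 115 / n          # proportion of the first level ("absent")

# Mathematically equal, but not equal as doubles
1 - ulcer_prop == 90 / n       # FALSE
1 - ulcer_prop < 90 / n        # TRUE: the subtraction lands just below 90/205

# Every proportion a simulated sample of size 205 could produce: k/205
possible_props <- (0:n) / n

# The "backwards" cutoff silently drops k = 90
sum(possible_props <= 1 - ulcer_prop)   # 90 values (k = 0..89)
sum(possible_props <= 90 / n)           # 91 values (k = 0..90)

# Hypothetical workaround: allow a tiny tolerance in the comparison
tol <- 1e-9
sum(possible_props <= 1 - ulcer_prop + tol)  # 91 values again
```

Any simulated proportion of exactly 90/205 is therefore counted with the second cutoff but not the first, which is consistent with the 35-row versus 50-row difference described above.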

I think the easiest fix is either to mention 0.07 in the rubric as another acceptable two-sided p-value, or to instruct students very specifically to put the "success" level first.
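A further option, not raised in this thread, is to compare integer counts rather than proportions, since integer comparisons involve no rounding. This is a minimal sketch under stated assumptions: the null proportion `p0 = 0.5`, the number of simulations, and the seed are all hypothetical and may not match what Module 9 actually uses. Doubling the lower tail follows the thread's own P2 calculation.

```r
set.seed(1)              # hypothetical seed, for reproducibility only

n <- 205
observed_present <- 90   # count of "present", from the issue
p0 <- 0.5                # hypothetical null proportion; the module's null may differ

# Simulate counts of "present" under the null: integers, not proportions
sim_counts <- rbinom(10000, size = n, prob = p0)

# Comparing integers is exact, so there is no ambiguity at the cutoff
p_lower <- mean(sim_counts <= observed_present)

# Two-sided p-value by doubling the lower tail, as in the thread
P2 <- 2 * p_lower
P2
```

Because the cutoff is the integer 90 rather than a computed proportion, it does not matter which level students list first.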

VectorPosse commented 5 years ago

Good catch. In fact, I'll do both: I'll change the module to instruct students specifically to use levels = c(1, 0) and labels = c("present", "absent"), and I'll also update the rubric to mention the possible discrepancy.
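The effect of the level ordering can be checked without the Melanoma data. The sketch below uses a hypothetical stand-in vector `ulcer_raw` with 90 ones and 115 zeros, matching the counts in the issue. It assumes, as the issue implies, that `prop()` reports the proportion of the first factor level, and mimics that with a hypothetical helper, `first_level_prop()`, written in base R.

```r
# Hypothetical stand-in for Melanoma$ulcer: 90 ones ("present"), 115 zeros ("absent")
ulcer_raw <- c(rep(1, 90), rep(0, 115))

# "Backwards" coding: "absent" is the first level
ulcer_backwards <- factor(ulcer_raw, levels = c(0, 1), labels = c("absent", "present"))

# Coding from the fix: the "success" level ("present") comes first
ulcer <- factor(ulcer_raw, levels = c(1, 0), labels = c("present", "absent"))

# Hypothetical helper: proportion of the first level of a factor
first_level_prop <- function(f) mean(f == levels(f)[1])

first_level_prop(ulcer_backwards)   # 115/205, about 0.561
first_level_prop(ulcer)             # 90/205, about 0.439

# With "present" first, no 1 - p subtraction is needed, and the cutoff
# is the same double as 90/205
first_level_prop(ulcer) == 90 / 205  # TRUE
```

Putting the success level first removes the subtraction entirely, which is why the cutoff then matches simulated proportions of exactly 90/205.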