Amherst-Statistics / IS5inR

Companion materials for De Veaux, Velleman, and Bock's "Intro Stats" 5th edition
MIT License

issue in chapter 19 #36

Closed nicholasjhorton closed 6 years ago

nicholasjhorton commented 6 years ago

@mchien20 wrote: I can't figure out how to use the chisq.test() function in Chapter 19 to find the statistic for a data set.

I need more information please.

mchien20 commented 6 years ago

This is how I'm finding the chisq value for BaseballBirths:

library(tidyverse) # read_csv(), mutate(), summarise(), %>%
library(janitor)   # clean_names()
library(mosaic)    # formula interface for sum(), xpchisq()
BaseballBirths <- read_csv("http://nhorton.people.amherst.edu/is5/data/Ballplayer_births.csv") %>% clean_names()
natbirth <- c(.08, .07, .08, .08, .08, .08, .09, .09, .09, .09, .08, .09)
BaseballBirths <- cbind(BaseballBirths, natbirth) # adding a column for national birth %
totaln <- sum(~ ballplayer_count, data = BaseballBirths)
BaseballBirths <- BaseballBirths %>% mutate(expected = totaln * natbirth)
df <- nrow(BaseballBirths) - 1
BaseballBirths %>% mutate(chisq = ((ballplayer_count - expected)^2)/expected) %>% summarise(sum(chisq))
xpchisq(q = 26.5, df = df, lower.tail = FALSE)

I want to use the chisq.test() function, but I can't figure out how to get the right value. I've tried:

with(chisq.test(tally(~ ballplayer_count | expected, data = BaseballBirths)), statistic)
with(chisq.test(tally(natbirth ~ ballplayer_count, data = BaseballBirths)), statistic)

but they're both incorrect.
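A possible way to get the goodness-of-fit statistic directly, offered as a sketch and not necessarily what the eventual fix does: both tally() attempts cross-classify two columns, so chisq.test() receives a contingency table and runs a test of independence. For goodness of fit, base R's chisq.test() takes the observed counts as x and the null proportions as p. The counts below are made up for illustration (they are not the values in Ballplayer_births.csv); natbirth is copied from the code above.

```r
# Hypothetical monthly counts, for illustration only
# (NOT the values in Ballplayer_births.csv)
observed <- c(30, 25, 28, 27, 29, 26, 24, 38, 31, 27, 24, 28)

# National birth proportions from the thread (they sum to 1)
natbirth <- c(.08, .07, .08, .08, .08, .08, .09, .09, .09, .09, .08, .09)

# Goodness-of-fit test: counts go in x, null proportions in p
gof <- chisq.test(x = observed, p = natbirth)

# Manual computation, mirroring the mutate()/summarise() steps
expected <- sum(observed) * natbirth
manual <- sum((observed - expected)^2 / expected)

gof$statistic  # chi-square statistic
gof$parameter  # degrees of freedom (number of categories - 1)
gof$p.value    # upper-tail p-value
```

With the real data, the equivalent call would presumably be chisq.test(x = BaseballBirths$ballplayer_count, p = natbirth), and its statistic should match the hand-computed sum.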

mchien20 commented 6 years ago

In a similar vein of editing out group_by() and summarise(), I'm not sure how to replace group_by(). For example:

Cars <- read_csv("http://nhorton.people.amherst.edu/is5/data/Car_speeds.csv")
numsim <- 10000
CarSims <- do(numsim) * Cars %>%
  mutate(direction = shuffle(direction)) %>%
  group_by(direction) %>%
  summarise(means = mean(~ speed)) %>%
  summarise(diffmeans = diff(means))

I'm not sure how to convert this.

nicholasjhorton commented 6 years ago

These probably should be two separate issues (since the Cars example is in Chapter 17). I fixed the chisq one with https://github.com/Amherst-Statistics/IS5inR/commit/345454f5579b120203186684b28af744cb7f9d2d

nicholasjhorton commented 6 years ago

Can you please be sure to extirpate all calls of the form "favstats(~ Y | X)"? Instead these should always be "favstats(Y ~ X)" since that's the simplest form of the modeling language that is introduced (the | is used to bring in a third variable). I fixed one in Chapter17.Rmd but suspect that there might be more.
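For reference, a minimal sketch of the preferred form. It assumes the mosaic package is installed and uses R's built-in mtcars data as a stand-in, since no specific data set is named here.

```r
library(mosaic)

# Preferred: response ~ grouping variable
# (instead of the equivalent-looking favstats(~ mpg | cyl, data = mtcars))
stats_by_cyl <- favstats(mpg ~ cyl, data = mtcars)
stats_by_cyl
```

This returns one row of summary statistics per level of cyl.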

nicholasjhorton commented 6 years ago

Here's a fix for the diffmean() issue: https://github.com/Amherst-Statistics/IS5inR/commit/c8d308dd624c521545f71e02a91215f19e3640a0
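For readers who do not follow the link, here is a sketch of how the permutation simulation might look with mosaic's diffmean() in place of group_by()/summarise(). It is not a copy of the commit. The data frame is a simulated stand-in for Car_speeds.csv (only the column names speed and direction come from the thread; the level names "Up" and "Down" and all values are made up), and numsim is reduced to keep the run short.

```r
library(mosaic)

# Simulated stand-in for Car_speeds.csv; values and level names are hypothetical
set.seed(19)
Cars <- data.frame(
  direction = rep(c("Up", "Down"), each = 10),
  speed = c(rnorm(10, mean = 25, sd = 3), rnorm(10, mean = 22, sd = 3))
)

numsim <- 1000

# Observed difference in mean speed between the two directions
observed <- diffmean(speed ~ direction, data = Cars)

# Null distribution: shuffle the labels inside the formula and recompute
CarSims <- do(numsim) * diffmean(speed ~ shuffle(direction), data = Cars)

head(CarSims)
```

Shuffling inside the formula removes the need for the mutate(), group_by(), and both summarise() steps; the simulated differences end up in a column named diffmean.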

Can you please fix the remaining tidyverse issues (marked XX MC)?

mchien20 commented 6 years ago

diffmean addressed: https://github.com/Amherst-Statistics/IS5inR/commit/24697fb114a213dadd16e2f1da15073e3f86e9db
| addressed: https://github.com/Amherst-Statistics/IS5inR/commit/eb5c1f9e954ddb0c58ae8882f8c047c8c3b55649