Amherst-Statistics / IS5inR

Companion materials for De Veaux, Velleman, and Bock's "Intro Stats" 5th edition
MIT License

Need full baby dataset to replicate fig14.10 and 14.11 (or do it for the employee commute times p.455-456) #18

Closed by mchien20 6 years ago

mchien20 commented 6 years ago

In Chapter 14, there are two instances in which a 95% confidence interval is found for bootstrap means.

Figure 14.11 (page 455):

    library(mosaic)
    library(readr)
    library(janitor)
    Babies <- read_csv("http://nhorton.people.amherst.edu/is5/data/Babysamp_98.csv") %>%
      clean_names()
    set.seed(1243536)
    numsim <- 10000
    bootstrapmeans <- do(numsim) * mean(~ weight, data = mosaic::resample(Babies, 100))
    gf_histogram(~ mean, data = bootstrapmeans) %>%
      gf_labs(x = "Bootstrap Means", y = "")

Figure 14.11

    bootstraplm <- lm(mean ~ 1, data = bootstrapmeans)
    confint(bootstraplm)

Example (page 456):

    Commute <- read_csv("http://nhorton.people.amherst.edu/is5/data/Population_Commute_Times.csv") %>%
      clean_names()
    set.seed(134)
    numsim <- 10000
    commutebootstrap <- do(numsim) * mean(~ commute_time, data = mosaic::resample(Commute, 100))
    gf_histogram(~ mean, data = commutebootstrap, title = "Time Bootstrap Set Means") %>%
      gf_labs(x = "", y = "")
    cbootlm <- lm(mean ~ 1, data = commutebootstrap)
    confint(cbootlm)

I've tried both confint() and t.test(), but the intervals are a lot narrower than the ones in the book. I tried multiple seeds, but they were all similar.
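One possible explanation, offered as a sketch rather than a confirmed diagnosis of the book's numbers: `confint()` applied to `lm(mean ~ 1)` gives a t-interval for the average of the 10,000 bootstrap means, so its width shrinks as `numsim` grows instead of reflecting the spread of the means. A percentile interval instead takes the middle 95% of the bootstrap means. The example below uses base R and simulated data; the `weights` vector and its parameters are made up for illustration and are not the Babies data.

```r
# Simulated stand-in for a sample of 100 birth weights (hypothetical values).
set.seed(1)
weights <- rnorm(100, mean = 7.4, sd = 1.3)

numsim <- 10000
# Each bootstrap replicate: resample the 100 values with replacement, take the mean.
boot_means <- replicate(numsim, mean(sample(weights, replace = TRUE)))

# Interval for the mean OF the bootstrap means: its width is roughly
# 2 * 1.96 * sd(boot_means) / sqrt(numsim), so it narrows as numsim grows.
lm_ci <- confint(lm(boot_means ~ 1))

# Percentile interval: the middle 95% of the bootstrap means themselves.
pct_ci <- quantile(boot_means, c(0.025, 0.975))

lm_width  <- unname(lm_ci[1, 2] - lm_ci[1, 1])
pct_width <- unname(pct_ci[2] - pct_ci[1])
print(c(lm_width = lm_width, pct_width = pct_width))
```

With 10,000 replicates the first interval should come out roughly 100 times narrower than the second (a factor of about the square root of `numsim`), which would be consistent with intervals that look far too narrow regardless of the seed.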

nicholasjhorton commented 6 years ago

I don’t have access to the book but wonder why samples of size 100 are taken here. Is this still a hypothetical sampling distribution or is the goal to bootstrap an interval?

All the best,

Nick

On Jul 3, 2018, at 11:30 AM, Margaret Chien notifications@github.com wrote:


mchien20 commented 6 years ago

Both examples resample and compute the mean each time; a 95% confidence interval is then constructed from those means. The book uses samples of size 100.

nicholasjhorton commented 6 years ago

Can the three of you review this section of the book prior to our meeting? Something doesn’t sound right here.

All the best,

Nick

On Jul 3, 2018, at 11:43 AM, Margaret Chien notifications@github.com wrote:


amywagaman commented 6 years ago

Nick, this appears to be because the previous discussions involved simulating a sampling distribution based on samples of size 100. They then continue to use a single random sample of size 100 as the basis for everything they bootstrap, in effect pretending they don’t have access to the entire data set.

So the procedure should be: draw a random sample of size 100, then bootstrap a CI for the mean birthweight using just those 100 observations.

Does that help?

Amy
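The procedure described above could be sketched as follows in base R. The `population` vector is simulated and purely hypothetical; it stands in for the full data set that the book pretends not to have. Using a percentile interval is also an assumption, since the thread does not say which bootstrap interval the book reports.

```r
set.seed(2018)
# Hypothetical population of commute times in minutes (simulated, not the book's data).
population <- rexp(2000, rate = 1 / 45)

# Step 1: draw ONE random sample of size 100, without replacement.
one_sample <- sample(population, size = 100)

# Step 2: bootstrap using only those 100 observations.
numsim <- 10000
boot_means <- replicate(numsim, mean(sample(one_sample, size = 100, replace = TRUE)))

# Step 3: percentile 95% interval for the mean.
boot_ci <- quantile(boot_means, c(0.025, 0.975))
print(boot_ci)

# For comparison, the usual t-interval from the same 100 observations.
t_ci <- t.test(one_sample)$conf.int
print(t_ci)
```

Both intervals reflect the variability in a single sample of 100, so they should be of broadly similar width.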

From: Nicholas Horton [mailto:notifications@github.com] Sent: Tuesday, July 3, 2018 11:46 AM To: Amherst-Statistics/IS5inR IS5inR@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [Amherst-Statistics/IS5inR] Confidence Intervals for Bootstrap (#18)


mchien20 commented 6 years ago

I've discussed this with both Bonnie and Shukry, but it seems like Bonnie has the same problem of the confidence intervals not matching.

mchien20 commented 6 years ago

Deleted code related to babies example: https://github.com/Amherst-Statistics/IS5inR/commit/7cd527fdb9f5805ea4a8715d6f7e74e8b51b53d1