SmithCollege-SDS / tidy-islr

tidyverse versions of ISLR labs

Bootstrap for non-modeling tasks #9

Open AmeliaMN opened 7 years ago

AmeliaMN commented 7 years ago

I just did a pass through lab 7, and referenced @ijlyttle's bootstrap example page to tidy the bootstrap model areas.

However, the text of the book introduces the bootstrap as something that can be used to estimate any parameter. For its example, it calculates alpha = (var(Y)-cov(X,Y))/(var(X)+var(Y)-2*cov(X,Y)). I tried to use modelr::bootstrap() to do this example and kept running into impenetrable errors (probably because I am not used to debugging with tibbles). I've tried a variety of ways of defining the function, from

fn_alpha = function(data){
  Y = data$Y
  X = data$X
  alpha = (var(Y)-cov(X,Y))/(var(X)+var(Y)-2*cov(X,Y))
  return(list(alpha=alpha))
}

to

fn_alpha = function(data){
  data %>%
    summarize(alpha = (var(Y)-cov(X,Y))/(var(X)+var(Y)-2*cov(X,Y)))
}

and many things in between. These work on a single case, like

Portfolio %>%
  slice(1:100) %>%
  fn_alpha()

but when I try to extend them to the bootstrap samples using bootstrap(), they break:

Portfolio %>%
  modelr::bootstrap(1000) %>%
  mutate(alpha = map(strap, fn_alpha)) 

Some error messages: Error: 'x' is NULL and Error: no applicable method for 'summarise_' applied to an object of class "resample" (for the two function definitions, respectively).

Can anyone help? Look here for this example in context.

AmeliaMN commented 7 years ago

(I'd take fixing PRs or suggestions on tibble debugging)
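On the debugging side, one low-tech approach is to look at a single element of the strap column before mapping over all 1000. The sketch below does not call modelr. It hand-builds a stand-in with the shape the error messages hint at: a list of class "resample" holding `data` and `idx` elements. That shape is an assumption about modelr's internals, and `toy` is a made-up data frame standing in for Portfolio.

```r
# Hypothetical stand-in for one bootstrap resample: the full data frame
# plus the sampled row indices (assumed structure, not built by modelr).
set.seed(1)
toy <- data.frame(X = rnorm(10), Y = rnorm(10))
strap <- structure(
  list(data = toy, idx = sample(nrow(toy), replace = TRUE)),
  class = "resample"
)

# The object is a list with two elements, not a data frame with X and Y.
names(strap)

# So strap$Y finds nothing and quietly returns NULL ...
is.null(strap$Y)

# ... and var(NULL) is what produces a "'x' is NULL" style error.
msg <- tryCatch(var(strap$Y), error = function(e) conditionMessage(e))
msg
```

If a real element of the strap column looks like this, then a function that reads `data$Y` directly is asking the wrapper for a column that lives one level down, which would account for the first error.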

jtr13 commented 7 years ago

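The issue has to do with the nested lists that modelr::bootstrap() returns: each element of the strap column is a "resample" object that holds the original data plus a set of row indices, not a data frame. (Why this does not create problems for lm() is beyond me.) Most likely there's a better way, but I believe this works: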

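fn_alpha = function(strap){
    # each strap holds the full data set and the sampled row indices
    data <- strap$data
    idx <- strap$idx
    # pull out only the resampled rows
    X <- data$X[idx]
    Y <- data$Y[idx]
    (var(Y) - cov(X,Y)) / (var(X) + var(Y) - 2 * cov(X,Y))
}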
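As a sanity check on the function above, here is a small sketch that compares it with the same statistic computed on an ordinary data frame of the resampled rows. It again uses a hand-built stand-in rather than modelr itself, so the "resample" shape, the `toy` data and the helper `alpha_df()` are all assumptions made for illustration.

```r
# Stand-in resample object (assumed shape: full data plus sampled row indices).
set.seed(42)
toy <- data.frame(X = rnorm(100), Y = rnorm(100))
strap <- structure(
  list(data = toy, idx = sample(nrow(toy), replace = TRUE)),
  class = "resample"
)

# The function from the reply above: index X and Y by the sampled rows.
fn_alpha <- function(strap) {
  data <- strap$data
  idx <- strap$idx
  X <- data$X[idx]
  Y <- data$Y[idx]
  (var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y))
}

# Hypothetical helper: the same statistic on a plain data frame.
alpha_df <- function(df) {
  with(df, (var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y)))
}

# Materialize the resampled rows first, then compare the two routes.
resampled <- strap$data[strap$idx, ]
same <- isTRUE(all.equal(fn_alpha(strap), alpha_df(resampled)))
same
```

In the original pipeline this would presumably become `mutate(alpha = map_dbl(strap, fn_alpha))`, where purrr's `map_dbl()` returns a numeric column instead of a list. I believe modelr also provides an `as.data.frame()` method for resample objects that does the same row materialization, which may be why lm() copes with them, but I have not checked that against the current release.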