eddatasci / unrollment_proj

The Unrollment Project: Exploring algorithmic bias in predicting bachelor's degree completion.
Workflows library #23

Open wdoyle42 opened 4 years ago

wdoyle42 commented 4 years ago

The workflows package makes a huge difference:

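## Load tidymodels (attaches recipes, parsnip, workflows, tune)
library(tidymodels)

## Set recipe
grad_rec <- recipe(formula = ba_complete_formula, data = df_a) %>%
  ## NB: this is dicey and should be improved, but it's a
  ## start; using K nearest neighbors to impute.
  ## (step_impute_knn() is the current name for step_knnimpute())
  step_impute_knn(all_predictors()) %>%
  ## convert factors to dummy
  ## (all_of() replaces the superseded one_of(); assumes likely_factors
  ## is a character vector of column names)
  step_dummy(dplyr::all_of(likely_factors)) %>%
  ## center predictors
  step_center(all_predictors()) %>%
  ## rescale all predictors
  step_scale(all_predictors())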

## Set model
grad_mod <-
  logistic_reg() %>%
  set_engine("glm")

## Set workflow
grad_wfl <-
  workflow() %>%
  add_recipe(grad_rec) %>%
  add_model(grad_mod)

Then running the model across a resampled dataset looks like this:

grade_res <- fit_resamples(grad_wfl,
                           resamples = validation_data,
                           control = control_resamples(save_pred = TRUE))
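
To read the results back out, something like the sketch below should work. It runs on simulated toy data, not the project's `df_a` or `validation_data`, and `toy`, `toy_wfl`, `toy_res`, `toy_metrics`, and `toy_preds` are made-up names for illustration only. The point is just that `collect_metrics()` and `collect_predictions()` from tune pull summaries and saved predictions off a `fit_resamples()` result.

```r
library(tidymodels)

set.seed(2020)

## Simulated stand-in data (an assumption, not the project's data):
## two numeric predictors and a binary factor outcome
n <- 200
toy <- tibble(
  x1 = rnorm(n),
  x2 = rnorm(n),
  y  = factor(rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2)),
              levels = c(0, 1), labels = c("no", "yes"))
)

## Same recipe + model + workflow pattern as above, in miniature
toy_wfl <- workflow() %>%
  add_recipe(recipe(y ~ ., data = toy) %>%
               step_normalize(all_predictors())) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

## 5-fold cross-validation stands in for validation_data here
toy_res <- fit_resamples(toy_wfl,
                         resamples = vfold_cv(toy, v = 5),
                         control = control_resamples(save_pred = TRUE))

## One row per metric, averaged across the resamples
toy_metrics <- collect_metrics(toy_res)

## Held-out predictions; only available because save_pred = TRUE
toy_preds <- collect_predictions(toy_res)
```

Keeping `save_pred = TRUE` matters if the predictions are going to be examined by subgroup afterwards, since the per-row held-out predictions are otherwise discarded.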

This is way better than the first way I did it. I think it obviates the need for anything beyond specifying the various models we want to try.
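
A rough sketch of what swapping models could look like, assuming the recipe stays fixed and only the model spec changes. Everything here is illustrative: the data are simulated, `toy`, `toy_folds`, `base_wfl`, `candidate_models`, and `results` are hypothetical names, and the decision tree is just an arbitrary second model (it needs the rpart package installed). `update_model()` from workflows replaces the model in an existing workflow.

```r
library(tidymodels)

set.seed(2020)

## Simulated stand-in data (an assumption, not the project's data)
n <- 200
toy <- tibble(
  x1 = rnorm(n),
  x2 = rnorm(n),
  y  = factor(rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2)),
              levels = c(0, 1), labels = c("no", "yes"))
)

## Shared resamples so every model is scored on the same folds
toy_folds <- vfold_cv(toy, v = 5)

## Base workflow: recipe plus a starting model
base_wfl <- workflow() %>%
  add_recipe(recipe(y ~ ., data = toy) %>%
               step_normalize(all_predictors())) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

## Hypothetical list of candidate model specifications
candidate_models <- list(
  logit = logistic_reg() %>% set_engine("glm"),
  tree  = decision_tree(mode = "classification") %>% set_engine("rpart")
)

## Swap each model into the workflow, resample, and stack the metrics
results <- purrr::map(candidate_models, function(mod) {
  base_wfl %>%
    update_model(mod) %>%
    fit_resamples(resamples = toy_folds) %>%
    collect_metrics()
}) %>%
  dplyr::bind_rows(.id = "model")
```

The preprocessing is defined once in the recipe, so each new model only costs one more entry in the list.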