wdoyle42 opened this issue 4 years ago
Ok, did some work on this one; it's in branch issue_15. The dataset includes all base-year, F1, and F2 information, drops the weights and flags, and adds the outcome (BA completion by the third follow-up), with factors and age formatted appropriately. I think this could be ready for a model that uses regularization.
Thanks, @wdoyle42. Are you ready for someone else to take a look or are you still working on it?
@btskinner, still working. I'm looking for a way to programmatically drop both non-informative and perfectly collinear variables before handing it off. I'll take one more pass and then submit it for review.
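One way to drop both kinds of variables in a single pass is a greedy Gram-Schmidt check. A column is dropped if nothing is left after you remove its projection onto an intercept and onto the columns already kept. The Python sketch below only illustrates the idea; the project code is in R, so it would need translating. It assumes complete cases (no missing values) and numeric columns. The function name `drop_uninformative_and_collinear` and the default tolerance are hypothetical choices and do not come from the branch.

```python
import math


def drop_uninformative_and_collinear(columns, tol=1e-9):
    """Return the names of columns to keep, in input order.

    columns: dict mapping column name -> list of numbers (complete cases only).
    A column is dropped if it is constant (non-informative) or if it lies in the
    span of an intercept plus the columns already kept (perfectly collinear).
    """
    if not columns:
        return []
    n = len(next(iter(columns.values())))
    # Orthonormal basis, seeded with the normalized intercept (all-ones) vector.
    basis = [[1.0 / math.sqrt(n)] * n]
    kept = []
    for name, values in columns.items():
        v = [float(x) for x in values]
        scale = math.sqrt(sum(x * x for x in v)) or 1.0
        # Modified Gram-Schmidt: subtract the projection onto each basis vector.
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        residual = math.sqrt(sum(vi * vi for vi in v))
        if residual <= tol * scale:
            continue  # constant, or an exact linear combination of kept columns
        basis.append([vi / residual for vi in v])
        kept.append(name)
    return kept
```

The check is greedy, so column order decides which member of a dependent set survives. Listing preferred variables first keeps them. A pairwise correlation filter only compares two columns at a time. This check also catches a column that is an exact linear combination of several others.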
@btskinner, can you take a look at the latest commit? I want to drop highly correlated variables but keep frequently used composite variables rather than their components, e.g., bypared over its source variables. I'm going in circles on this. Any ideas?
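For the bypared case, one possible approach is a greedy correlation filter that visits a user-supplied list of preferred composites first. Those composites are kept, and any later variable whose absolute correlation with a kept variable exceeds a cutoff is dropped. This is a hypothetical Python sketch that assumes complete numeric columns. `drop_correlated`, `preferred`, and the 0.9 cutoff are illustrative choices. `mother_ed` and `father_ed` in the example are made-up stand-ins for bypared's source variables.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, preferred=(), cutoff=0.9):
    """Greedily keep columns, skipping any with |r| > cutoff against a kept column.

    Names in `preferred` (e.g. composites such as bypared) are visited first,
    so they survive and their highly correlated source variables are dropped.
    """
    order = [c for c in preferred if c in columns]
    order += [c for c in columns if c not in order]
    kept = []
    for name in order:
        if all(abs(pearson(columns[name], columns[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical data: mother_ed and father_ed stand in for bypared's sources.
    data = {
        "mother_ed": [1, 2, 3, 4, 5, 6],
        "father_ed": [1, 2, 3, 4, 6, 5],
        "bypared": [1, 2, 3, 4, 6, 6],
        "other": [3, 1, 4, 1, 5, 9],
    }
    print(drop_correlated(data, preferred=["bypared"]))  # prints ['bypared', 'other']
```

This differs from caret's `findCorrelation`, which decides what to drop by each variable's mean absolute correlation rather than by a preference list. In this greedy version, the outcome among non-preferred variables depends on their column order.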
@wdoyle42, I'll take a look and get back.
R function that will read in the data and do basic wrangling.
Inputs: a list of the names of the dependent and independent variables from ELS, and the name of the local data file.
Outputs: an .rds data file with all lower-case names, missing data handled, and rows filtered to four-year enrollees only.
Much of this code is now in predict_grad.Rmd, lines 83-130.
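The wrangling steps in this spec could be sketched roughly as follows. This is an illustrative Python version and not the code in predict_grad.Rmd. It reads a CSV rather than the ELS source file, and it returns a list of row dicts instead of writing an .rds file. It assumes that missing values use ELS-style negative reserve codes -1 through -9 (matched as exact strings) or are blank. The parameters `enroll_var` and `four_year_value`, which identify four-year enrollees, are hypothetical.

```python
import csv

# Assumption: ELS-style reserve codes (-1 through -9) mark nonresponse; blanks
# are also treated as missing. Exact string matching is used, so adjust per file.
DEFAULT_MISSING = {"", "-1", "-2", "-3", "-4", "-5", "-6", "-7", "-8", "-9"}


def wrangle(path, dv, ivs, enroll_var, four_year_value, missing_codes=DEFAULT_MISSING):
    """Read a CSV and return a list of row dicts for four-year enrollees only.

    Column names are lower-cased, only dv, ivs and enroll_var are kept, and
    values in missing_codes become None (R's NA). enroll_var and
    four_year_value are hypothetical parameters identifying four-year enrollees.
    """
    wanted = [name.lower() for name in [dv, *ivs, enroll_var]]
    rows = []
    with open(path, newline="") as f:
        for raw in csv.DictReader(f):
            # Lower-case the header names; skip overflow fields (key None).
            row = {k.lower(): v for k, v in raw.items() if k is not None}
            if row.get(enroll_var.lower()) != four_year_value:
                continue  # not a four-year enrollee
            clean = {}
            for name in wanted:
                value = row.get(name)
                missing = value is None or value.strip() in missing_codes
                clean[name] = None if missing else value
            rows.append(clean)
    return rows
```

Values stay as strings here. Converting the outcome and other categorical variables to factors and formatting age, as described earlier in the thread, would be a separate step.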