eddatasci / unrollment_proj

The Unrollment Project: Exploring algorithmic bias in predicting bachelor's degree completion.

Thinking around milestones #24

Open wdoyle42 opened 4 years ago

wdoyle42 commented 4 years ago

Here are my thoughts about milestones for the project-- I'm not stuck on these, just wanted to get them out there so we can discuss. The question is: are these the right milestones?

  1. Get the dataset up and running and deal with missing data, etc. Generate a decent list of covariates to be used, based on practice/literature (see the missing-data sketch after this list).
  2. Create predictive algorithms for student success in 4-year institutions. I've suggested four:
    1. Logistic regression.
    2. Elastic net (logistic regression with covariates selected and weighted via a penalty that combines two different approaches, ridge and lasso). This one is mostly done, via issue #18 (see the glmnet sketch after this list).
    3. Random forest
    4. Neural nets (wild overkill, totally useless in these applications, but . . . sounds fancy)
  3. Compare the utility of these different approaches by using standard measures of accuracy generated via cross-validation (sensitivity, specificity, accuracy, AUC) and by discussing the actual substantive difference these approaches might generate (e.g., 4 additional students correctly classified). This is what we discussed for the useR! conference (see the metrics sketch after this list).
    1. Accuracy of different models
    2. Substantive importance of differences in accuracy
  4. Compare the likely outcomes of policy decisions made on the basis of these predictions, using simulated data generated to resemble student enrollment at a given institution. This would give us a sense of what decision-making based on these predictions might do. So, for instance, what if an institution used a low predicted probability of success as a basis for denying admission (a lack of "fit")? How many students from different groups would end up enrolling and graduating? This part could use the most help, from my perspective, but we've also got a ways to go (see the simulation sketch after this list).
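
To make (1) a little more concrete, here is a minimal sketch of the kind of missing-data audit I have in mind. The variable names and toy data are placeholders, not our actual covariate list.

```r
library(dplyr)
library(tidyr)

## Toy stand-in for the cleaned analysis file (one row per student);
## the variable names here are made up for illustration
stu <- tibble(
  hs_gpa    = c(3.2, NA, 3.8, 2.9),
  pell      = c(1, 0, NA, 1),
  first_gen = c(0, 1, 1, NA),
  grad_4yr  = c(1, 0, 1, 0)
)

## Share of missing values for each candidate covariate
stu %>%
  summarise(across(everything(), ~ mean(is.na(.x)))) %>%
  pivot_longer(everything(),
               names_to = "covariate", values_to = "prop_missing") %>%
  arrange(desc(prop_missing))

## Simplest option while prototyping: complete cases only
stu_cc <- drop_na(stu)
```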
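For (2), here is a rough sketch of the elastic net piece using cv.glmnet. The simulated data and the alpha value are placeholders; the real version lives in #18 and may look different.

```r
library(glmnet)

set.seed(1234)

## Simulated stand-in: 20 covariates, binary four-year completion outcome
n <- 1000
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))

## Elastic net logistic regression: alpha mixes ridge (alpha = 0) and
## lasso (alpha = 1); cv.glmnet picks the penalty lambda by cross-validation
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 0.5,
                    type.measure = "auc", nfolds = 10)

## Covariates retained at the CV-selected penalty
coef(cv_fit, s = "lambda.min")

## Predicted probabilities of completion
p_hat <- predict(cv_fit, newx = x, s = "lambda.min", type = "response")
```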
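For (3), one way to pull sensitivity, specificity, accuracy, and AUC from a model's held-out predictions, here with yardstick and an assumed 0.5 cutoff. In practice we would average these across cross-validation folds and repeat for each of the candidate models.

```r
library(dplyr)
library(yardstick)

set.seed(1234)

## Pretend held-out predictions from one model; in practice these come
## from the cross-validation folds for each candidate model
p_hat <- runif(500)             # predicted probability of completion
truth <- rbinom(500, 1, p_hat)  # observed completion

holdout <- tibble(
  truth = factor(truth, levels = c("1", "0")),  # "1" = completed (event level)
  p_hat = p_hat,
  class = factor(ifelse(p_hat >= 0.5, "1", "0"), levels = c("1", "0"))
)

## Sensitivity, specificity, and accuracy at the 0.5 cutoff, plus AUC
class_metrics <- metric_set(accuracy, sens, spec)
bind_rows(
  class_metrics(holdout, truth = truth, estimate = class),
  roc_auc(holdout, truth, p_hat)
)
```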
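For (4), a very rough sketch of the threshold-as-admissions-policy simulation. The groups, the cutoff, and the enrollment/graduation assumptions are all invented just to show the shape of the calculation.

```r
library(dplyr)

set.seed(1234)

## Simulated applicant pool for a hypothetical institution
applicants <- tibble(
  group  = sample(c("A", "B"), 5000, replace = TRUE, prob = c(0.7, 0.3)),
  p_grad = ifelse(group == "A", rbeta(5000, 6, 4), rbeta(5000, 5, 5))
)

## Hypothetical policy: deny admission below a predicted-success cutoff
cutoff <- 0.4

applicants %>%
  mutate(admitted  = p_grad >= cutoff,
         ## crude assumption: admitted students enroll, and graduate
         ## with probability equal to their predicted p_grad
         enrolled  = admitted,
         graduated = enrolled & (runif(n()) < p_grad)) %>%
  group_by(group) %>%
  summarise(applied     = n(),
            n_admitted  = sum(admitted),
            n_graduated = sum(graduated),
            denial_rate = mean(!admitted))
```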
btskinner commented 4 years ago

@wdoyle42: This tracks with what I understood we were doing. Thanks for laying it out more specifically. I can imagine that as we move forward, we may modify pieces here and there, but the milestones seem right to me. A question and a comment:

Question

Are you thinking that we work through the steps in order, such that

  1. each milestone is completely finished before we start the next, or
  2. we keep moving forward on later steps even when earlier ones aren't fully done?

I can see the benefits to either approach, but will throw my hat into the ring for the second approach.

Comment

For (4) we can set up an Rmd template and batch-create reports for many institutions, like we did for states in the College Affordability Diagnosis. We could also set up a website / Shiny application. Maybe we've discussed this already (?), so I'll just reiterate that I like the idea of a common template report --- PDF or webpage --- that we batch run for a number of institutions.
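
Something like the sketch below is what I have in mind; the template name, parameter, and unit IDs are placeholders.

```r
library(rmarkdown)

## Assumes a parameterized template, institution_report.Rmd, with a
## `unitid` entry under `params:` in its YAML header (names are placeholders)
institutions <- c("100654", "100663", "100706")

for (uid in institutions) {
  render(
    input       = "institution_report.Rmd",
    output_file = paste0("report_", uid, ".html"),
    params      = list(unitid = uid),
    envir       = new.env()   # fresh environment for each report
  )
}
```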

wdoyle42 commented 4 years ago

@btskinner I agree-- let's move forward with what we have, even if we're not completely done with previous steps.

I like the idea of a batch Rmd template.

wdoyle42 commented 4 years ago

From: Matthew Schuelke matthew.schuelke@slu.edu
Sent: Saturday, April 18, 2020 7:42 AM
To: Doyle, Will w.doyle@Vanderbilt.Edu
Subject: useR! 2020 Message to Accepted Presenters

Dear William,

First and foremost, we hope this email finds you and your family healthy and well.

We wanted to reach out quickly to let you know that we have made the difficult decision to cancel the in-person element of useR! 2020. We really appreciate your willingness to be a part of this impressive group of presenters and we are extremely disappointed that we are not able to welcome you to St. Louis. We were (and still are) particularly proud of our slate of presentations (including you!) and we know from our interactions with the R community that they were very enthusiastic about learning from you as well.

This has been in process for several weeks, but we have not been able to share details publicly until today. We appreciate your patience with us as we have navigated our relationships with our venue, stakeholders, and partners.

We have not made any final decisions about what comes next in terms of useR! 2020. We’ll be working with the R Foundation and our organizing team next week to determine the best course of action for us to collectively take. We will be in touch soon about possible next steps.

Please stay safe during this evolving pandemic.

Best,
Chris Prener, Ph.D. and Jenine Harris, Ph.D.
useR! 2020 St. Louis Co-lead Organizers

I'd still like to use the same deadline/product that we had before --- an HTML-based product that shows the predictive accuracy of the different models in a substantive sense, to be ready by July 1. Is that still OK with everyone?