HealthCatalyst / healthcareai-r

R tools for healthcare machine learning
https://docs.healthcare.ai

predict_counterfactual function #881

Closed michaellevy closed 6 years ago

michaellevy commented 6 years ago

Need a better name. Let the user specify any number of columns and generate predictions from the best model across values for those columns (all levels of factors, maybe 5th to 95th percentile of numerics) and at mean/mode for other columns.
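To make the idea concrete, here is a minimal sketch of how such a prediction grid might be built. It is written in plain Python purely for illustration and is not the package's R implementation. The function names (`vary_values`, `hold_value`, `counterfactual_grid`), the default of five evenly spaced points per numeric column, and the dict-of-columns data layout are all assumptions.

```python
import itertools
import statistics


def is_numeric(values):
    # Treat a column as numeric only if every value is an int or float (not bool).
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def vary_values(values, n_points=5):
    """Values to predict across: every level of a factor, or evenly spaced
    points from the 5th to the 95th percentile of a numeric column."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if is_numeric(values):
        # n=20 yields cut points at 5%, 10%, ..., 95%.
        cuts = statistics.quantiles(values, n=20, method="inclusive")
        lo, hi = cuts[0], cuts[-1]
        step = (hi - lo) / (n_points - 1)
        return [lo + i * step for i in range(n_points)]
    return sorted(set(values))


def hold_value(values):
    """Value at which a non-varying column is held: mean if numeric, else mode."""
    return statistics.mean(values) if is_numeric(values) else statistics.mode(values)


def counterfactual_grid(data, vary, n_points=5):
    """data maps column name -> list of values; vary lists the columns to change.
    Returns one dict per combination of varying values."""
    fixed = {col: hold_value(vals) for col, vals in data.items() if col not in vary}
    axes = [vary_values(data[col], n_points) for col in vary]
    return [
        {**fixed, **dict(zip(vary, combo))}
        for combo in itertools.product(*axes)
    ]


# Hypothetical example data, not taken from any real dataset.
example = {
    "age": list(range(101)),
    "sex": ["F"] * 60 + ["M"] * 41,
    "bmi": [20.0, 30.0] * 50 + [25.0],
}
grid = counterfactual_grid(example, vary=["age", "sex"], n_points=3)
```

Each row of `grid` could then be passed to the best model's predict step. The grid size is the product of the number of values per varying column, so it grows quickly as columns are added.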

Could have a plot method for the output that puts the changing variables on x, color, facet depending on type.

michaellevy commented 6 years ago

I could imagine an awesome shiny app sitting on top of this that lets the user select variables to split predictions across.

michaellevy commented 6 years ago

From @taylorlarsen

I’d be interested to chat through which “patient” or “observation” we run the counterfactuals for: the average patient, the median patient, or a real individual whose values we perturb? Keep in mind that it would be nice to support both model explanation and individual patient/provider conversations (maybe we only allow certain scenarios and have limitations on individual patients).

That's a great point about counterfactual predictions being useful for both model-level interpretation and "what if this patient were ten pounds lighter?" type questions (which is similar to pip, but a slightly different angle on the same thing).

I propose that the default selects a handful of the most important variables and makes predictions across those while holding the others at their medians. Three levels of customization could be available to the user: 1. choosing the number of variables, 2. choosing which variables, or 3. choosing which values of those variables to use. That's all model-level. For user-level, the same three levels of customization could be available, but instead of using the medians of the other variables, the user could provide an identifier value, and we'd use that observation's values for all the variables that aren't changing.
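A rough sketch of the model-level versus user-level distinction, again in plain Python for illustration only and not the package's R code. The names `baseline` and `counterfactuals`, the `patient_id` column, and the list-of-dicts data layout are hypothetical. Non-numeric columns are held at their mode here, since a median is not defined for them; that choice is an assumption.

```python
import itertools
import statistics


def baseline(rows, id_col, id_value=None):
    """Values for the variables that are not changing.

    Model-level (id_value is None): median of numeric columns, mode otherwise.
    User-level: the values of the single observation whose id_col equals id_value.
    """
    if id_value is not None:
        matches = [r for r in rows if r[id_col] == id_value]
        if len(matches) != 1:
            raise ValueError(f"expected exactly one row with {id_col} == {id_value!r}")
        return dict(matches[0])
    out = {}
    for col in rows[0]:
        if col == id_col:
            continue  # an identifier has no meaningful "typical" value
        vals = [r[col] for r in rows]
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in vals
        )
        out[col] = statistics.median(vals) if numeric else statistics.mode(vals)
    return out


def counterfactuals(rows, values, id_col="patient_id", id_value=None):
    """values maps each varying variable to the list of values to try.
    Returns one row per combination, with all other variables at the baseline."""
    base = baseline(rows, id_col, id_value)
    names = list(values)
    return [
        {**base, **dict(zip(names, combo))}
        for combo in itertools.product(*(values[n] for n in names))
    ]


# Hypothetical example data.
patients = [
    {"patient_id": 1, "weight": 150, "smoker": "no"},
    {"patient_id": 2, "weight": 180, "smoker": "yes"},
    {"patient_id": 3, "weight": 210, "smoker": "no"},
]

# Model-level: other variables held at their median/mode.
model_level = counterfactuals(patients, {"weight": [170, 200]})

# User-level: "what if patient 2 were ten pounds lighter?"
user_level = counterfactuals(patients, {"weight": [170]}, id_value=2)
```

The same `values` argument serves both cases, so the three levels of customization only affect how `values` is chosen, not how the baseline is built.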

michaellevy commented 6 years ago

choose_variables will select the names of most-important variables to use, if needed

To do: