MIT-LCP / gossis

Extracting consistent concepts from multiple databases

Decide on approach to missing data #16

Closed: alistairewj closed this issue 6 years ago

alistairewj commented 6 years ago

We need to decide how best to handle missing data.

I've created a spreadsheet which lists all variables currently available in GOSSIS: https://docs.google.com/spreadsheets/d/13PUD9adciV8mCZ25A6MxrUtd9WXd28-ZCyBfpk0DMDc/edit?usp=sharing

Next to each variable is a drop-down list of possible imputation options. We can contribute, comment, and discuss collaboratively in the spreadsheet.

jraffa commented 6 years ago

We use xgboost to predict each variable separately, given all other covariates. It seems to work well for most variables.
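
For anyone following along, here is a rough sketch of the "predict each variable separately, given all other covariates" loop. It is not the GOSSIS code: the function names `impute_per_variable` and `knn_fit_predict` are hypothetical, and the learner is a small pure-Python nearest-neighbour stand-in rather than xgboost, so the example runs without extra packages. A gradient-boosted regressor would be plugged in through the `fit_predict` argument instead. The sketch assumes numeric columns with `None` marking missing values.

```python
import math


def knn_fit_predict(train_X, train_y, query_X, k=2):
    """Stand-in learner (NOT xgboost): average the targets of the k nearest rows.

    Features that are None in either row are skipped when computing distance.
    """
    preds = []
    for q in query_X:
        scored = []
        for x, y in zip(train_X, train_y):
            shared = [(a - b) ** 2 for a, b in zip(q, x)
                      if a is not None and b is not None]
            # Rows with no overlapping observed features count as infinitely far.
            dist = math.sqrt(sum(shared) / len(shared)) if shared else math.inf
            scored.append((dist, y))
        scored.sort(key=lambda t: t[0])
        nearest = [y for _, y in scored[:k]]
        preds.append(sum(nearest) / len(nearest))
    return preds


def impute_per_variable(rows, fit_predict=knn_fit_predict):
    """Impute each column separately, using all other columns as covariates.

    rows: list of equal-length lists; None marks a missing value.
    Returns a new table; the input is left unchanged.
    """
    n_cols = len(rows[0])
    imputed = [list(r) for r in rows]
    for j in range(n_cols):
        observed = [r for r in rows if r[j] is not None]
        missing_idx = [i for i, r in enumerate(rows) if r[j] is None]
        if not observed or not missing_idx:
            continue  # nothing to learn from, or nothing to fill

        def drop_target(r):
            # Covariates are the original (un-imputed) other columns.
            return r[:j] + r[j + 1:]

        train_X = [drop_target(r) for r in observed]
        train_y = [r[j] for r in observed]
        query_X = [drop_target(rows[i]) for i in missing_idx]
        for i, pred in zip(missing_idx, fit_predict(train_X, train_y, query_X)):
            imputed[i][j] = pred
    return imputed
```

Note that each column's model is trained only on rows where that column is observed, and the covariates are taken from the original table, so one column's imputed values never feed into another column's model. Whether that matches what we want for GOSSIS is a design choice worth discussing.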