I'm building a class that will use machine learning (I'm looking at a variety of algorithms) to go through and fill in missing data. This is the approximate process:
Steps:
1. Merge the training and testing data into a single new variable.
2. Separate out all of the rows that contain missing data and place them in a list.
3. In the missing-data list, treat each complete column as a predictor and each incomplete column as a target. Select one specific target for steps 4 and 5 and remove the other target columns.
4. Train an algorithm on the complete rows, using the predictor columns found in the missing-data list (train), and apply it to the incomplete rows (test) to fill in the selected target.
5. Repeat from step 3 while columns with missing data still exist.
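To make the loop concrete, here is a rough sketch of the steps above in plain Python. It is only an illustration under several assumptions of mine: the data is a list of numeric rows with `None` marking a missing value, the function name `impute` is made up, and a 1-nearest-neighbour lookup stands in for whichever learning algorithm ends up being used.

```python
from math import dist  # Euclidean distance, Python 3.8+


def impute(train_rows, test_rows):
    """Fill None values one target column at a time (hypothetical helper)."""
    # Step 1: merge training and testing data (rows are copied, inputs stay untouched).
    data = [list(row) for row in train_rows + test_rows]
    n_cols = len(data[0])

    # Step 2: separate the rows with missing values from the complete rows.
    complete = [row for row in data if None not in row]
    missing = [row for row in data if None in row]
    if missing and not complete:
        raise ValueError("need at least one complete row to train on")

    while True:
        # Step 3: in the missing rows, incomplete columns are targets,
        # complete columns are predictors. Pick one target for this round.
        targets = [c for c in range(n_cols) if any(r[c] is None for r in missing)]
        if not targets:
            break  # Step 5: stop once no column has missing data
        predictors = [c for c in range(n_cols) if c not in targets]
        if not predictors:
            raise ValueError("no complete column to use as a predictor")
        target = targets[0]

        # Step 4: "train" on the complete rows and apply to the incomplete rows.
        # Assumption: the nearest complete row on the predictors supplies the value.
        for row in missing:
            if row[target] is None:
                x = [row[c] for c in predictors]
                nearest = min(complete, key=lambda r: dist([r[c] for c in predictors], x))
                row[target] = nearest[target]

    return data


train = [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0], [3.0, 30.0, 300.0]]
test = [[1.1, None, 100.0], [2.9, 30.0, None], [2.1, None, None]]
filled = impute(train, test)
print(filled[3:])
```

Each pass fills one column, so a column that was a target in one round becomes a predictor in the next. Swapping the nearest-neighbour lookup for a real model would only change the body of step 4.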
I'm still tinkering with this and trying a few ways of doing it. Testing and documenting is going to take a while.