CodeSpaceHQ / MENGEL

A framework that applies machine learning algorithms and automates the process of finding the right algorithm for the job.

Using Machine Learning For Data Filling #139

Closed isaac-gs closed 7 years ago

isaac-gs commented 7 years ago

I'm building a class that will use machine learning (I'm looking at a variety of algorithms) to go through and fill in missing data. This is the approximate process:

Steps:

  1. Merge the training and testing data into a single new variable.
  2. Separate all rows that contain missing data and place them in a list.
  3. In the missing-data list, a complete column is a predictor and an incomplete column is a target. Select one target for steps 4 and 5 and remove the other target columns.
  4. Train an algorithm on the complete data, using the predictor columns identified in the missing-data list (train), and apply it to the incomplete data (test) to predict the selected target.
  5. Repeat from step 3 while columns with missing data still exist.
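
For illustration only, the steps above could be sketched roughly as follows. This is not the class from the issue: the function name `fill_missing` is hypothetical, missing values are assumed to be encoded as NaN in numeric arrays, at least one complete row is assumed to exist, and a plain least-squares linear model from NumPy stands in for whichever algorithm ends up being chosen.

```python
import numpy as np


def fill_missing(train, test):
    """Hypothetical sketch: fill NaNs one target column at a time."""
    # Step 1: merge the training and testing rows into one array.
    data = np.vstack([train, test]).astype(float)
    missing = np.isnan(data)

    # Step 2: separate rows that contain missing values from complete rows.
    incomplete = missing.any(axis=1)
    complete_rows = data[~incomplete]
    if len(complete_rows) == 0:
        raise ValueError("need at least one complete row to train on")

    # Step 5: repeat while any column still has missing values.
    while missing.any():
        # Step 3: complete columns are predictors, incomplete ones are targets;
        # pick a single target for this pass.
        col_has_gap = missing.any(axis=0)
        predictors = np.flatnonzero(~col_has_gap)
        target = np.flatnonzero(col_has_gap)[0]

        # Step 4 (train): fit a linear model with an intercept on complete rows.
        # Assumption: least squares is only a stand-in for the real algorithm.
        X_train = np.column_stack(
            [complete_rows[:, predictors], np.ones(len(complete_rows))]
        )
        coef, *_ = np.linalg.lstsq(X_train, complete_rows[:, target], rcond=None)

        # Step 4 (test): predict the target where it is missing.
        rows = np.flatnonzero(missing[:, target])
        X_fill = np.column_stack(
            [data[np.ix_(rows, predictors)], np.ones(len(rows))]
        )
        data[rows, target] = X_fill @ coef
        missing[rows, target] = False

    return data
```

In this sketch a column that has been filled counts as a predictor on the next pass, which is one reading of "repeat starting at step 3"; the set of complete rows used for training is fixed after step 2.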

I'm still tinkering with this and trying a few ways of doing it. Testing and documenting is going to take a while.

asclines commented 7 years ago

Where will this fit in the pipeline?

isaac-gs commented 7 years ago

It'll be part of the filler strategy, and it will probably be more useful in situations where the ratio of missing data is low.