Open vijayphugat opened 6 years ago
@levithatcher @taylorlarsen and @mmastand do you have any input on this? Otherwise we will submit a PR for its fix soon.
We have had good luck using the following in our other work:
-999
and then use a tree-based method.missing
What do you think about these methods? We'd be grateful if you wanted to do a PR!
I want to do PR on this finding and currently I am using below techniques:
Both of these methods work good on linear as well as non-linear type of data.
Current package impute missing values using mean and median.
Now I have identified an approach to apply Machine Learning models for imputing the missing values:
Existing approach: Impute missing values using Mean/Median Drawback:
New Approach: Impute missing values as per below:
Advantages:
So for the datasets having large number of missing values, this approach can improve the overall quality of data to be feeded to ML algorithms. Thus perfomance of existing model can be improved using this imputation stratgey.