Open mingweiiiiiiiiii opened 2 years ago
@1978abhay Thanks for your reply
For KNN imputation, I used cross-validation to find the optimal K.
In the plot, the x-axis is the value of K and the y-axis is the RMSE obtained for that K.
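For anyone following along, here is a minimal sketch of how a K search like this could be set up. It is not necessarily what was run for the plot: it assumes the RMSE is measured by hiding a random fraction of the known cells, imputing them with scikit-learn's `KNNImputer`, and comparing against the hidden values. The function name `rmse_per_k`, the parameters `hide_frac` and `n_repeats`, and the synthetic data are all made up for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer


def rmse_per_k(X, k_values, hide_frac=0.1, n_repeats=5, seed=0):
    """Hide a random fraction of the observed cells, impute them with each K,
    and return the mean RMSE on the hidden cells for each K.
    (Hypothetical helper, not part of scikit-learn.)"""
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(X))  # (row, col) index of every known cell
    n_hide = max(1, int(hide_frac * len(observed)))
    scores = {k: [] for k in k_values}
    for _ in range(n_repeats):
        picked = observed[rng.choice(len(observed), size=n_hide, replace=False)]
        rows, cols = picked[:, 0], picked[:, 1]
        X_masked = X.copy()
        X_masked[rows, cols] = np.nan  # pretend these known values are missing
        for k in k_values:
            X_imputed = KNNImputer(n_neighbors=k).fit_transform(X_masked)
            err = X_imputed[rows, cols] - X[rows, cols]
            scores[k].append(np.sqrt(np.mean(err ** 2)))
    return {k: float(np.mean(v)) for k, v in scores.items()}


# Synthetic, correlated data with about 5% of the cells missing (assumption).
rng = np.random.default_rng(42)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(4)])
X[rng.random(X.shape) < 0.05] = np.nan

results = rmse_per_k(X, k_values=[1, 3, 5, 10, 20])
best_k = min(results, key=results.get)  # K with the lowest mean RMSE
print(results, best_k)
```

Plotting `results` with K on the x-axis and RMSE on the y-axis gives the kind of curve described above. Note that this scores the imputation itself; scoring the downstream model for each K is another reasonable choice.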
@1978abhay
@8n76nn98 I think I have explained the whole process a few times now. I'll wait and see your first model and its results on the test data.
Thanks. I found that one step in my sequence is wrong and I need to fix it. Thanks for replying on a Sunday.
Dear Abhay: Thanks for your guidance yesterday. I wrote down notes on what you said, but I am afraid of doing it the wrong way again, so I am sending you my notes before I start. Thanks.
Remove features with too much missing data (95%), univariate features (only one value), and high-cardinality categorical features (difficult to one-hot encode)
-> Impute missing data (mode/median or KNN imputer)
-> Transform the data (log, square root, exp) so that the relationships become linear
-> Feature selection over all features (random forest)
-> Model construction
-> Remove highly collinear features
-> Use the variance inflation factor (VIF) / ANOVA to remove highly multicollinear features: keep features with VIF < 5 and with 5 < VIF < 10
-> Use R-squared and mean squared error to evaluate how well the model generalizes
Thanks for your reply, @1978abhay
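To make the VIF step in these notes concrete, here is a rough sketch using only NumPy. It computes the VIF of each feature as 1 / (1 - R²), where R² comes from regressing that feature on all the others, and then repeatedly drops the feature with the largest VIF until everything is below a threshold. The helper names `vif` and `drop_high_vif`, the threshold of 5 (taken from the notes above), and the toy data are assumptions for illustration, not a prescribed procedure.

```python
import numpy as np


def vif(X):
    """VIF of each column: 1 / (1 - R^2), with R^2 from an ordinary
    least-squares regression of that column on the remaining columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept plus every column except column j.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)  # becomes inf if a column is perfectly collinear
    return out


def drop_high_vif(X, names, threshold=5.0):
    """Drop the feature with the largest VIF, one at a time, until all
    remaining VIFs are below `threshold`. (Hypothetical helper.)"""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


# Toy data (assumption): x3 is almost x1 + x2, x4 is independent of the rest.
rng = np.random.default_rng(0)
x1, x2, x4 = rng.normal(size=(3, 500))
x3 = x1 + x2 + 0.1 * rng.normal(size=500)
X = np.column_stack([x1, x2, x3, x4])

print(vif(X))
X_kept, kept = drop_high_vif(X, ["x1", "x2", "x3", "x4"], threshold=5.0)
print(kept, vif(X_kept))
```

Dropping one feature at a time and recomputing matters here, because removing a single member of a collinear group usually brings the VIFs of the others back down. If statsmodels is available, `statsmodels.stats.outliers_influence.variance_inflation_factor` can be used for the per-feature values instead of the hand-written `vif`.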