Please confirm my written notes:

mingweiiiiiiiiii commented 2 years ago

Dear Abhay, thanks for your guidance yesterday. I wrote down notes on what you said. I am afraid of doing it the wrong way again, which is why I am sending you my notes before I start. Thanks!

Remove features with too much missing data (95%), univariate features (only one distinct value), and high-cardinality categorical features (difficult to one-hot encode)

-> Impute missing data (mode/median, or a KNN imputer)

-> Transform the data (log, square root, exp) to make the relationships more linear

-> Feature selection over all features (random forest) ->

Model construction ->

Removal of highly collinear features ->

Use the variance inflation factor (VIF) / ANOVA to remove highly multicollinear features, maintaining VIF < 5 (and 5 < VIF < 10).

-> Use R² and mean squared error to evaluate how well the model generalizes.

Thanks for your reply @1978abhay
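For the first step in the notes (dropping features that are mostly missing, constant, or high-cardinality categorical), a minimal sketch follows. It assumes the data is a list of dicts with `None` marking a missing value; the function name `select_columns` and the `max_categories=50` cut-off are hypothetical, and only the 95% missing threshold comes from the notes.

```python
def select_columns(rows, missing_threshold=0.95, max_categories=50):
    """Return the names of columns worth keeping.

    rows: list of dicts (one per record); None marks a missing value.
    A column is dropped if it is missing in at least missing_threshold of rows,
    has only one distinct observed value, or is a string (categorical) column
    with more than max_categories distinct values (hypothetical cut-off).
    """
    columns = list(rows[0].keys()) if rows else []
    n = len(rows)
    keep = []
    for col in columns:
        observed = [r.get(col) for r in rows if r.get(col) is not None]
        if 1 - len(observed) / n >= missing_threshold:
            continue  # almost entirely missing
        distinct = set(observed)
        if len(distinct) <= 1:
            continue  # univariate (constant) feature carries no information
        is_categorical = all(isinstance(v, str) for v in observed)
        if is_categorical and len(distinct) > max_categories:
            continue  # too many levels to one-hot encode sensibly
        keep.append(col)
    return keep
```

In pandas the same checks map onto `df.isna().mean()` and `df.nunique()`, but the logic is the same.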
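For the imputation step, median imputation for numeric columns and mode imputation for categorical ones can be sketched as below. This is an illustration under the same assumption (list of dicts, `None` = missing); `impute_median_mode` is a hypothetical name, and the KNN alternative is not covered here.

```python
import statistics

def impute_median_mode(rows):
    """Return a copy of rows with None filled: median for numeric columns, mode otherwise."""
    columns = list(rows[0].keys()) if rows else []
    fills = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        if not observed:
            continue  # nothing to learn a fill value from; leave the column as-is
        if all(isinstance(v, (int, float)) for v in observed):
            fills[col] = statistics.median(observed)
        else:
            fills[col] = statistics.mode(observed)  # Python 3.8+: first mode on ties
    # Build new dicts so the caller's data is not modified.
    return [{c: (fills.get(c) if v is None else v) for c, v in r.items()} for r in rows]
```

In practice the fill values should be computed on the training split only and then reused on the test split, so no information leaks from test data.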
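For the transformation step, one simple (assumed, not prescribed by the notes) heuristic is to try each candidate transform on a feature and keep the one whose output has the strongest linear correlation with the target. The names `best_transform` and `TRANSFORMS` are hypothetical; each transform only applies where its domain allows.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

TRANSFORMS = {
    "identity": lambda v: v,
    "log1p": math.log1p,  # log(1 + v); needs v > -1, typical for right-skewed data
    "sqrt": math.sqrt,    # needs v >= 0
    "exp": math.exp,      # for left-skewed data; overflows on large values
}

def best_transform(xs, ys):
    """Return (name, scores): the transform of xs most linearly related to ys."""
    scores = {}
    for name, f in TRANSFORMS.items():
        try:
            tx = [f(x) for x in xs]
        except (ValueError, OverflowError):
            continue  # transform undefined for this feature's range
        scores[name] = abs(pearson(tx, ys))
    return max(scores, key=scores.get), scores
```

For example, if the target grows like `log(1 + x)`, the `log1p` transform gives a correlation of essentially 1, while the raw feature does not.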
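For the random-forest feature selection step, a sketch with scikit-learn is shown below. It assumes scikit-learn is installed and uses synthetic data from `make_regression` (10 features, of which only the first 3 are informative because `shuffle=False`); `SelectFromModel` with `threshold="mean"` keeps features whose importance is at least the average importance. The threshold choice is an assumption, not something stated in the notes.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic data: 10 features; with shuffle=False the 3 informative ones are columns 0-2.
X, y = make_regression(n_samples=300, n_features=10, n_informative=3,
                       noise=1.0, shuffle=False, random_state=0)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
# Fit the forest and keep features whose importance is >= the mean importance.
selector = SelectFromModel(forest, threshold="mean").fit(X, y)
kept = selector.get_support(indices=True)
print("kept feature indices:", kept)
X_selected = selector.transform(X)
```

As with imputation, the selector should be fitted on the training data only.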
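For the collinearity step, the VIF of feature j is 1 / (1 - R_j²), where R_j² comes from regressing feature j on all other features (with an intercept). The sketch below computes it in plain Python and repeatedly drops the feature with the largest VIF. It reads the note's thresholds as "drop anything at or above 10", which is an interpretation; `drop_high_vif` and the other names are hypothetical. Exactly collinear columns make the normal equations singular and will raise `ZeroDivisionError`. statsmodels also provides a `variance_inflation_factor` function for the same quantity.

```python
def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared_on_others(columns, j):
    """R^2 of an OLS regression (with intercept) of column j on the other columns."""
    y = columns[j]
    design = [[1.0] + [columns[k][i] for k in range(len(columns)) if k != j]
              for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(bi * xi for bi, xi in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def vif(columns):
    """VIF_j = 1 / (1 - R_j^2) for each column (columns: list of equal-length lists)."""
    scores = []
    for j in range(len(columns)):
        r2 = r_squared_on_others(columns, j)
        scores.append(float("inf") if r2 >= 1 else 1 / (1 - r2))
    return scores

def drop_high_vif(names, columns, threshold=10.0):
    """Drop the highest-VIF feature, one at a time, until every VIF < threshold."""
    names, columns = list(names), list(columns)
    while len(columns) > 1:
        scores = vif(columns)
        worst = max(range(len(scores)), key=scores.__getitem__)
        if scores[worst] < threshold:
            break
        del names[worst], columns[worst]
    return names
```

Dropping one feature at a time matters: removing one of two near-duplicate features usually brings the other's VIF back down, so dropping both at once would throw away information.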
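For the evaluation step, R² and mean squared error can be written out directly as below. scikit-learn's `sklearn.metrics` module has functions with the same names; these plain versions are only for illustration. To say anything about generalization they should be computed on the held-out test data, not on the data the model was fitted on.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot: fraction of the target's variance explained."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Example values (not from the thread): residuals 0.5, -0.5, 0, -1.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y_true, y_pred))  # 0.375
print(r2_score(y_true, y_pred))
```

A large gap between the training-set and test-set scores is the usual sign of overfitting.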

mingweiiiiiiiiii commented 2 years ago

@1978abhay Thanks for your reply

mingweiiiiiiiiii commented 2 years ago

For KNN imputation, I used cross-validation to find the optimal K.

mingweiiiiiiiiii commented 2 years ago

The x-axis is the value of K; the y-axis is the RMSE for that K.
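One way to produce such a K-vs-RMSE curve is to hide some values that are actually known, impute them with KNN for each candidate K, and measure the RMSE against the hidden truth. The sketch below does this with repeated random masking, which is a hold-out variant and may differ from the cross-validation actually used. It assumes rows are lists of floats with `None` for missing; the distance (root-mean-square difference over the coordinates both rows observe) is similar in spirit to, but not identical to, scikit-learn's `KNNImputer`. All function names are hypothetical.

```python
import math
import random

def knn_impute_value(rows, i, col, k):
    """Estimate rows[i][col] as the mean of col over the k nearest rows that observe col."""
    target = rows[i]
    candidates = []
    for j, other in enumerate(rows):
        if j == i or other[col] is None:
            continue
        # Compare only coordinates (other than col) observed in both rows.
        shared = [(a, b) for c, (a, b) in enumerate(zip(target, other))
                  if c != col and a is not None and b is not None]
        if not shared:
            continue
        dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
        candidates.append((dist, other[col]))
    candidates.sort(key=lambda t: t[0])
    nearest = [v for _, v in candidates[:k]]
    if not nearest:
        raise ValueError("no comparable neighbours for row %d" % i)
    return sum(nearest) / len(nearest)

def masked_rmse(rows, k, col, n_mask, seed=0):
    """Hide n_mask observed values of col, impute them with KNN, return the RMSE."""
    rng = random.Random(seed)
    observed = [i for i, r in enumerate(rows) if r[col] is not None]
    masked = rng.sample(observed, n_mask)
    work = [list(r) for r in rows]  # copy so the caller's data is untouched
    truth = {i: work[i][col] for i in masked}
    for i in masked:
        work[i][col] = None
    errors = [(knn_impute_value(work, i, col, k) - truth[i]) ** 2 for i in masked]
    return math.sqrt(sum(errors) / len(errors))

def choose_k(rows, col, ks, n_mask, n_repeats=5):
    """Average masked RMSE over several random masks; return (best k, {k: rmse})."""
    curve = {k: sum(masked_rmse(rows, k, col, n_mask, seed=s) for s in range(n_repeats)) / n_repeats
             for k in ks}
    return min(curve, key=curve.get), curve
```

Plotting `curve` (K on the x-axis, RMSE on the y-axis) gives the kind of chart described above; averaging over several masks makes the minimum less sensitive to which values happened to be hidden.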

mingweiiiiiiiiii commented 2 years ago

@1978abhay

1978abhay commented 2 years ago

@8n76nn98 I think I have explained the whole process a few times now. I'll wait to see your first model and its results on the test data.

mingweiiiiiiiiii commented 2 years ago

Thanks. I found that one step in the sequence is wrong, and I need to fix it. Thanks for replying on a Sunday.

MoreeZ / sweng-2022, issue #27