102207429 / Expedia

Data Science and Big Data Analytics

Standardization of features' values #4

Open GregoryWu opened 7 years ago

GregoryWu commented 7 years ago

Do you think I need to standardize the values of all the features? That is, rescale each feature so that it has zero mean and unit variance.

Please refer to: http://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling

Best, Greg
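For reference, here is a minimal sketch of what standardization does to a single feature. It uses only the Python standard library so it runs without extra packages; the `standardize` helper and the `prices` values are made up for illustration. The `StandardScaler` class described in the linked scikit-learn page applies the same per-column formula. Standardization only shifts and rescales the values; it does not change the shape of the distribution, so skewed data stays skewed.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant feature has no spread; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical feature values, not taken from the Expedia data.
prices = [120.0, 80.0, 200.0, 150.0, 95.0]
scaled = standardize(prices)

print(round(fmean(scaled), 6), round(pstdev(scaled), 6))
```

After the call, `scaled` should have a mean close to 0 and a standard deviation close to 1, up to floating-point error.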

102207429 commented 7 years ago

I think you can try normalizing the features whose units do not match those of the y variable, for example when y is in meters but x1 is in kilometers. Otherwise, normalization may not help much.
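The units point can be illustrated with a small sketch: the same hypothetical distances expressed in kilometers and in meters end up with identical standardized values, so the choice of unit stops mattering once a feature is standardized. The `standardize` helper and the data are assumptions made for this example, not part of the project.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical distances; the second list is the same data in meters.
distance_km = [1.2, 0.5, 3.4, 2.0]
distance_m = [d * 1000 for d in distance_km]

z_km = standardize(distance_km)
z_m = standardize(distance_m)

# Compare with a tolerance because of floating-point rounding.
same_after_scaling = all(abs(a - b) < 1e-9 for a, b in zip(z_km, z_m))
print(same_after_scaling)
```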

GregoryWu commented 7 years ago

I see. So if I want to remove all the X variables with variance lower than 2, do I need to normalize them first?

102207429 commented 7 years ago

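I think you don't have to normalize them first. Standardizing would give every variable a variance of 1, so a variance threshold could no longer tell them apart. Keep in mind, though, that variance does depend on a variable's units, so a cutoff of 2 only makes sense in the units you keep.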
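Before settling the order of the two steps, it may be worth checking how variance reacts to a change of units and to standardization. The sketch below uses made-up distances and a hypothetical `standardize` helper. In this toy data, converting kilometers to meters multiplies the variance by 1000 squared, while standardizing pushes the variance to about 1, so a variance cutoff of 2 would drop every standardized feature. scikit-learn offers `VarianceThreshold` in `sklearn.feature_selection` for this kind of filter; the plain-Python version here is only an approximation of the idea.

```python
from statistics import fmean, pstdev, pvariance


def standardize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical data: one quantity in two units, plus its standardized form.
distance_km = [1.2, 0.5, 3.4, 2.0]
distance_m = [d * 1000 for d in distance_km]
distance_z = standardize(distance_km)

var_km = pvariance(distance_km)
var_m = pvariance(distance_m)
var_z = pvariance(distance_z)

# Keep only the features whose variance reaches the cutoff from the thread.
THRESHOLD = 2.0
features = {
    "distance_km": distance_km,
    "distance_m": distance_m,
    "distance_z": distance_z,
}
kept = [name for name, vals in features.items() if pvariance(vals) >= THRESHOLD]

print(var_km, var_m, var_z, kept)
```

Here the same physical quantity passes or fails the cutoff depending only on its unit, and the standardized copy never passes, which suggests filtering on raw variance only when the features share comparable units.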