MichiganDataScienceTeam / googleanalytics

MDST Project Fall 2018

Preprocess: u'geoNetwork.networkDomain', u'geoNetwork.networkLocation', u'geoNetwork.region', u'geoNetwork.subContinent' #75

Open wesleytian opened 5 years ago

wesleytian commented 5 years ago

Preprocess the following features:

u'geoNetwork.networkDomain', u'geoNetwork.networkLocation', u'geoNetwork.region', u'geoNetwork.subContinent'

  1. Standardization: http://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling

  2. Impute missing values: http://scikit-learn.org/stable/modules/impute.html

  3. Normalization: http://scikit-learn.org/stable/modules/preprocessing.html#normalization

  4. Encode categorical features (optional): http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features

  5. Discretization (optional): http://scikit-learn.org/stable/modules/preprocessing.html#discretization
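The four geoNetwork columns hold strings (domains and place names), so steps 2 and 4 apply to them directly, while standardization, normalization and discretization in scikit-learn operate on numeric features. Below is a minimal scikit-learn sketch of those two steps. It assumes the data is already a flattened pandas DataFrame with these column names, and that placeholder strings such as "(not set)" should count as missing. Both are assumptions: the PLACEHOLDERS list and the helper names are hypothetical, and the toy rows are made up.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

GEO_COLS = [
    "geoNetwork.networkDomain",
    "geoNetwork.networkLocation",
    "geoNetwork.region",
    "geoNetwork.subContinent",
]

# Hypothetical list of placeholder strings to treat as missing (assumption).
PLACEHOLDERS = ["(not set)", "not available in demo dataset"]


def clean_placeholders(df):
    """Replace placeholder strings in the geo columns with NaN so the imputer sees them."""
    out = df.copy()
    out[GEO_COLS] = out[GEO_COLS].replace(PLACEHOLDERS, np.nan)
    return out


def build_geo_preprocessor():
    """Impute missing geo values with a constant category, then one-hot encode."""
    categorical = Pipeline([
        # Step 2: fill NaN with an explicit "missing" category.
        ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
        # Step 4: one indicator column per category; unseen categories encode as all zeros.
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    # Columns not listed in GEO_COLS are dropped (remainder="drop" is the default).
    return ColumnTransformer([("geo", categorical, GEO_COLS)])


# Toy rows for illustration only (not taken from the real dataset).
train = pd.DataFrame({
    "geoNetwork.networkDomain": ["comcast.net", "(not set)", "comcast.net"],
    "geoNetwork.networkLocation": ["(not set)", "locA", "locB"],
    "geoNetwork.region": ["California", "Texas", np.nan],
    "geoNetwork.subContinent": ["Northern America", "Northern America", "Western Europe"],
})

pre = build_geo_preprocessor()
X = pre.fit_transform(clean_placeholders(train))
# ColumnTransformer may return a sparse matrix; densify it for inspection.
X = X.toarray() if hasattr(X, "toarray") else X
```

Setting `handle_unknown="ignore"` keeps `transform` from failing on domains that show up only in the test set; the price is that such rows carry no information for that column.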

General reference: http://scikit-learn.org/stable/modules/preprocessing.html

mengqiuteng commented 5 years ago

If multiple records for the same fullVisitorID have different geoNetwork values, how should we deal with this? Do we just take the majority value?
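If majority vote is the rule the team settles on, a pandas sketch of it could look like this. For each visitor it keeps the most frequent non-missing value of each geo column. Ties go to the value that sorts first, because `Series.mode()` returns its results in sorted order; that tie-break is an arbitrary choice. The visitor column name is copied from the comment and may be spelled differently in the actual data, the helper names are hypothetical, and the toy rows are made up.

```python
import numpy as np
import pandas as pd

GEO_COLS = [
    "geoNetwork.networkDomain",
    "geoNetwork.networkLocation",
    "geoNetwork.region",
    "geoNetwork.subContinent",
]
VISITOR_COL = "fullVisitorID"  # name as written in the comment; check the real header


def first_mode(s):
    """Most frequent non-missing value of a Series; None if every value is missing."""
    modes = s.mode()  # drops NaN by default and returns the modes in sorted order
    return modes.iloc[0] if len(modes) else None


def majority_geo(df):
    """One row per visitor, holding the majority value of each geo column."""
    return df.groupby(VISITOR_COL)[GEO_COLS].agg(first_mode)


# Toy sessions for illustration only (not real data).
sessions = pd.DataFrame({
    VISITOR_COL: ["1", "1", "1", "2", "2"],
    "geoNetwork.networkDomain": ["comcast.net", "comcast.net", "verizon.net", "att.net", "comcast.net"],
    "geoNetwork.networkLocation": ["locA", np.nan, np.nan, np.nan, np.nan],
    "geoNetwork.region": ["Texas", "Texas", "California", np.nan, "Ohio"],
    "geoNetwork.subContinent": ["Northern America"] * 5,
})

per_visitor = majority_geo(sessions)
```

The result is indexed by visitor, so it can be merged back onto the session rows on `VISITOR_COL` if a per-session feature is needed.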