Closed benjaminEwhite closed 6 years ago
I added the part below as an explanation of get_dummies.
"Notice that the categorical variables are strings, so we need to convert them to numerical values. This can be viewed as part of the feature engineering process. One of the most convenient ways of converting categorical variables into numerical ones is one-hot encoding. In one-hot encoding, we create a separate binary variable, taking the value 0 or 1, for each unique value of the categorical variable. Pandas' get_dummies() function does this job for us.
Below, we call the get_dummies() function for the sex and smoker categorical variables in our dataset. Since sex and smoker each take two values, get_dummies() would by default create two dummy (indicator) variables for each of them. Since a single dummy is enough to indicate whether the person is male or not, or is a smoker or not, we keep only one of the newly created dummies for each of sex and smoker in our data frame. We do this by passing drop_first=True to the get_dummies() function."
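For reference, here is a minimal sketch of what that explanation describes. The tiny data frame is made up for illustration (only the sex and smoker column names come from the text above; the charges column and all values are assumptions), so it is not the notebook's actual code or data.

```python
import pandas as pd

# Hypothetical toy data: only the column names "sex" and "smoker"
# come from the explanation; "charges" and all values are invented.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "smoker": ["yes", "no", "no", "yes"],
    "charges": [100.0, 200.0, 150.0, 300.0],
})

# Default behaviour: one indicator column per unique value,
# so each two-valued variable yields two dummies.
full = pd.get_dummies(df[["sex", "smoker"]], dtype=int)

# drop_first=True drops the first category of each variable
# (in sorted order: "female" and "no"), leaving one dummy each.
reduced = pd.get_dummies(df[["sex", "smoker"]], drop_first=True, dtype=int)

# Replace the original string columns with the remaining dummies.
encoded = pd.concat([df.drop(columns=["sex", "smoker"]), reduced], axis=1)

print(list(full.columns))
print(list(reduced.columns))
print(encoded)
```

With this toy data, the remaining columns are sex_male and smoker_yes, where 1 means male (or smoker) and 0 means the dropped category.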
This should be covered in earlier material, but I'm not sure it is.
https://github.com/Thinkful-Ed/machine-learning-regression-problems/blob/master/notebooks/2.simple_linear_regression_models.ipynb