D-Lab's 6 hour introduction to machine learning in Python. Learn how to perform classification, regression, clustering, and do model selection using scikit-learn in Python.
Imputation for Categorical Variables: np.unique() in the imputation section could be confusing compared to the previous output where the NaNs are in a dataframe. Maybe consider converting cp_imp back to a pandas dataframe to show the difference between the two after imputation.
Dummy Encoding: I believe dummy encoding can be done by passing in "drop='first'" as an argument in sklearn.OneHotEncoder object. This should remove the need to create a separate DummyEncoding class.
ColumnTransformer: Spelling mistakes in "ColumntTransformer for Combined Preprocessing" opening description -> "ColumntTransformer" should be "ColumnTransformer", "differntially" should be "differentially"
Transform the test Data: Spelling mistake after data is saved -> "...everything else is just a matter of choosing your mdoel..." should be "model"
GLM Ridge Regression: Spelling mistake in opening description -> "Ridge regression takes a hyerparameter..." should be "hyperparameter"
GLM Ridge Regression: "Leave One Out Cross Validation" (LOOCV) is not explained. A "see more" link might be useful.
Non-Linear Models: Might be helpful to include a quick explainer comparing linear vs non-linear models and pros/cons. Currently they are introduced without explanation.
cp_imp
back to a pandas dataframe to show the difference between the two after imputation.test
Data: Spelling mistake after data is saved -> "...everything else is just a matter of choosing your mdoel..." should be "model"