Open agitter opened 4 years ago
We also need to define the word "classifier" the first time it is used. The term is used in the introduction lesson but not introduced until the T cell lesson.
Other terms such as bagging and boosting were also used in the random forest discussion but not defined. We should add these, and possibly the regression ensemble graphic, to the lesson.
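As a starting point for those definitions, a minimal sketch along these lines could accompany the random forest discussion. The synthetic dataset and model settings here are assumptions for illustration, not taken from the lesson:

```python
# Illustrative sketch only -- the dataset and settings are made up, not from the lesson.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: fit many decision trees on bootstrap resamples of the training data
# and average their votes (BaggingClassifier uses a decision tree by default).
bagging = BaggingClassifier(n_estimators=100, random_state=0)

# Boosting: fit trees sequentially, with each new tree focusing on the examples
# the previous trees got wrong.
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)

# A random forest is bagging of decision trees plus a random subset of features
# considered at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting), ("random forest", forest)]:
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.2f}")
```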
Introduction
See 7f16a9a4426e7c7a0c86271e680fe5bcf67d0bdd
Let's avoid the word "classifying" in the definition. How about "predicting a category for the samples in the data set"? In the house price example, the categories are high and low.
Regression could be "predicting a continuous number for the samples in the data set." A regression version of the house price example would be predicting the price in dollars.
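If it helps to make the two definitions concrete, here is a rough sketch using the same made-up housing features once for classification (high/low category) and once for regression (price in dollars). The feature names and price model are invented for illustration, not the lesson's actual dataset:

```python
# Illustrative sketch only: made-up housing features used for both tasks.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
square_feet = rng.uniform(500, 3500, size=200)
bedrooms = rng.integers(1, 6, size=200)
X = np.column_stack([square_feet, bedrooms])

# Made-up price model so the example has labels to learn from.
price = 100 * square_feet + 20_000 * bedrooms + rng.normal(0, 25_000, size=200)

# Classification: predict a category (high vs. low) for each sample.
is_high = (price > np.median(price)).astype(int)
classifier = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, is_high)
print("category:", "high" if classifier.predict([[2000, 3]])[0] == 1 else "low")

# Regression: predict a continuous number (the price in dollars) for each sample.
regressor = LinearRegression().fit(X, price)
print(f"price: ${regressor.predict([[2000, 3]])[0]:,.0f}")
```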
For unsupervised learning, we discussed not wanting to spend a lot of time on this. We can use the housing dataset we create to show how clustering would work without requiring the labels. We can then link to extra resources, like our glossary. Google's clustering intro is okay but not perfect.
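A rough sketch of what that clustering demo could look like, assuming the housing dataset has numeric features such as square footage and price. The feature names, the choice of KMeans, and k=2 are placeholders, not the lesson's actual setup:

```python
# Rough sketch: clustering hypothetical housing features without using any labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
square_feet = rng.uniform(500, 3500, size=200)
price = 100 * square_feet + rng.normal(0, 30_000, size=200)
X = np.column_stack([square_feet, price])

# No "high"/"low" labels are passed in -- the algorithm only sees the features.
X_scaled = StandardScaler().fit_transform(X)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print("houses per cluster:", np.bincount(clusters))
```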
All of these look good to me! I will edit the lesson and the glossary following the suggestions. Thanks!
We can consider how to add more definitions or an interactive exercise around what it means for a decision boundary to be linear. This seemed to be one part of the lesson where participants were not completely following along. We could prepare an example or question to help contrast linear versus non-linear decision rules, perhaps going back to the housing example.
Some of our linear examples also seemed too complex. For this audience, I think of "linearly separable" as meaning that the 2D data plot can be divided into red and blue points with a straight line. Some of our examples used an SVM with a kernel, such as the RBF kernel, which makes the decision boundary non-linear. Explaining that is pretty advanced for the short form of this workshop.
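To keep the contrast simple and avoid kernels entirely, a sketch like the one below could work: the same 2D features get a straight-line label rule and a curved label rule, and we compare a linear classifier (logistic regression) against a simple non-linear one (a shallow decision tree). The features, label rules, and thresholds are made up for illustration:

```python
# Illustrative sketch: a linear vs. a non-linear decision rule on 2D data,
# deliberately avoiding kernel methods. Features and labels are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 2))   # e.g., rescaled square footage and lot size

# Linearly separable labels: the straight line x0 + x1 = 1 divides the two classes.
y_linear = (X[:, 0] + X[:, 1] > 1).astype(int)

# Non-linearly separable labels: the boundary is a circle, so no straight line works.
y_nonlinear = ((X[:, 0] - 0.5) ** 2 + (X[:, 1] - 0.5) ** 2 < 0.1).astype(int)

linear_model = LogisticRegression()
tree_model = DecisionTreeClassifier(max_depth=4, random_state=0)

for label_name, y in [("straight-line boundary", y_linear), ("curved boundary", y_nonlinear)]:
    for model_name, model in [("logistic regression", linear_model), ("decision tree", tree_model)]:
        acc = model.fit(X, y).score(X, y)
        print(f"{label_name:>24s} | {model_name:<20s} training accuracy = {acc:.2f}")
```

The point participants would see: the straight-line rule is learned well by both models, but the curved rule is only captured by the non-linear model, while logistic regression falls back to predicting the majority class.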