Open agitter opened 4 years ago
We also need to define the word "classifier" the first time it is used. The term is used in the introduction lesson but not introduced until the T cell lesson.
Other terms such as bagging and boosting were also used in the random forest discussion but not defined. We should add these, and possibly the regression ensemble graphic, to the lesson.
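As a starting point for those definitions, a minimal sketch along these lines could accompany the random forest discussion. The synthetic dataset and model settings here are assumptions for illustration, not taken from the lesson:

```python
# Illustrative sketch only -- the dataset and settings are made up, not from the lesson.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: fit many decision trees on bootstrap resamples of the training data
# and average their votes (BaggingClassifier uses a decision tree by default).
bagging = BaggingClassifier(n_estimators=100, random_state=0)

# Boosting: fit trees sequentially, with each new tree focusing on the examples
# the previous trees got wrong.
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)

# A random forest is bagging of decision trees plus a random subset of features
# considered at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting), ("random forest", forest)]:
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.2f}")
```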
Introduction
See 7f16a9a4426e7c7a0c86271e680fe5bcf67d0bdd
Let's avoid the word "classifying" in the definition. How about "predicting a category for the samples in the data set"? In the house price example, the categories are high and low.
Regression could be "predicting a continuous number for the samples in the data set." A regression version of the house price example would be predicting the price in dollars.
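If it helps to make the two definitions concrete, here is a rough sketch using the same made-up housing features once for classification (high/low category) and once for regression (price in dollars). The feature names and price model are invented for illustration, not the lesson's actual dataset:

```python
# Illustrative sketch only: made-up housing features used for both tasks.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
square_feet = rng.uniform(500, 3500, size=200)
bedrooms = rng.integers(1, 6, size=200)
X = np.column_stack([square_feet, bedrooms])

# Made-up price model so the example has labels to learn from.
price = 100 * square_feet + 20_000 * bedrooms + rng.normal(0, 25_000, size=200)

# Classification: predict a category (high vs. low) for each sample.
is_high = (price > np.median(price)).astype(int)
classifier = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, is_high)
print("category:", "high" if classifier.predict([[2000, 3]])[0] == 1 else "low")

# Regression: predict a continuous number (the price in dollars) for each sample.
regressor = LinearRegression().fit(X, price)
print(f"price: ${regressor.predict([[2000, 3]])[0]:,.0f}")
```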
For unsupervised learning, we discussed not wanting to spend a lot of time on this. We can use the housing dataset we create to show how clustering would work without requiring the labels. We can then link to extra resources, like our glossary. Google's clustering intro is okay but not perfect.
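A rough sketch of what that clustering demo could look like, assuming the housing dataset has numeric features such as square footage and price. The feature names, the choice of KMeans, and k=2 are placeholders, not the lesson's actual setup:

```python
# Rough sketch: clustering hypothetical housing features without using any labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
square_feet = rng.uniform(500, 3500, size=200)
price = 100 * square_feet + rng.normal(0, 30_000, size=200)
X = np.column_stack([square_feet, price])

# No "high"/"low" labels are passed in -- the algorithm only sees the features.
X_scaled = StandardScaler().fit_transform(X)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print("houses per cluster:", np.bincount(clusters))
```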
All of these look good to me! I will edit the lesson and the glossary following the suggestions. Thanks!
We can consider how to add more definitions or an interactive exercise around what it means for a decision boundary to be linear. This seemed to be one part of the lesson where participants were not completely following along. We could prepare an example or question to help contrast linear versus non-linear decision rules, perhaps going back to the housing example.
Some of our linear examples also seemed too complex. For this audience, I think of "linearly separable" as meaning that the 2D data plot can be divided into red and blue points with a straight line. Some of our examples used an SVM with a kernel, such as the RBF kernel, which makes the decision boundary non-linear. Explaining that is pretty advanced for the short form of this workshop.
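To keep the contrast simple and avoid kernels entirely, a sketch like the one below could work: the same 2D features get a straight-line label rule and a curved label rule, and we compare a linear classifier (logistic regression) against a simple non-linear one (a shallow decision tree). The features, label rules, and thresholds are made up for illustration:

```python
# Illustrative sketch: a linear vs. a non-linear decision rule on 2D data,
# deliberately avoiding kernel methods. Features and labels are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 2))   # e.g., rescaled square footage and lot size

# Linearly separable labels: the straight line x0 + x1 = 1 divides the two classes.
y_linear = (X[:, 0] + X[:, 1] > 1).astype(int)

# Non-linearly separable labels: the boundary is a circle, so no straight line works.
y_nonlinear = ((X[:, 0] - 0.5) ** 2 + (X[:, 1] - 0.5) ** 2 < 0.1).astype(int)

linear_model = LogisticRegression()
tree_model = DecisionTreeClassifier(max_depth=4, random_state=0)

for label_name, y in [("straight-line boundary", y_linear), ("curved boundary", y_nonlinear)]:
    for model_name, model in [("logistic regression", linear_model), ("decision tree", tree_model)]:
        acc = model.fit(X, y).score(X, y)
        print(f"{label_name:>24s} | {model_name:<20s} training accuracy = {acc:.2f}")
```

The point participants would see: the straight-line rule is learned well by both models, but the curved rule is only captured by the non-linear model, while logistic regression falls back to predicting the majority class.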