ageron / handson-ml2

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Apache License 2.0

[BUG] Chap 3 SVC uses OvR by default instead of OvO #482

Open ladylazy9x opened 3 years ago

ladylazy9x commented 3 years ago

OvR has 10 scores: [screenshot]

OvO has 45 scores: [screenshot]

But the book says OvO and shows 10 scores.

ageron commented 3 years ago

Hi @ladylazy9x ,

Thanks for your feedback.

SVC always uses OvO to train the model when there are more than 2 classes.

The decision_function_shape hyperparameter does not affect the training strategy. It only changes what the decision_function() will output. If you set it to "ovo", then the decision_function() method will output one score per class pair (so 45 numbers when there are 10 classes). But if you set it to "ovr" (which is the default), then the decision_function() will only output one score per class (so 10 scores when there are 10 classes).
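For illustration, here's a minimal sketch (not from the book, using the digits dataset, which has 10 classes) showing that only the output shape changes:

```python
# Minimal sketch: training is OvO either way; decision_function_shape only
# changes the shape of decision_function()'s output.
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 10 classes

svc_ovr = SVC(decision_function_shape="ovr").fit(X, y)  # the default
svc_ovo = SVC(decision_function_shape="ovo").fit(X, y)

print(svc_ovr.decision_function(X[:1]).shape)  # (1, 10): one score per class
print(svc_ovo.decision_function(X[:1]).shape)  # (1, 45): one score per class pair
```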

Here's what the documentation of this hyperparameter says:

decision_function_shape : {‘ovo’, ‘ovr’}, default=’ovr’

  • Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, one-vs-one (‘ovo’) is always used as multi-class strategy. The parameter is ignored for binary classification.

Hope this helps.

ladylazy9x commented 3 years ago

@ageron so OvO, which has 45 classifiers, can output 10 scores like OvR? Do you know how they do that?

ageron commented 3 years ago

Yes, exactly.

The source code is available here.

In short, when you call predict() to classify an instance, Scikit-Learn calls libsvm, which makes 45 predictions (one per pair of classes, assuming there are 10 classes), and for each class k it counts the number of duels won (i.e., how many of the 9 classifiers that involve class k predicted class k), and subtracts the number of duels lost (i.e., how many of the 9 classifiers involving class k voted against class k). Let's call nk the score that class k obtains (= #won - #lost). Whichever class gets the highest score nk is the predicted class. In case of a tie, the first class wins.
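Here's a rough sketch of that voting logic (just to illustrate the description above, not Scikit-Learn's actual code; the pair ordering (0,1), (0,2), ..., (8,9) is an assumption):

```python
import numpy as np
from itertools import combinations

def ovo_vote(pair_winners, n_classes=10):
    # pair_winners[i] is the class predicted by the i-th pairwise classifier,
    # with pairs ordered (0,1), (0,2), ..., (8,9) -- 45 duels for 10 classes.
    scores = np.zeros(n_classes, dtype=int)
    for (a, b), winner in zip(combinations(range(n_classes), 2), pair_winners):
        loser = b if winner == a else a
        scores[winner] += 1  # duel won
        scores[loser] -= 1   # duel lost
    return int(np.argmax(scores))  # ties: the first (lowest-index) class wins
```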

The decision_function() method is different. First, it asks libsvm for more details: it obtains all the individual classifier predictions (positive or negative) as well as the confidence score for each of them, so Scikit-Learn is able to compute nk as well. But it also computes another score for each class, based on the classifier confidence scores: for each class k, it just sums the confidence scores the class got in each duel, whether it won or lost. Let's call this score ck. The decision_function() method returns nk + f(ck) for each class, where f(x) = x / (3 * (|x| + 1)). This function outputs a number between -1/3 and +1/3, so you can think of the confidence scores as "tie breakers": if two classes won the same number of duels, the decision_function() will give them identical nk scores, and the ck part will decide which class gets the highest score, based on the confidence scores.
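To make that concrete, here's a sketch of how the 10 "ovr" scores could be derived from the 45 "ovo" confidence scores. It follows the description above only; the pair ordering and sign convention are assumptions, and the real implementation lives in the Scikit-Learn source linked above.

```python
import numpy as np
from itertools import combinations

def ovo_to_ovr_scores(ovo_scores, n_classes=10):
    # ovo_scores: the 45 pairwise confidence scores, assumed ordered
    # (0,1), (0,2), ..., (8,9), with a positive score favoring the first
    # class of each pair (sign convention assumed for this sketch).
    n = np.zeros(n_classes)  # nk: duels won minus duels lost
    c = np.zeros(n_classes)  # ck: sum of confidence scores, won or lost
    for (a, b), s in zip(combinations(range(n_classes), 2), ovo_scores):
        n[a if s > 0 else b] += 1
        n[b if s > 0 else a] -= 1
        c[a] += s
        c[b] -= s
    return n + c / (3 * (np.abs(c) + 1))  # nk + f(ck), with f(ck) in (-1/3, +1/3)
```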

Of course, if you set decision_function_shape="ovo", then the decision_function() method simply returns the 45 confidence scores directly.

Lastly, there's a hyperparameter called break_ties in the SVC class, which defaults to False. If you set it to True (and if there are 3 classes or more, and decision_function_shape="ovr"), then the predict() method will use the decision_function() method to make its predictions. This ensures that the confidence scores break any ties between classes. There is a performance penalty, however, which is why it defaults to False. The break_ties logic is here in the source code.
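For example (a minimal sketch, again assuming the digits dataset):

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# With break_ties=True (and the default decision_function_shape="ovr"),
# predict() uses the decision_function() scores, so confidence breaks any ties:
svc = SVC(break_ties=True).fit(X, y)
print(svc.predict(X[:5]))
```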

Hope this helps!