ageron / handson-ml2

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Apache License 2.0

[BUG] Chap 3 SVC uses OvR by default instead of OvO #482

Open ladylazy9x opened 3 years ago

ladylazy9x commented 3 years ago

OvR has 10 scores: [screenshot]

OvO has 45 scores: [screenshot]

But the book says OvO and shows 10 scores.

ageron commented 3 years ago

Hi @ladylazy9x ,

Thanks for your feedback.

SVC always uses OvO to train the model when there are more than 2 classes.

The decision_function_shape hyperparameter does not affect the training strategy. It only changes what the decision_function() will output. If you set it to "ovo", then the decision_function() method will output one score per class pair (so 45 numbers when there are 10 classes). But if you set it to "ovr" (which is the default), then the decision_function() will only output one score per class (so 10 scores when there are 10 classes).
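For illustration, here's a minimal sketch (not from the book, using the digits dataset, which has 10 classes) showing that only the output shape changes:

```python
# Minimal sketch: training is OvO either way; decision_function_shape only
# changes the shape of decision_function()'s output.
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 10 classes

svc_ovr = SVC(decision_function_shape="ovr").fit(X, y)  # the default
svc_ovo = SVC(decision_function_shape="ovo").fit(X, y)

print(svc_ovr.decision_function(X[:1]).shape)  # (1, 10): one score per class
print(svc_ovo.decision_function(X[:1]).shape)  # (1, 45): one score per class pair
```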

Here's what the documentation of this hyperparameter says:

decision_function_shape : {‘ovo’, ‘ovr’}, default=’ovr’

  • Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, one-vs-one (‘ovo’) is always used as multi-class strategy. The parameter is ignored for binary classification.

Hope this helps.

ladylazy9x commented 3 years ago

@ageron so OvO, which has 45 classifiers, can output 10 scores like OvR? Do you know how they do that?

ageron commented 3 years ago

Yes, exactly.

The source code is available here.

In short, when you call predict() to classify an instance, Scikit-Learn calls libsvm, which makes 45 predictions (one per pair of classes, assuming there are 10 classes), and for each class k it counts the number of duels won (i.e., how many of the 9 classifiers that involve class k predicted class k), and subtracts the number of duels lost (i.e., how many of the 9 classifiers involving class k voted against class k). Let's call nk the score that class k obtains (= #won - #lost). Whichever class gets the highest score nk is the predicted class. In case of a tie, the first class wins.
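Here's a rough sketch of that voting logic (just to illustrate the description above, not Scikit-Learn's actual code; the pair ordering (0,1), (0,2), ..., (8,9) is an assumption):

```python
import numpy as np
from itertools import combinations

def ovo_vote(pair_winners, n_classes=10):
    # pair_winners[i] is the class predicted by the i-th pairwise classifier,
    # with pairs ordered (0,1), (0,2), ..., (8,9) -- 45 duels for 10 classes.
    scores = np.zeros(n_classes, dtype=int)
    for (a, b), winner in zip(combinations(range(n_classes), 2), pair_winners):
        loser = b if winner == a else a
        scores[winner] += 1  # duel won
        scores[loser] -= 1   # duel lost
    return int(np.argmax(scores))  # ties: the first (lowest-index) class wins
```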

The decision_function() method is different. First, it asks libsvm for more details: it obtains all the individual classifier predictions (positive or negative) as well as the confidence score for each of them, so Scikit-Learn is able to compute nk as well. But it also computes another score for each class, based on the classifier confidence scores: for each class k, it just sums the confidence scores the class got in each duel, whether it won or lost. Let's call this score ck. The decision_function() method returns nk + f(ck) for each class, where f(x) = x / (3 * (|x| + 1)). This function outputs a number between -1/3 and +1/3, so you can think of the confidence scores as "tie breakers": if two classes won the same number of duels, the decision_function() will give them identical nk scores, and the ck part will decide which class gets the highest score, based on the confidence scores.
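To make that concrete, here's a sketch of how the 10 "ovr" scores could be derived from the 45 "ovo" confidence scores. It follows the description above only; the pair ordering and sign convention are assumptions, and the real implementation lives in the Scikit-Learn source linked above.

```python
import numpy as np
from itertools import combinations

def ovo_to_ovr_scores(ovo_scores, n_classes=10):
    # ovo_scores: the 45 pairwise confidence scores, assumed ordered
    # (0,1), (0,2), ..., (8,9), with a positive score favoring the first
    # class of each pair (sign convention assumed for this sketch).
    n = np.zeros(n_classes)  # nk: duels won minus duels lost
    c = np.zeros(n_classes)  # ck: sum of confidence scores, won or lost
    for (a, b), s in zip(combinations(range(n_classes), 2), ovo_scores):
        n[a if s > 0 else b] += 1
        n[b if s > 0 else a] -= 1
        c[a] += s
        c[b] -= s
    return n + c / (3 * (np.abs(c) + 1))  # nk + f(ck), with f(ck) in (-1/3, +1/3)
```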

Of course, if you set decision_function_shape="ovo", then the decision_function() method simply returns the 45 confidence scores directly.

Lastly, there's a hyperparameter called break_ties in the SVC class, which defaults to False. If you set it to True (and if there are 3 classes or more, and decision_function_shape="ovr"), then the predict() method will use the decision_function() method to make its predictions. This ensures that the confidence scores break any ties between classes. There is a performance penalty, however, which is why it defaults to False. The break_ties logic is here in the source code.
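For example (a minimal sketch, again assuming the digits dataset):

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# With break_ties=True (and the default decision_function_shape="ovr"),
# predict() uses the decision_function() scores, so confidence breaks any ties:
svc = SVC(break_ties=True).fit(X, y)
print(svc.predict(X[:5]))
```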

Hope this helps!