ladylazy9x opened this issue 3 years ago (status: Open)
Hi @ladylazy9x,
Thanks for your feedback.
`SVC` always uses OvO to train the model when there are more than 2 classes. The `decision_function_shape` hyperparameter does not affect the training strategy. It only changes what `decision_function()` outputs. If you set it to `"ovo"`, then the `decision_function()` method outputs one score per class pair (so 45 scores when there are 10 classes). But if you set it to `"ovr"` (which is the default), then `decision_function()` outputs only one score per class (so 10 scores when there are 10 classes).
Here's what the documentation of this hyperparameter says:
> decision_function_shape : {'ovo', 'ovr'}, default='ovr'
>
> Whether to return a one-vs-rest ('ovr') decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one ('ovo') decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, one-vs-one ('ovo') is always used as multi-class strategy. The parameter is ignored for binary classification.
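To make the two output shapes concrete, here is a minimal sketch. It assumes Scikit-Learn's bundled digits dataset (10 classes) and an arbitrary 500-instance training subset; neither comes from this thread, and any 10-class dataset should behave the same way.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

# 10-class dataset; a subset keeps training fast (the split is arbitrary).
X, y = load_digits(return_X_y=True)
X_train, y_train = X[:500], y[:500]
X_new = X[500:505]

ovr_clf = SVC(decision_function_shape="ovr").fit(X_train, y_train)
ovo_clf = SVC(decision_function_shape="ovo").fit(X_train, y_train)

ovr_shape = ovr_clf.decision_function(X_new).shape  # (5, 10): one score per class
ovo_shape = ovo_clf.decision_function(X_new).shape  # (5, 45): one score per class pair

# Both models were trained the same way (OvO), so their predictions should agree.
same_predictions = np.array_equal(ovr_clf.predict(X_new), ovo_clf.predict(X_new))
```

Only the shape of the `decision_function()` output differs between the two models; the predicted classes are the same.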
Hope this helps.
@ageron So OvO, which has 45 classifiers, can output 10 scores like OvR? Do you know how they do that?
Yes, exactly.
The source code is available here.
In short, when you call `predict()` to classify an instance, Scikit-Learn calls libsvm, which makes 45 predictions (one per pair of classes, assuming there are 10 classes). For each class k, it counts the number of duels won (i.e., how many of the 9 classifiers that involve class k predicted class k), and subtracts the number of duels lost (i.e., how many of the 9 classifiers involving class k voted against class k). Let's call n_k the score that class k obtains (= #won - #lost). Whichever class gets the highest score n_k is the predicted class. In case of a tie, the first class wins.

The `decision_function()` method is different. First, it asks libsvm for more details. It obtains all the individual classifier predictions (positive or negative) as well as the confidence score for each of them, so Scikit-Learn is able to compute n_k as well. But it also computes another score for each class, based on the classifier confidence scores: for each class k, it just sums the confidence scores the class got in each duel, whether it won or lost. Let's call this score c_k. The `decision_function()` method returns n_k + f(c_k) for each class, where f(x) = x / (3 * (|x| + 1)). This function outputs a number between -1/3 and +1/3, so you can think of the confidence scores as "tie breakers": if two classes won the same number of duels, they have identical n_k scores, so the c_k part decides which class gets the highest score, and that's based on the confidence scores.
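The aggregation described above can be sketched in a few lines of NumPy. This follows the description in this comment rather than Scikit-Learn's actual source: the function name `ovr_scores_from_duels` is hypothetical, and the pair ordering and sign convention (a positive score means the first class of the pair wins) are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def ovr_scores_from_duels(pair_scores, n_classes):
    """Combine pairwise (OvO) confidence scores into one score per class.

    Assumptions (illustrative only): pairs are ordered (0,1), (0,2), ..., and
    a positive score means the first class of the pair wins the duel.
    """
    n = np.zeros(n_classes)  # n_k = duels won - duels lost
    c = np.zeros(n_classes)  # c_k = signed sum of confidence scores
    for (i, j), s in zip(combinations(range(n_classes), 2), pair_scores):
        winner, loser = (i, j) if s > 0 else (j, i)
        n[winner] += 1
        n[loser] -= 1
        c[i] += s  # the score counts in favor of class i ...
        c[j] -= s  # ... and against class j
    f_c = c / (3 * (np.abs(c) + 1))  # squashed into (-1/3, +1/3)
    return n, c, n + f_c


# Three classes, three duels: every class wins one duel and loses one.
n, c, scores = ovr_scores_from_duels([0.5, -0.2, 1.0], n_classes=3)
best_class = int(np.argmax(scores))
```

In this toy case all three n_k values are tied at 0, so the f(c_k) term alone decides: class 1 has the largest summed confidence (c_1 = 0.5) and ends up with the highest score.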
Of course, if you set `decision_function_shape="ovo"`, then the `decision_function()` method simply returns the 45 confidence scores directly.
Lastly, there's a hyperparameter called `break_ties` in the `SVC` class, which defaults to `False`. If you set it to `True` (and if there are 3 classes or more, and if `decision_function_shape="ovr"`), then the `predict()` method will use the `decision_function()` method to make its predictions. This ensures that the confidence scores break any ties between classes. There is a performance penalty, however, which is why it defaults to `False`. The `break_ties` logic is here in the source code.
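As a quick check of the `break_ties` behavior, the sketch below compares `predict()` with the argmax of `decision_function()`. The digits dataset and the train/test slices are assumptions chosen for illustration, not something from this thread.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, y_train = X[:500], y[:500]
X_new = X[500:600]

# break_ties only takes effect with decision_function_shape="ovr" and 3+ classes.
clf = SVC(decision_function_shape="ovr", break_ties=True).fit(X_train, y_train)

pred = clf.predict(X_new)
scores = clf.decision_function(X_new)  # shape (100, 10)
pred_from_scores = clf.classes_[np.argmax(scores, axis=1)]

# With break_ties=True, predict() should pick the class with the highest score.
matches = np.array_equal(pred, pred_from_scores)
```

With `break_ties=False` the two can occasionally differ, since `predict()` then relies on libsvm's vote count alone.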
Hope this helps!
OvR has 10 scores
OvO has 45 scores
But the book says OvO and shows 10 scores.