ageron / handson-ml

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Apache License 2.0

What exactly are confidence scores? #91

Closed cuent closed 7 years ago

cuent commented 7 years ago

Just having some trouble understanding the decision function in chapter 3. First of all, a classifier makes its decision based on a decision function. So, what exactly is this function? Also, the classifier makes a decision based on the score: if the score is greater than a certain threshold, the prediction is positive; otherwise, it is negative. How should we vary this threshold? Why does SGDClassifier use a threshold of 0? What do these scores mean? Are higher/lower scores better? Do they have a range?

ageron commented 7 years ago

Hi @cuent, thanks for your question. In a binary classifier, the decision function is the function that produces a score for the positive class. In a logistic regression classifier, that decision function is simply a linear combination of the input features. If that score is greater than some threshold that you choose, then the classifier "predicts" the positive class; otherwise it predicts the negative class.

If you want your model to have high precision (at the cost of low recall), then you must set the threshold pretty high. This way, the model will only predict the positive class when it is absolutely certain. For example, you may want this if the classifier is selecting videos that are safe for kids: it's better to err on the safe side. Conversely, if you want high recall (at the cost of low precision), then you must use a low threshold. For example, if the classifier is used to detect intruders in a nuclear plant, then you probably want to detect all actual intruders, even if it means getting a lot of false alarms (called "false positives").

If you make a few assumptions about the distribution of the data (i.e., the positive and negative classes are separated by a linear boundary plus Gaussian noise), then computing the logistic of the score gives you the probability that the instance belongs to the positive class. A score of 0 corresponds to a 50% probability. So by default, a logistic regression classifier predicts the positive class if it estimates the probability to be greater than 50%. In general, this sounds like a reasonable default threshold, but really it all depends on what you want to do with the classifier.

If the assumptions I mentioned above were perfect, then when the classifier outputs a probability of X% for an instance, there would be exactly an X% chance that it's positive. But in practice the assumptions are imperfect, so I try to always make it clear that we are talking about an "estimated probability", not an actual probability.

I hope this helps.
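
To make this concrete, here is a minimal, self-contained sketch (mine, not from the book; the dataset, variable names, and threshold values are arbitrary choices for the demo) showing both points: the logistic of the decision score is the estimated probability, and moving the threshold trades precision against recall:

import numpy as np
from scipy.special import expit
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)
scores = clf.decision_function(X_test)  # one raw score per instance

# The logistic (sigmoid) of the score is the estimated P(positive class):
assert np.allclose(expit(scores), clf.predict_proba(X_test)[:, 1])

# Raising the threshold trades recall for precision; lowering it does the opposite:
for threshold in (-2.0, 0.0, 2.0):
    y_pred = scores > threshold
    print(f"threshold={threshold:+.1f}  "
          f"precision={precision_score(y_test, y_pred):.3f}  "
          f"recall={recall_score(y_test, y_pred):.3f}")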

cuent commented 7 years ago

Thanks so much, great explanation.

MLDSBigGuy commented 6 years ago

@ageron, thanks for the explanation.

In chapter 3, the multiclass classification output Out[54] shows negative and positive values: array([[-311402.62954431, -363517.28355739, -446449.5306454, -183226.61023518, -414337.15339485, 161855.74572176, -452576.39616343, -471957.14962573, -518542.33997148, -536774.63961222]]).

Will this value, 161855.74572176, be picked when I call predict()?

I have a model trained on some dataset. When I call decision_function() on a sample from a class not seen during training, predict() gives me the class with the best (least negative) score.

Instead, I want to reject this unseen sample rather than assign it the closest label. Could we set a threshold to accept only positive values and then call predict() for better accuracy?

Will this work in all cases for rejecting samples that don't match any of the trained classes, if I set up the threshold before calling predict()?

Thank you,

ageron commented 6 years ago

Hi @MLDSBigGuy,

Since the max score returned by decision_function() is at index 5, the predict() function will indeed return 5. Basically, predict() returns np.argmax(scores, axis=1), where scores is the output of decision_function().

Even if all the values returned by decision_function() are negative, the predict() method will always return the index of the max value.
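
As a quick sanity check (a sketch; clf and X_new are placeholders for a fitted multiclass linear model and new inputs), the equivalence looks like this. Strictly speaking, predict() returns clf.classes_[argmax], which coincides with the argmax itself when the classes are 0–9 as in MNIST:

import numpy as np
scores = clf.decision_function(X_new)  # shape: (n_samples, n_classes)
# predict() picks the class with the highest decision score:
assert (clf.predict(X_new) == clf.classes_[scores.argmax(axis=1)]).all()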

The approach you suggest sounds reasonable to me, but you would have to implement it manually, something like this (untested):

import numpy as np

scores = model.decision_function(X_new)         # shape: (n_samples, n_classes)
y_pred = np.argmax(scores, axis=1)              # index of the best class per sample
threshold = 0
invalid = (np.max(scores, axis=1) < threshold)  # samples whose best score is too low
y_pred[invalid] = -1                            # mark them as "rejected"

This gives you a set of class predictions, with -1 whenever no class score was above the given threshold.

Hope this helps, Aurélien

jamespreed commented 4 years ago

What are the units of the decision function output? I am trying to understand whether it is appropriate to pass the output of decision_function() through scipy.special.softmax(..., axis=1) to get a class probability distribution.
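
For reference, the transformation described above is mechanically straightforward (a sketch; clf and X_new are placeholders). Note that softmax only guarantees a valid distribution (positive values summing to 1), not that the resulting numbers are well-calibrated probabilities:

import numpy as np
from scipy.special import softmax

scores = clf.decision_function(X_new)  # raw, unbounded scores, shape (n_samples, n_classes)
probs = softmax(scores, axis=1)        # each row is now positive and sums to 1
assert np.allclose(probs.sum(axis=1), 1.0)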