- Algorithmic classifiers such as SVMs or Random Forests. Here, scikit-learn's predict_proba function is sometimes used as a measure of confidence.
- Neural networks, such as NNs ending with a sigmoid layer that maps to the classes. Here, the output of the sigmoid function is sometimes used as a measure of confidence.
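A minimal sketch of where these two kinds of "confidence" come from. It assumes a synthetic dataset from make_classification, an arbitrary Random Forest configuration, and hypothetical final-layer outputs (logits) for the neural-network case. For scikit-learn's SVC, predict_proba is only available when the model is constructed with probability=True.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary dataset (illustrative only)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Algorithmic classifier: predict_proba returns one column per class
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
rf_proba = rf.predict_proba(X[:5])       # shape (5, 2), each row sums to 1
rf_confidence = rf_proba.max(axis=1)     # probability of the predicted class

# Neural-network style: a sigmoid applied to raw output scores (logits)
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-2.0, 0.0, 3.0])      # hypothetical final-layer outputs
nn_confidence = sigmoid(logits)          # values in (0, 1)
```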
PS: in anomaly detection, if the confidence output is sufficiently low, the data point can be considered an anomaly.
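A sketch of that idea, assuming the confidence is taken as the highest class probability per row (e.g. from predict_proba) and using a hypothetical threshold of 0.6 that would need tuning for a real application:

```python
import numpy as np

def flag_low_confidence(proba, threshold=0.6):
    """Return a boolean mask of rows whose top-class probability is below threshold.

    proba: array of shape (n_samples, n_classes), e.g. from predict_proba.
    threshold: hypothetical cut-off; choose it per application.
    """
    confidence = np.max(proba, axis=1)
    return confidence < threshold

proba = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.10, 0.90],
])
mask = flag_low_confidence(proba)   # -> [False, True, False]
```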
However, both types of classifiers sometimes produce skewed (miscalibrated) confidence scores (1)(2).
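One way to see such skew is to bin the predicted probabilities and compare each bin's mean prediction with the fraction of samples that actually belong to the class. The sketch below does this with scikit-learn's calibration_curve and brier_score_loss on held-out data. The synthetic dataset and the choice of GaussianNB (often overconfident) are only illustrative assumptions.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

clf = GaussianNB().fit(X_train, y_train)
p_pos = clf.predict_proba(X_test)[:, 1]   # predicted probability of class 1

# For a well-calibrated model, the observed fraction of positives in each
# bin is close to the mean predicted probability in that bin.
frac_pos, mean_pred = calibration_curve(y_test, p_pos, n_bins=10)
for pred, obs in zip(mean_pred, frac_pos):
    print(f"predicted {pred:.2f}  observed {obs:.2f}")

brier = brier_score_loss(y_test, p_pos)   # lower is better
print("Brier score:", brier)
```

Large gaps between the "predicted" and "observed" columns indicate the kind of skew that calibration tries to correct.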
To correct such skewed confidence scores, probability calibration is sometimes necessary. Basically, each classifier's confidence output is passed through a regressor that has been trained to map the predicted confidence to the actual outcome. In other words:
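1. The classifier is trained as usual.
2. For each output class, create a regressor (isotonic or normal, e.g. a sigmoid/logistic fit as in Platt scaling).
3. Pass every sample through the model (ideally samples from a held-out calibration set rather than the classifier's own training data, which would give overconfident estimates), and for each output class record the predicted probability and the actual outcome (1 if the sample belongs to that class, otherwise 0).
4. Train the regressors created in step 2 on the data collected in step 3.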
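The four steps can be sketched with scikit-learn's IsotonicRegression as the per-class regressor. Assumptions: a synthetic 3-class dataset, a Random Forest with arbitrary settings, and a 30% held-out calibration split. The final renormalisation (so the calibrated class probabilities sum to 1) is an extra step not listed above, and calibrated_proba is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)
# Hold out a calibration set so the regressors are not fit on
# probabilities for samples the classifier was trained on.
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

# Step 1: train the classifier as usual
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Step 3: predicted probabilities on the calibration samples
cal_proba = clf.predict_proba(X_cal)            # shape (n_cal, n_classes)

# Steps 2 and 4: one isotonic regressor per class, fit on
# (predicted probability, actual 0/1 outcome) pairs
regressors = []
for k, label in enumerate(clf.classes_):
    actual = (y_cal == label).astype(float)     # 1 if the sample is in class k
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    iso.fit(cal_proba[:, k], actual)
    regressors.append(iso)

def calibrated_proba(model, regs, X_new):
    """Map raw per-class probabilities through each regressor, then renormalise."""
    raw = model.predict_proba(X_new)
    cal = np.column_stack([r.predict(raw[:, k]) for k, r in enumerate(regs)])
    totals = cal.sum(axis=1, keepdims=True)
    # Fall back to a uniform distribution if every calibrated value is 0
    uniform = np.full_like(cal, 1.0 / cal.shape[1])
    return np.divide(cal, totals, out=uniform, where=totals > 0)

p = calibrated_proba(clf, regressors, X_cal[:5])
```

scikit-learn's CalibratedClassifierCV wraps a similar procedure (with cross-validation instead of a single held-out split) and accepts method="isotonic" or method="sigmoid"; the manual version above just makes each step explicit.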