EFS-OpenSource / calibration-framework

The net:cal calibration framework is a Python 3 library for measuring and mitigating miscalibration of uncertainty estimates, e.g., by a neural network.
https://efs-opensource.github.io/calibration-framework/
Apache License 2.0

Basic binary classification case #17

Closed jwitos closed 3 years ago

jwitos commented 3 years ago

Hi, I'm having trouble understanding the proper use of the library for a very simple binary classifier. I have a 1-D array of binary labels {0, 1} and a 1-D array of model predictions with probability values p in range (0, 1). Those values reflect the probability of the positive class.

Plugging those values into e.g. the reliability diagram, I got the following plot: [reliability diagram and confidence histogram]. The confidence histogram makes sense to me, as most samples are negative and the classifier correctly assigns a low probability. But I'm not sure how to interpret the reliability diagram -- what do the dark red bars suggest here? Also, the ECE I received is very high (>0.8).

I tried to invert the probabilities for negative samples, i.e. if a label is 0, then the probability becomes (1-p). This gives a more plausible-looking plot: [reliability diagram and confidence histogram]

Could you confirm that for negative samples the probability should reflect probability of a negative class, not the positive class, even in a binary classification case?
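For context, the standard binary convention compares the mean positive-class confidence in each bin against the observed fraction of positive labels. A minimal pure-Python sketch of that definition (not net:cal's actual implementation, just the textbook formula):

```python
def binary_ece(confidences, labels, n_bins=10):
    """Textbook binary Expected Calibration Error with equal-width bins.

    confidences: positive-class probabilities in [0, 1]
    labels: ground truth in {0, 1}
    Per-bin "accuracy" is simply the observed fraction of positives.
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Put p == 1.0 into the last bin.
        in_bin = [i for i, p in enumerate(confidences)
                  if lo <= p < hi or (b == n_bins - 1 and p == 1.0)]
        if not in_bin:
            continue
        avg_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        frequency = sum(labels[i] for i in in_bin) / len(in_bin)
        ece += (len(in_bin) / n) * abs(avg_conf - frequency)
    return ece

# Calibrated toy data: a bin with confidence 0.1 containing 10% positives,
# and a bin with confidence 0.9 containing 90% positives -> ECE ~ 0.
conf = [0.1] * 10 + [0.9] * 10
y = [0] * 9 + [1] + [1] * 9 + [0]
print(binary_ece(conf, y))  # -> 0.0
```

Under this convention no label-dependent inversion of the probabilities is needed; a low confidence on a negative sample already counts as calibrated, because the bin frequency is computed against the positive class only.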

Also, it might be worth clarifying in the docs that the confidence estimates for some methods (e.g., Platt scaling / temperature scaling) are expected in probability space, not logit space. Coming from the official papers and implementations this can be confusing, since the probability -> logit conversion is done behind the scenes; a note about this in the docs would be helpful.
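The behind-the-scenes conversion mentioned above can be sketched in a few lines: map the probability to a logit with the inverse sigmoid, rescale there, and map back. This is an illustrative sketch of the general temperature-scaling idea, not net:cal's API (the function names here are made up):

```python
import math

def logit(p, eps=1e-12):
    """Inverse sigmoid: map a probability in (0, 1) to logit space."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Map a logit back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def temperature_scale(p, temperature):
    """Soften (T > 1) or sharpen (T < 1) a probability in logit space."""
    return sigmoid(logit(p) / temperature)

print(temperature_scale(0.9, 2.0))  # -> 0.75, softened towards 0.5
```

So a user can pass plain probabilities even though the scaling itself happens on logits, which is exactly why the docs note would help.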

fabiankueppers commented 3 years ago

Hi @jwitos, thank you very much for your comment, which helped me find a serious bug in the code that prevented the method from correctly identifying a binary classification scenario. In a binary setting, the reliability diagram should always indicate the accuracy/frequency of the positive class, without any modification to the confidence scores passed to the method. The same holds for the ECE. Therefore, simply pass your 1-D {0, 1} ground-truth array and your 1-D confidence array to the method. I will upload the bugfix as soon as possible. Thanks!

fabiankueppers commented 3 years ago

New version 1.2.1 is now available.

jwitos commented 3 years ago

Awesome, thanks a lot @fabiankueppers. I'll test it out when I have a chance and report back.