Closed: jwitos closed this issue 3 years ago
Hi @jwitos, thank you very much for your comment, which helped me find a serious bug in the code that prevents the method from correctly identifying a binary classification scenario. In a binary setting, the reliability diagram should always show the accuracy/frequency of the positive class, without any modification to the confidence scores passed to the method. The same holds for the ECE. So simply pass your 1-D {0, 1} ground-truth array and your 1-D confidence array to the method. I will upload the bugfix as soon as possible. Thanks!
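To make the binary convention above concrete, here is a minimal pure-Python sketch of an ECE computed for the positive class: samples are binned by their positive-class probability, and each bin's mean probability is compared with the fraction of positive labels in that bin. The function name `binary_ece` and its `bins` parameter are hypothetical and chosen for illustration only. This is not the library's implementation.

```python
def binary_ece(confidences, labels, bins=10):
    """Illustrative ECE for the positive class (not the library's code).

    confidences: probabilities of the positive class, each in [0, 1]
    labels:      ground-truth labels, each 0 or 1
    """
    n = len(confidences)
    # Per bin: [sum of probabilities, number of positives, sample count]
    stats = [[0.0, 0, 0] for _ in range(bins)]
    for p, y in zip(confidences, labels):
        idx = min(int(p * bins), bins - 1)  # p == 1.0 falls into the last bin
        stats[idx][0] += p
        stats[idx][1] += y
        stats[idx][2] += 1

    ece = 0.0
    for prob_sum, positives, count in stats:
        if count:
            mean_confidence = prob_sum / count
            positive_frequency = positives / count
            # Weight each bin's gap by its share of the samples
            ece += abs(positive_frequency - mean_confidence) * count / n
    return ece


# Labels and positive-class probabilities are passed unmodified.
example_ece = binary_ece([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1])
```

In this toy example every bin is off by 0.1, so `example_ece` is approximately 0.1. Note that the probabilities of the negative samples are not flipped to (1 - p) before the call.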
New version 1.2.1 is now available.
Awesome, thanks a lot @fabiankueppers. I'll test it out when I have a chance and report back.
Hi, I'm having trouble understanding the proper use of the library for a very simple binary classifier. I have a 1-D array of binary labels {0, 1} and a 1-D array of model predictions with probability values p in the range (0, 1). Those values reflect the probability of the positive class.
Plugging those values into, e.g., the reliability diagram, I got the following plot: The confidence histogram makes sense to me, as most samples are negative and the classifier correctly assigns them a low probability. But I'm not sure how to interpret the reliability diagram: what do the dark red bars suggest here? Also, the ECE I got is very high (>0.8).
I tried to reverse the probabilities for negative samples, i.e., if the label is 0, the probability becomes (1 - p). This gives a more justifiable plot:
Could you confirm that for negative samples the probability should reflect the probability of the negative class, not the positive class, even in a binary classification case?
Also, it might be worth clarifying that the confidence estimates for some functions (e.g. Platt's / temperature scaling) are supposed to be in the prediction space and not in logit space. After reading the official papers and implementations this can be confusing, because the prediction -> logit conversion is done behind the scenes; a note about this in the docs would be helpful.
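For readers unfamiliar with the prediction -> logit step mentioned above, here is a minimal pure-Python sketch of how a binary probability can be mapped to logit space, rescaled, and mapped back. All function names and the `temperature`, `weight`, and `bias` parameters are hypothetical placeholders. This only illustrates the general idea behind temperature and Platt scaling in the binary case and is not the library's implementation.

```python
import math


def logit(p, eps=1e-12):
    """Map a probability to logit space, clipping to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Numerically stable inverse of logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def temperature_scale(p, temperature):
    """Divide the logit by a temperature, then map back to a probability."""
    return sigmoid(logit(p) / temperature)


def platt_scale(p, weight, bias):
    """Apply an affine transform to the logit, then map back to a probability."""
    return sigmoid(weight * logit(p) + bias)


# The caller passes a probability; the logit conversion happens inside.
softened = temperature_scale(0.9, temperature=2.0)
```

With a temperature above 1, a confident probability such as 0.9 is pulled toward 0.5. A temperature of exactly 1, or a weight of 1 with a bias of 0, leaves the probability unchanged.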