EFS-OpenSource / calibration-framework

The net:cal calibration framework is a Python 3 library for measuring and mitigating miscalibration of uncertainty estimates, e.g., by a neural network.
https://efs-opensource.github.io/calibration-framework/
Apache License 2.0

Is it accuracy - or is it the relative frequency of positive examples in the bin? #27

Closed: denbonte closed this issue 2 years ago

denbonte commented 2 years ago

Dear Fabian,

Thank you for the time you put into this repo and for open sourcing your code!

I have never used netcal before, so I have been comparing it with other libraries and pieces of code that do similar things. Concerning the visualisation functions, specifically netcal.presentation.ReliabilityDiagram, I was wondering: is the quantity plotted on the y axis really the accuracy, or is it the relative frequency of positive examples in each bin (as, to my understanding, it should be in a calibration curve)?

Checking the code here, in particular this snippet:

for batch_X, batch_matched, batch_hist, batch_median in zip(X, matched, histograms, median_confidence):
    acc_hist, conf_hist, _, num_samples_hist = batch_hist
    empty_bins, = np.nonzero(num_samples_hist == 0)

    # calculate overall mean accuracy and confidence
    mean_acc.append(np.mean(batch_matched))
    mean_conf.append(np.mean(batch_X))

Assuming batch_matched stores the ground-truth labels for each batch, I am fairly confident that this quantity should not be named "accuracy" (though I confess I have not spent much time trying to understand exactly what the various functions return).
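To make the distinction concrete, here is a minimal sketch on hypothetical toy data (nothing here is taken from netcal; the variable names are made up for illustration). The mean of a 0/1 array is the fraction of positives if the array holds ground-truth labels, but it is the accuracy if the array holds "prediction matches label" flags, and the two generally differ:

```python
# Hypothetical toy data, not taken from netcal.
labels = [1, 0, 1, 1, 0, 0]        # ground-truth classes
predictions = [1, 1, 1, 0, 0, 0]   # predicted classes

# If the array holds ground-truth labels, its mean is the fraction of positives.
fraction_of_positives = sum(labels) / len(labels)

# If the array holds "prediction matches label" flags, its mean is the accuracy.
matched = [int(p == y) for p, y in zip(predictions, labels)]
accuracy = sum(matched) / len(matched)

print(fraction_of_positives, accuracy)
```

So which name is appropriate depends entirely on what the averaged array actually contains.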

I have also compared the results from netcal with scikit-learn's calibration_curve function, whose documentation states that it returns "the proportion of samples whose class is the positive class, in each bin (fraction of positives)", and the results look very similar, if not identical, to what I get with netcal.
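For reference, this is roughly what I mean by "fraction of positives in each bin". It is only a sketch, assuming equal-width bins on [0, 1]; the function name fraction_of_positives_per_bin and the toy data are hypothetical, and this is not the implementation used by either netcal or scikit-learn:

```python
def fraction_of_positives_per_bin(confidences, labels, n_bins=5):
    """Return (mean confidence, fraction of positives, count) per non-empty bin."""
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    positives = [0] * n_bins
    for conf, label in zip(confidences, labels):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        positives[idx] += label
    return [
        (conf_sums[i] / counts[i], positives[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


# Hypothetical toy data: predicted probability of the positive class and true labels.
confidences = [0.1, 0.15, 0.45, 0.55, 0.9, 0.95, 0.85]
labels = [0, 0, 1, 0, 1, 1, 1]
curve = fraction_of_positives_per_bin(confidences, labels, n_bins=5)
for mean_conf, frac_pos, count in curve:
    print(f"mean confidence {mean_conf:.2f}: fraction of positives {frac_pos:.2f} ({count} samples)")
```

Note that the y value in each bin is computed from the labels alone, with no reference to a predicted class, which is why I would not call it "accuracy".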

It would be amazing if you could clarify this!

Cheers, Dennis.

fabiankueppers commented 2 years ago

Hi Dennis, sorry for the late response. Yes, you're right, the reliability diagram visualizes the fraction of positive samples within each bin.

Best, Fabian