The net:cal calibration framework is a Python 3 library for measuring and mitigating miscalibration of uncertainty estimates, e.g., by a neural network.
Dear Fabian,

Thank you for the time you put into this repo and for open-sourcing your code!
I have never used netcal before, and so I found myself comparing it to other libraries and pieces of code that do similar things. Concerning the visualisation function(s), specifically `netcal.presentation.ReliabilityDiagram`, I was wondering: is the quantity you plot on the y-axis really the accuracy, or is it the relative frequency of positive examples in each bin (as, to my understanding, it should be in calibration curves)?
Checking the code here, in particular this snippet:
```python
for batch_X, batch_matched, batch_hist, batch_median in zip(X, matched, histograms, median_confidence):
    acc_hist, conf_hist, _, num_samples_hist = batch_hist
    empty_bins, = np.nonzero(num_samples_hist == 0)

    # calculate overall mean accuracy and confidence
    mean_acc.append(np.mean(batch_matched))
    mean_conf.append(np.mean(batch_X))
```
Assuming `batch_matched` stores the ground-truth labels for each batch, I am pretty confident that quantity should not be named "accuracy" (still, I confess I have not spent a lot of time trying to understand perfectly what the various functions should return).
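To make the concern concrete (with made-up labels, not anything taken from netcal's internals): if `batch_matched` really does hold binary ground-truth labels, then the `np.mean(batch_matched)` above computes the fraction of positive examples in the batch, not an accuracy:

```python
import numpy as np

# hypothetical ground-truth labels for one batch (not from netcal)
batch_matched = np.array([0, 1, 1, 0, 1])

# the snippet's "mean accuracy" then equals the fraction of positives
print(np.mean(batch_matched))  # 0.6, i.e. 3 positives out of 5 samples
```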
I have also tried to compare the results from netcal with scikit-learn's `calibration_curve` function, whose documentation states that it returns "the proportion of samples whose class is the positive class, in each bin (fraction of positives)", and the results look very similar, if not identical, to what I get with netcal.
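For reference, this is roughly the kind of comparison I ran (a minimal sketch on synthetic data; the variable names and the choice of 10 bins are mine):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from netcal.presentation import ReliabilityDiagram

# synthetic binary problem: confidences in [0, 1], labels drawn so that
# higher confidence means a higher chance of the positive class
rng = np.random.default_rng(0)
confidences = rng.uniform(size=1000)
labels = (rng.uniform(size=1000) < confidences).astype(int)

# scikit-learn: fraction of positives and mean confidence per bin
frac_pos, mean_conf = calibration_curve(labels, confidences, n_bins=10)
print(frac_pos)

# netcal: reliability diagram over the same inputs
diagram = ReliabilityDiagram(10)
diagram.plot(confidences, labels)
plt.show()
```

In my runs the per-bin values from `calibration_curve` line up with the bar heights in netcal's diagram, which is what prompted the question.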
It would be amazing if you could clarify this!
Cheers, Dennis.