ethen8181 / machine-learning

:earth_americas: machine learning tutorials (mainly in Python3)
MIT License
3.17k stars, 650 forks

Minor calculation mistake in "compute_calibration_error" #20

Closed: user9517 closed this issue 2 years ago

user9517 commented 2 years ago

The formula for the expected calibration error (ECE) weights each bin's error by that bin's share of the samples (|Bm|/n) when taking the weighted average of the squared per-bin errors.
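For reference, the weighted form described above can be written as follows, assuming $M$ bins $B_1, \dots, B_M$ over $n$ samples, with $\bar{p}_m$ the mean predicted probability and $\bar{y}_m$ the observed fraction of positives in bin $B_m$ (this notation is introduced here for illustration). The commonly cited ECE uses the absolute difference $|\bar{y}_m - \bar{p}_m|$; the squared difference below follows the wording of this issue.

$$
\text{weighted: } \sum_{m=1}^{M} \frac{|B_m|}{n}\,\bigl(\bar{p}_m - \bar{y}_m\bigr)^2
\qquad
\text{unweighted: } \frac{1}{M}\sum_{m=1}^{M} \bigl(\bar{p}_m - \bar{y}_m\bigr)^2
$$

The two coincide whenever every bin holds exactly $n/M$ samples, since then $|B_m|/n = 1/M$; with only approximately equal bins the difference is small but not zero.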

The function in the code that computes this metric is "compute_calibration_error". The linked line sums the per-bin errors without weighting them by bin size: https://github.com/ethen8181/machine-learning/blob/master/model_selection/prob_calibration/calibration_module/utils.py#L66

Although the bins are created to hold approximately equal numbers of samples, their sizes can differ slightly, and the code does not take this into account. I think each bin_error should be multiplied by the bin size, and the sum of all the errors divided by the number of samples (e.g. len(y_true)) instead of the number of bins (line 68).
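A minimal sketch of the proposed fix, assuming equal-mass bins built by sorting the predictions and splitting them with `np.array_split`. The names `weighted_calibration_error`, `unweighted_calibration_error`, and `_equal_mass_bins` are hypothetical: this is neither the repository's "compute_calibration_error" nor the code merged in the PR, and it returns the averaged squared per-bin error without any final square root or rounding the original function may apply.

```python
import numpy as np


def _equal_mass_bins(y_prob, n_bins):
    # Hypothetical helper: sort samples by predicted probability and split
    # them into n_bins groups whose sizes differ by at most one.
    order = np.argsort(y_prob)
    return [idx for idx in np.array_split(order, n_bins) if len(idx) > 0]


def weighted_calibration_error(y_true, y_prob, n_bins=15):
    """Squared calibration error with each bin weighted by |B_m| / n."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    total = 0.0
    for idx in _equal_mass_bins(y_prob, n_bins):
        bin_error = (y_prob[idx].mean() - y_true[idx].mean()) ** 2
        total += len(idx) * bin_error  # weight each bin by its size
    return total / len(y_true)  # divide by the number of samples


def unweighted_calibration_error(y_true, y_prob, n_bins=15):
    """Same per-bin errors, but averaged over bins (the reported behavior)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    bins = _equal_mass_bins(y_prob, n_bins)
    errors = [(y_prob[idx].mean() - y_true[idx].mean()) ** 2 for idx in bins]
    return sum(errors) / len(bins)  # divide by the number of bins


# 5 samples in 2 bins -> bin sizes 3 and 2, so the two versions differ slightly
y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.9, 0.8]
print(weighted_calibration_error(y_true, y_prob, n_bins=2))    # ~0.019667
print(unweighted_calibration_error(y_true, y_prob, n_bins=2))  # ~0.020139
```

With equal-size bins (for example 4 samples in 2 bins) both functions return the same value, which matches the observation that the discrepancy only appears when bin sizes differ.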

I hope my issue is clear and easy to understand; if not, feel free to ask me for clarification.

ethen8181 commented 2 years ago

@Phantom1472 makes total sense, thanks for spotting this. Would you like to make a PR to resolve this issue?

user9517 commented 2 years ago

> @Phantom1472 makes total sense, thanks for spotting this. Would you like to make a PR to resolve this issue?

For sure, I'll fix it in a bit and open a PR. Thanks for the reply! (And for the code BTW, it helped me out A LOT!)

user9517 commented 2 years ago

Here is the PR: https://github.com/ethen8181/machine-learning/pull/21#issue-1363656245

@ethen8181

ethen8181 commented 2 years ago

Merged, feel free to close this issue.