bellymonster / Weighted-Soft-Label-Distillation


Assumption 1: a gap between "KD helps calibrate" and "KD reduces variance". #7

Closed TongLiu-github closed 2 years ago

TongLiu-github commented 3 years ago

Your work is exciting and inspiring.

But there is a large gap between "KD helps calibrate" and "KD reduces variance": the improvement in calibration could also come from bias reduction, i.e., a smaller bias between the predicted probability and the accuracy, as defined in the calibration error.

Actually, as defined in Eq. 2 of Guo's calibration paper, the main reason for a lower ECE could be understood as a reduction in the bias of p, right?
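
For reference, here is a minimal sketch of the binned ECE estimator described in Guo et al. (2017); the function name `ece`, the bin count, and the NumPy interface are my own choices for illustration, not from the paper or this repo:

```python
import numpy as np

def ece(confidences, correct, n_bins=15):
    """Binned expected calibration error.

    confidences: max predicted probability per sample, shape (N,)
    correct:     1.0 if the prediction was right, else 0.0, shape (N,)
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    error = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correct[in_bin].mean()        # empirical accuracy within the bin
            conf = confidences[in_bin].mean()   # average confidence within the bin
            error += (in_bin.sum() / n) * abs(acc - conf)
    return error
```

ECE is small when, within each confidence bin, the average confidence matches the empirical accuracy; that matching of probability to accuracy is the "bias" I am referring to above.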

lsongx commented 3 years ago

Hi @TongLiu-github, thanks for your interest and for opening a discussion here. You are correct: calibration and bias-variance are two different concepts. Calibration takes the expectation over the test set (the overall data distribution), while bias-variance takes the expectation over different training sets. What we want to say is that a better-calibrated model empirically suggests it is less overfitted (though the two are not mathematically related), and thus its variance is smaller.
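
To make the distinction concrete (the notation below is my own summary, not from the paper): calibration error takes an expectation over the test distribution for one fixed model, while variance in the bias-variance sense takes an expectation over models trained on different training sets D:

```latex
% Calibration (Guo et al., 2017): expectation over the test distribution, one fixed model
\mathrm{ECE} = \mathbb{E}_{\hat{P}}\big[\,\big|\,\mathbb{P}(\hat{Y}=Y \mid \hat{P}=p) - p\,\big|\,\big]

% Variance (bias-variance decomposition): expectation over training sets D, at a fixed input x
\mathrm{Var}(x) = \mathbb{E}_{D}\big[\big(f_D(x) - \mathbb{E}_{D}[f_D(x)]\big)^{2}\big]
```

The first quantity can be small or large for a single model regardless of how f_D varies across training sets, which is why the claim is an empirical observation rather than a mathematical implication.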

HolmesShuan commented 2 years ago

> Actually, as defined in Eq. 2 of Guo's calibration paper...

@TongLiu-github Hi, sorry to interrupt, but which paper does "Guo's calibration paper" refer to?

lsongx commented 2 years ago

@HolmesShuan https://arxiv.org/abs/1706.04599

HolmesShuan commented 2 years ago

> @HolmesShuan https://arxiv.org/abs/1706.04599

Thanks~