Closed hadyelsahar closed 4 years ago
```python
import numpy as np

def softmax(X, temp):
    scores = np.exp([x / temp for x in X])
    return scores / sum(scores)

for temp in [1, 10, 100, 1000]:
    print(f'For temp={temp} the scores from softmax are {softmax([10, 1, 1], temp)}')
```
The output is:
```
For temp=1 the scores from softmax are [9.99753241e-01 1.23379352e-04 1.23379352e-04]
For temp=10 the scores from softmax are [0.5515296 0.2242352 0.2242352]
For temp=100 the scores from softmax are [0.353624 0.323188 0.323188]
For temp=1000 the scores from softmax are [0.33533632 0.33233184 0.33233184]
```
Sorry for missing the discussion, but here's my take on the paper:
1- Calibration has a long history and is currently an active research area. You can find a collection of references on the last slide here: https://docs.google.com/presentation/d/1KMP5Aptu6we42GCRMcHbvhOUlq1vYE4SqZnCujMMPTU/edit?usp=sharing
Calibration also has connections to Bayesian deep learning in terms of uncertainty estimation, and it is mainly used for applications such as out-of-distribution (OOD) detection and robustness against adversarial attacks.
2- Note: temperature scaling doesn't change the accuracy of a model, so it will not increase robustness either, since the argmax of the logits stays the same when they are all divided by the same positive constant. Other methods like Platt scaling can change the argmax, since they apply an affine transformation rather than a pure rescaling, but temperature scaling cannot. For a good reference, check the Guo et al. paper: https://arxiv.org/pdf/1706.04599.pdf
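A minimal sketch (not from the thread) of why accuracy is unaffected: dividing logits by any positive temperature preserves the argmax, while the confidence (max softmax probability) shrinks toward uniform.

```python
import numpy as np

# Example logits for a 3-class problem (illustrative values).
logits = np.array([2.0, 1.0, -0.5])

for temp in [0.5, 1, 10, 100]:
    scaled = logits / temp
    # The predicted class never changes under temperature scaling...
    assert np.argmax(scaled) == np.argmax(logits)
    # ...but the confidence (max softmax probability) moves toward 1/K.
    probs = np.exp(scaled) / np.exp(scaled).sum()
    print(f'temp={temp}: argmax={np.argmax(probs)}, confidence={probs.max():.3f}')
```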
3- Reliability diagrams can give a poor impression of ECE when the bins are not balanced, since ECE is sample-normalized and reliability diagrams aren't.
That is why it's common practice to pair them with a second diagram that shows the number of samples in each bin. Check this scikit-learn tutorial: https://scikit-learn.org/stable/auto_examples/calibration/plot_calibration_curve.html#sphx-glr-auto-examples-calibration-plot-calibration-curve-py
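A hypothetical sketch of ECE with equal-width bins (my own illustration, not from the tutorial) makes the normalization point concrete: each bin's accuracy-confidence gap is weighted by the fraction of samples it holds, whereas a reliability diagram plots the gaps unweighted, so a nearly empty bin looks as important as a full one.

```python
import numpy as np

def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error with equal-width bins.

    Returns the ECE plus per-bin sample counts -- the counts are exactly
    the "second diagram" the thread recommends showing alongside the
    reliability diagram.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total, counts = 0.0, []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        counts.append(int(mask.sum()))
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            total += mask.mean() * gap  # weight gap by fraction of samples in bin
    return total, counts
```

For example, ten predictions at confidence 0.8 with 8 of them correct are perfectly calibrated, so the ECE is 0 even though only one bin is populated.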
4- For follow-up work, check Calibration of Encoder-Decoder Models for Neural Machine Translation: https://arxiv.org/pdf/1903.00802.pdf
5- There is some recent work showing that modeling the generative and discriminative processes together in a classifier can help the calibration of the resulting discriminative model a lot; such a combination can be seen as an "energy-based model". There's an interesting paper about this by Will Grathwohl: Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One (video)
Calibration of Pre-trained Transformers https://arxiv.org/abs/2003.07892
Participation link is available on the Slack channel.
Abstract: