feature request: model calibration

Gym does not currently carry out model calibration, but this would be a nice feature. Softmax scores sum to one, but they are not calibrated probabilities: a score of 0.9 does not mean the prediction is correct 90% of the time. Class imbalance can also greatly affect the accuracy of individual classes. In practice, models tend to be overconfident.

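The purpose of model calibration is to make the model's confidence a reliable estimate of its uncertainty, so that predicted confidence matches observed accuracy. For example, among all pixels predicted with confidence 0.8, about 80% should be correct. On a reliability diagram (mean confidence vs. observed accuracy per bin), a well-calibrated model lies close to the diagonal.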
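A rough sketch of how that could be measured, using expected calibration error (ECE): bin predictions by confidence and take the weighted average gap between accuracy and mean confidence in each bin. This is plain NumPy and not part of Gym; the function name, the `n_bins` default, and the equal-width binning are assumptions made for illustration.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean |accuracy - confidence| over equal-width confidence bins.

    confidences: max softmax score per prediction (e.g. per pixel), in [0, 1]
    correct:     1 where the prediction matched the label, else 0
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; np.digitize then returns a bin index in 0..n_bins-1.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1]), 0, n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece
```

For a segmentation model, `confidences` and `correct` would be the flattened per-pixel values from a held-out set. A value near zero suggests confidence tracks accuracy; a large value with confidence above accuracy indicates overconfidence.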

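One common calibration method is 'temperature scaling', which divides the logits by a single scalar temperature (fit on held-out data) before the softmax. This softens or sharpens the probabilities without changing the predicted class. In segmentation contexts there is also 'local temperature scaling', which lets the temperature vary spatially across the probability map instead of using one value for the whole image.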
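A minimal sketch of global temperature scaling, assuming held-out logits are available as an array of shape (n_samples, n_classes) (for segmentation, pixels flattened into samples) with integer labels. The function names and the grid of candidate temperatures are hypothetical, and a simple grid search stands in for the gradient-based fit that is normally used. This is not Gym code and does not cover the local (spatially varying) variant.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Softmax over the last axis after dividing logits by the temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def negative_log_likelihood(logits, labels, temperature):
    """Mean NLL of the true labels under temperature-scaled softmax."""
    probs = softmax(logits, temperature)
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(picked, 1e-12, None)).mean()


def fit_temperature(logits, labels, candidates=None):
    """Pick the temperature that minimises NLL on held-out data (grid search)."""
    if candidates is None:
        candidates = np.linspace(0.05, 10.0, 200)  # assumed search range
    losses = [negative_log_likelihood(logits, labels, t) for t in candidates]
    return float(candidates[int(np.argmin(losses))])
```

A fitted temperature above 1 means the model was overconfident and its probabilities get softened. Because every logit is divided by the same positive number, the argmax (and so the predicted label map) is unchanged.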

Some potential resources:

https://scikit-learn.org/stable/auto_examples/calibration/plot_calibration_curve.html#sphx-glr-auto-examples-calibration-plot-calibration-curve-py

https://github.com/dwang181/selectivecal

https://github.com/uncbiag/LTS

Doodleverse / segmentation_gym, issue #141