Closed: wsjeon closed this issue 7 years ago
Empirically, it depends on the number of classes. For K=10, we find that a fixed temperature between 0.5 and 1.0 works pretty well. You can squeeze out some additional performance by gradually annealing the temperature to 0.5 over the course of training. If you ultimately care about discrete inference, make sure you monitor validation accuracy on the quantized (i.e., hard=True) graph.
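A minimal sketch of that schedule, using PyTorch's built-in `F.gumbel_softmax` for concreteness (the original thread is about a TensorFlow implementation, but the annealing and `hard=True` idea is the same). The `ANNEAL_RATE` value and the random `logits` are illustrative assumptions, not values from this thread:

```python
import math
import torch
import torch.nn.functional as F

TAU_START, TAU_MIN, ANNEAL_RATE = 1.0, 0.5, 1e-4  # anneal 1.0 -> 0.5

def temperature(step):
    """Exponential decay from TAU_START, floored at TAU_MIN (= 0.5)."""
    return max(TAU_MIN, TAU_START * math.exp(-ANNEAL_RATE * step))

logits = torch.randn(32, 10, requires_grad=True)  # a batch of K=10 logits

# Training: soft, differentiable relaxation at the current temperature.
z_soft = F.gumbel_softmax(logits, tau=temperature(step=5000), hard=False)

# Validation: hard=True quantizes to one-hot samples (straight-through),
# so the accuracy you monitor reflects the discrete behavior you deploy.
with torch.no_grad():
    z_hard = F.gumbel_softmax(logits, tau=TAU_MIN, hard=True)
```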
Thanks :)
Thank you for your interesting work! :)
I wonder whether a temperature very close to 0 (e.g., 1e-20) causes backpropagation errors in practice.
Also, is there a particular temperature you would recommend?