Closed · KacperKubara closed this 2 years ago
Hi Kacper,
Thanks for your interest in our work! This is a great question. You are absolutely right that overconfidence in neural networks is a major issue, and it is exactly one of our motivations for resorting to the gradient space for OOD detection. As for your question, the following three points might be relevant:
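For readers following along, the gradient-space idea can be illustrated with a toy sketch. This is my own numpy illustration under simplifying assumptions (a single linear last layer, and the score taken as the L1 norm of the gradient of KL(u || softmax) with respect to that layer's weights); it is not the authors' code, and the function name is made up:

```python
import numpy as np

def gradnorm_style_score(x, W, T=1.0):
    """Toy gradient-space OOD score (illustrative sketch only).

    For the loss KL(u || p) with p = softmax(z) and uniform u, the
    gradient w.r.t. the logits z is (p - u), so for a linear last
    layer z = W x / T the weight gradient is the outer product
    (p - u) x^T / T.  We return its L1 norm: spiky (confident)
    predictions sit far from uniform and give larger norms.
    """
    z = (W @ x) / T
    z = z - z.max()                    # numerical stability
    p = np.exp(z) / np.exp(z).sum()    # softmax output
    u = np.full(p.shape, 1.0 / p.size) # uniform distribution
    grad_W = np.outer(p - u, x) / T    # d KL(u || p) / dW
    return float(np.abs(grad_W).sum())

# Demo: a confident last layer vs. one producing a flat softmax.
x = np.ones(3)
W_confident = np.zeros((4, 3)); W_confident[0] = 5.0  # spiky logits
W_flat = np.zeros((4, 3))                             # uniform output

print(gradnorm_style_score(x, W_confident))  # large
print(gradnorm_style_score(x, W_flat))       # ~0
```

The point of the sketch is only the qualitative behavior: when the softmax output equals the uniform distribution, the gradient vanishes and the score is zero, and the score grows as the output becomes spikier.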
Hope this answers your question!
Best, Rui
Hi Rui, thanks for the great answer; that makes things much clearer now!
Hi,
In the paper you make a statement about ID and OOD data:
However, it is also well known that models tend to be overconfident on OOD data, so in that case I would expect the softmax distribution to be 'spiky' for OOD inputs as well. I see that you use temperature scaling on the softmax output, which can calibrate the network, but in most of the experiments this parameter is set to 1. So I was wondering what your take is on the assumption that ID data is expected to have a larger KL divergence than OOD data? The gradient estimation seems to work quite well, so I suspect I don't fully understand the problem. If you could help me clarify that, it would be great!
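For intuition, the interplay between a spiky softmax and the temperature can be sketched numerically. This is a minimal numpy sketch; the function name and the choice of KL(u || softmax), with u uniform, are my reading of the setup being discussed, not the authors' exact implementation:

```python
import numpy as np

def kl_uniform_to_softmax(logits, T=1.0):
    """KL(u || softmax(logits / T)) for a uniform distribution u.

    A spikier softmax output sits farther from uniform, so this value
    grows; raising the temperature T flattens the softmax and shrinks
    it.  (Illustrative sketch, not the paper's code.)
    """
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()                    # numerical stability
    p = np.exp(z) / np.exp(z).sum()    # temperature-scaled softmax
    u = np.full(p.shape, 1.0 / p.size)
    return float(np.sum(u * np.log(u / p)))

spiky = [10.0, 0.0, 0.0, 0.0]  # confident logits
flat = [0.3, 0.2, 0.1, 0.0]    # uncertain logits

print(kl_uniform_to_softmax(spiky))          # large
print(kl_uniform_to_softmax(flat))           # small
print(kl_uniform_to_softmax(spiky, T=10.0))  # T flattens the softmax
```

With T = 1 a spiky output, whether from an ID or an overconfident OOD input, yields a large KL term, which is exactly the tension raised in the question above.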
References on model overconfidence:
- http://arxiv.org/abs/1610.02136
- http://mi.eng.cam.ac.uk/reports/svr-ftp/evermann_stw00.pdf
- https://arxiv.org/abs/1906.02530
Thanks, Kacper