Kaleidophon / evidential-deep-learning-survey


Some questions about aleatoric and epistemic uncertainty for EDL #1

Closed haruishi43 closed 4 months ago

haruishi43 commented 4 months ago

Your survey has helped me tremendously in understanding the landscape of EDL. I've recently started learning about uncertainty estimation for deep learning and wanted to experiment with EDL for dense classification tasks like semantic segmentation.

I started with EDL [Sensoy+ 2018] and visualized the epistemic, aleatoric, and distributional uncertainties using the equations in your paper. I've noticed that all three uncertainties show the same spatial pattern and differ only in magnitude, as shown in the figure below. I would expect aleatoric uncertainty to be higher at object boundaries, while epistemic uncertainty is higher for OOD pixels/objects.

[figure: visualized uncertainty maps]
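For reference, here is a minimal sketch of how I compute these quantities from the predicted Dirichlet parameters. I'm following the standard decomposition (expected entropy, mutual information, and Sensoy et al.'s vacuity), which may differ slightly from the exact equations in the survey:

```python
# Sketch: uncertainty measures for a per-pixel Dirichlet prediction.
# Assumes alpha has shape (K, H, W) with alpha = evidence + 1 as in Sensoy et al.
import torch

def dirichlet_uncertainties(alpha):
    alpha0 = alpha.sum(dim=0, keepdim=True)   # per-pixel precision alpha_0
    p = alpha / alpha0                         # expected class probabilities

    # Total uncertainty: entropy of the expected categorical.
    total = -(p * p.clamp_min(1e-12).log()).sum(dim=0)

    # Aleatoric uncertainty: expected entropy under the Dirichlet,
    # E[H[Cat(pi)]] = -sum_k p_k (digamma(alpha_k + 1) - digamma(alpha_0 + 1)).
    aleatoric = -(p * (torch.digamma(alpha + 1)
                       - torch.digamma(alpha0 + 1))).sum(dim=0)

    # Epistemic uncertainty: mutual information = total - aleatoric.
    epistemic = total - aleatoric

    # Distributional uncertainty / vacuity: u = K / alpha_0 (Sensoy et al.).
    vacuity = alpha.shape[0] / alpha0.squeeze(0)

    return total, aleatoric, epistemic, vacuity
```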

Do you happen to know why this is? My guess is that Sensoy's EDL is not particularly suited to producing uncertainties that discriminate between aleatoric and epistemic uncertainty.

Kaleidophon commented 4 months ago

Hey, thank you for reaching out!

I find it a bit difficult to answer questions about this particular instance without knowing anything about the model, training data, and training objective.

I can tell you that, in general, the uncertainties proposed by Sensoy et al. are idealized: when you see those pretty plots of the Dirichlet distribution on the probability simplex in different situations, that is how we would like it to look, but it is not necessarily well-behaved in practice. Simply predicting a Dirichlet instead of a categorical with our neural network does not mean everything automatically works out of the box; we need to train the network to display this behavior.

There are different strategies to achieve this that you could try here. One is the use of OOD examples (see the Malinin & Gales papers) to regularize the network: you could use different datasets, create OOD samples yourself by adding noise etc. to your input images, or even use an additional generative model to create OOD images for you. The other is through inductive biases in the architecture, specifically the Posterior Network and Natural Posterior Network by Charpentier et al., where you train a normalizing flow on the latent representation of the underlying classifier. This helps to identify uncertain samples because their latent representations would (ideally) be assigned low probability under the flow model. A rough sketch of the first strategy is below.
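To make the first strategy concrete, here is a minimal sketch of an OOD regularizer in PyTorch. The KL direction, the weight `ood_weight`, and where `alpha_ood` comes from are all choices you would have to tune; this is not the exact objective from any one paper:

```python
import torch
from torch.distributions import Dirichlet, kl_divergence

def ood_regularized_loss(task_loss, alpha_ood, ood_weight=0.1):
    """Push predictions on OOD inputs toward a flat Dirichlet.

    task_loss:  the usual EDL loss on in-distribution inputs
    alpha_ood:  (N, K) Dirichlet parameters predicted for OOD inputs
    """
    flat = Dirichlet(torch.ones_like(alpha_ood))  # uniform Dir(1, ..., 1)
    # KL from the predicted Dirichlet to the flat one; Malinin & Gales (2019)
    # argue for the reverse direction, so this is one of several options.
    kl = kl_divergence(Dirichlet(alpha_ood), flat).mean()
    return task_loss + ood_weight * kl
```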

Hope that gave some useful pointers!

haruishi43 commented 4 months ago

Thank you for your response! I really appreciate your feedback.

I'm sorry I didn't provide my experimental setup at all. I adapted Sensoy's EDL from image classification to pixel-wise classification, changed the final activation to sigmoid instead of ReLU, and trained the model with the cross-entropy loss they proposed plus KL-divergence regularization. I used Pascal VOC as the training dataset and its validation split to visualize the uncertainties. I've just realized that Pascal VOC may not be a good choice, since the mask boundaries are marked as ignore pixels and do not contribute to the loss; I think this might have a strong influence on the uncertainty estimates! I'll try again with another dataset that does not have ignore pixels around the mask boundaries.
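For context, this is roughly how I mask out the ignore pixels in the per-pixel loss. It's a simplified sketch: `edl_ce_loss` stands in for Sensoy's cross-entropy-style objective (assumed to return an unreduced per-pixel loss), not their actual code:

```python
import torch

IGNORE_INDEX = 255  # Pascal VOC marks boundary pixels with 255

def masked_pixelwise_loss(alpha, target, edl_ce_loss):
    """Average a per-pixel EDL loss over non-ignored pixels only.

    alpha:   (N, K, H, W) Dirichlet parameters
    target:  (N, H, W) integer labels, IGNORE_INDEX at boundaries
    """
    valid = target != IGNORE_INDEX  # (N, H, W) bool mask
    # Clamp ignore labels to a valid class index so the loss can be
    # evaluated everywhere; those pixels are zeroed out by the mask below.
    safe_target = target.clamp_max(alpha.shape[1] - 1)
    per_pixel = edl_ce_loss(alpha, safe_target)  # (N, H, W)
    # Boundary pixels contribute nothing, so the model gets no gradient
    # signal there -- which is why their uncertainties may look arbitrary.
    return (per_pixel * valid).sum() / valid.sum().clamp_min(1)
```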

I also appreciate you suggesting some strategies to improve the uncertainty estimates. I haven't yet researched methods that use OOD examples during training, but I did survey methods like PostNet and NatPN, which don't require OOD examples. I also found an impressive paper that adapts NatPN to semantic segmentation, "Deep Evidential Uncertainty Estimation for Semantic Segmentation under Out-Of-Distribution Obstacles", which I think closely resembles the second strategy you described.

Thank you again for your feedback!

Kaleidophon commented 4 months ago

I think the sigmoid might be a problem here: the alpha values that parameterize the Dirichlet distribution are positive (unbounded) real numbers, so I am afraid you might be unnecessarily restricting them to [0, 1]. That is why people parameterize them using ReLU, the exponential function, or softplus.
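For illustration, the common parameterizations look roughly like this (a sketch; `alpha = evidence + 1` follows Sensoy et al., and the choice among these activations is largely empirical):

```python
import torch
import torch.nn.functional as F

def evidence_to_alpha(logits, activation="softplus"):
    """Map raw network outputs to Dirichlet parameters alpha >= 1."""
    if activation == "relu":
        evidence = F.relu(logits)      # sparse; gradients die at 0
    elif activation == "exp":
        evidence = torch.exp(logits)   # smooth but can explode
    elif activation == "softplus":
        evidence = F.softplus(logits)  # smooth and non-negative
    else:
        raise ValueError(activation)
    return evidence + 1.0              # alpha = evidence + 1
```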

haruishi43 commented 4 months ago

Sorry, I mistyped: I used the exponential function as my activation. I've also tried ReLU and softplus with the SSE loss, but I had trouble training the model and could not obtain good IoU at all. CE or NLL with exp seemed to train reliably for semantic segmentation (I've also seen exp(tanh(evidence) / 0.25) used in some implementations, and that worked well too).
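For reference, the clamped variant I mentioned looks like this (a sketch based on implementations I've seen; the scale 0.25 is just the value used there):

```python
import torch

def clamped_exp_evidence(logits, scale=0.25):
    # tanh bounds the argument to [-1, 1], so after dividing by the scale
    # the evidence stays in [exp(-4), exp(4)]: exp without the overflow risk.
    return torch.exp(torch.tanh(logits) / scale)
```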