fabio-deep / Deep-Bayesian-Self-Training

Deep Bayesian Self-Training [official implementation]

questions about equations #2

Open ShellingFord221 opened 4 years ago

ShellingFord221 commented 4 years ago

Hi, I have three questions about your equations:

  1. How do you get Eq. 15? In the paper "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?", the total uncertainty is defined as (Eq. 9)

    [image: Eq. 9 of that paper, Var(y) ≈ (1/T) Σ_t ŷ_t² − ((1/T) Σ_t ŷ_t)² + (1/T) Σ_t σ̂_t²]

    and this is the regression version of the predictive uncertainty, not the classification one, so I don't see how you arrive at your Eq. 15.

  2. In the paragraph above Eq. 17, you say "Having calculated the predictive uncertainty var[p^hat] of our pseudo-labels". However, your Eq. 14 for calculating var[p^hat] involves no labels at all, only the total uncertainty of a sample given the model's predicted probabilities, so I don't see where the pseudo-labels come from.

  3. In Eq. 20, you add samples with var[p(y|x)] < tau to the training set. Is var[p(y|x)] calculated via Eq. 14 or Eq. 15? They seem to be two different ways of calculating the total uncertainty, so perhaps one alone is enough.
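For reference, my reading of the regression-style total uncertainty in that Eq. 9 is the following (a minimal NumPy sketch of my understanding, not code from this repo; the function name is mine):

```python
import numpy as np

def total_uncertainty_regression(y_hat, sigma_hat):
    """Total predictive variance over T stochastic forward passes,
    in the style of Eq. 9 of Kendall & Gal (2017):
    epistemic (variance of the T mean predictions) plus aleatoric
    (average of the T predicted variances).

    y_hat:     shape (T,) mean predictions from T MC-dropout passes
    sigma_hat: shape (T,) predicted aleatoric standard deviations
    """
    y_hat = np.asarray(y_hat, dtype=float)
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    epistemic = np.mean(y_hat ** 2) - np.mean(y_hat) ** 2  # spread of the T means
    aleatoric = np.mean(sigma_hat ** 2)                    # mean predicted variance
    return epistemic + aleatoric
```

If the T passes agree perfectly and predict zero noise, this is 0, as expected.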

Thanks!

fabio-deep commented 4 years ago

Hi, apologies for the delay, I don't check this repo often!

  1. Eq. (15) has an epistemic and an aleatoric uncertainty term, and there are a few ways of measuring these quantities. In general, epistemic uncertainty can be measured via the entropy of the average output softmax distribution over T stochastic forward passes; see Eq. (3) and the text just after it in the paper you mentioned. The aleatoric term is based on Section 3.3 of their paper.

  2. This is a misunderstanding of terminology, let me try to clarify. The predictions made by the network become pseudo-labels in the next self-training iteration, i.e. we use them as if they were actual labels to retrain the network. In that sense, a prediction is a pseudo-label, but predictions are only added to the self-training pool if the model is sufficiently confident about the predicted label (low predictive uncertainty).

  3. Correct, they are simply two different ways of measuring uncertainty in the predictions. Feel free to use either one as I provide the code for both!
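To make points 1 and 2 concrete, the entropy-based route could be sketched as follows (a minimal NumPy sketch under my own assumptions, not the repo's actual code; the function names, the entropy-as-uncertainty choice, and the threshold tau are all illustrative):

```python
import numpy as np

def predictive_entropy(probs_mc):
    """Entropy of the average softmax over T MC-dropout passes.

    probs_mc: shape (N, T, C) softmax outputs, T passes per sample.
    Returns one uncertainty value per sample.
    """
    p_mean = probs_mc.mean(axis=1)                            # (N, C) average softmax
    return -np.sum(p_mean * np.log(p_mean + 1e-12), axis=1)   # entropy per sample

def select_pseudo_labels(probs_mc, tau):
    """Keep only samples whose predictive uncertainty is below tau;
    their argmax predictions become pseudo-labels for retraining.
    """
    p_mean = probs_mc.mean(axis=1)
    uncertainty = predictive_entropy(probs_mc)
    keep = uncertainty < tau
    return np.where(keep)[0], p_mean.argmax(axis=1)[keep]
```

A confident sample (near one-hot average softmax) has entropy near 0 and is kept; a sample with a near-uniform average softmax has entropy near log C and is discarded.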

Regards, Fabio