The `lool` loss is implemented as

$$ \sum_{i \in B} \left[ \log\left(\sigma^2_i(\theta)\right) + \frac{(Y(x_i) - \mu_i(\theta))^2}{\sigma^2_i(\theta)} \right] $$

according to the paper, where $\sigma^2_i(\theta)$ is the posterior variance evaluated at batch element $i$. However, this seems to be causing an issue that trips up the optimizer. If the posterior variance is $\ll 1.0$, then the $\log(\sigma^2_i(\theta))$ term can be very negative. Is this intended? It can cause the resulting `obj_fn` to have the opposite sign of what the code expects. I am not sure if this is a bug or not.
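For concreteness, here is a minimal numpy sketch of the sum above (the array values are made up for illustration, and `obj` here is just the formula as written, not the library's actual `obj_fn`), showing how the log terms dominate and push the total negative once the posterior variances are $\ll 1.0$:

```python
import numpy as np

# Hypothetical batch values, purely for illustration.
y = np.array([0.10, -0.05, 0.20])      # observed targets Y(x_i)
mu = np.array([0.12, -0.04, 0.18])     # posterior means mu_i(theta)
sigma2 = np.array([1e-4, 5e-5, 2e-4])  # posterior variances sigma^2_i(theta), << 1.0

# lool-style objective as written in the formula above:
# sum_i [ log(sigma^2_i) + (Y(x_i) - mu_i)^2 / sigma^2_i ]
obj = np.sum(np.log(sigma2) + (y - mu) ** 2 / sigma2)
print(obj)  # approx -19.6: the log terms (~ -27.6) outweigh the residual terms (~ +8)
```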