Open yuyiyi opened 6 years ago
The loss function in Eq. 9 of the paper is indeed the Kullback-Leibler divergence. It is computed relative to the ground truth and is non-negative. This is what is plotted in row 5 of Figure 1. However, it cannot be used in empirical studies because the ground truth is unavailable.
In contrast, the validation loss in Eq. 10 is measured on a validation sample and omits the unavailable constant term, so it is no longer guaranteed to be non-negative. That is what is plotted in row 6 of Figure 1. Note that the values in these plots are shown relative to those of the correct model family; the actual values are not shown.
The derivation in Eq. 11 demonstrates that minimizing the validation loss in Eq. 10 is equivalent to minimizing the true loss in Eq. 9, a property that many other definitions of loss do not share.
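To make the point about the sign concrete, here is a small toy sketch. It is not the code from this repository and does not reproduce Eqs. 9 to 11. It assumes a 1-D Gaussian ground truth and takes the validation loss to be the average negative log-likelihood on a validation sample; all names (`true_sigma`, `kl_true`, `validation_loss`, and so on) are made up for illustration.

```python
import numpy as np

# Toy setup (an assumption for illustration, not the paper's model):
# the ground truth is N(0, true_sigma^2) and each candidate model is N(0, sigma^2).
rng = np.random.default_rng(0)
true_sigma = 0.1
x_val = rng.normal(0.0, true_sigma, size=20000)  # validation sample


def gaussian_log_pdf(x, sigma):
    """Log-density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)


def kl_true(sigma):
    """Closed-form KL(N(0, true_sigma^2) || N(0, sigma^2)); needs the ground truth."""
    return np.log(sigma / true_sigma) + true_sigma**2 / (2 * sigma**2) - 0.5


def validation_loss(sigma):
    """Average negative log-likelihood on the validation sample (no ground truth needed)."""
    return -np.mean(gaussian_log_pdf(x_val, sigma))


candidate_sigmas = np.array([0.05, 0.08, 0.1, 0.15, 0.3])
kl_values = np.array([kl_true(s) for s in candidate_sigmas])
val_values = np.array([validation_loss(s) for s in candidate_sigmas])

# The two losses differ by the entropy of the true distribution, which does not
# depend on the candidate model. For a narrow density it is negative.
entropy_const = 0.5 * np.log(2 * np.pi * np.e * true_sigma**2)

for s, kl, val in zip(candidate_sigmas, kl_values, val_values):
    print(f"sigma={s:.2f}  KL={kl:.4f}  validation loss={val:.4f}")
print("constant offset (entropy of the truth):", entropy_const)
```

In this toy case the KL values are all non-negative, the validation losses come out negative, and both are minimized by the same candidate. The gap between them is roughly the constant `entropy_const`, up to sampling noise.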
Thanks for posting the code! It is very useful. I am a bit confused about the loss function. The loss function used in this method is the Kullback-Leibler divergence, which is supposed to be non-negative. However, I'm getting negative loss values. Would you be able to comment on that? Thank you very much!