Closed HolmesShuan closed 3 years ago
Hi @HolmesShuan, thanks for your question! The reason is that E(y) and E(\bar{y}) both equal 1. Please refer to the equation above Eqn 4 in Heskes's paper: there is an extra \bar{p}(y) in the second row. This follows from the constraint in Eqn 2 of Heskes's paper, which requires \int dy a(y) = 1. In our notation, y becomes x and a() becomes y=t(). We originally kept the same notation as Heskes's paper, but a reviewer found it inappropriate, so we changed it throughout. Sorry for the confusion! Please let me know if you have any further questions.
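A sketch of where the normalization constraint enters, written in Heskes's notation rather than ours. It assumes the average model is defined as $\bar{p}(y) = \frac{1}{Z}\exp\left(\mathbb{E}_D[\log p_D(y)]\right)$, where $p_D$ is the model trained on dataset $D$ and $Z$ does not depend on $y$, so that $\log \bar{p}(y) - \mathbb{E}_D[\log p_D(y)] = -\log Z$:

$$
\begin{aligned}
\mathbb{E}_D\left[\mathrm{KL}(\bar{p} \,\|\, p_D)\right]
&= \int dy\, \bar{p}(y)\left[\log \bar{p}(y) - \mathbb{E}_D[\log p_D(y)]\right] \\
&= -\log Z \int dy\, \bar{p}(y) \\
&= -\log Z .
\end{aligned}
$$

The last step is the only place where $\int dy\, \bar{p}(y) = 1$ is needed: because the bracketed term is constant in $y$, it can be pulled out of the integral, and the remaining integral is 1.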
Hi @LcDog , thanks for your quick reply!
I have double-checked the derivation in Heskes's paper and gradually came to understand the decomposition of the expected error defined on the KL divergence. Unfortunately, I still do not see how the fact that E(y) and E(\bar{y}) both equal 1
leads to
I am really sorry for bothering you again :(
As far as I can tell,
I am not sure how to use the fact that E(y) and E(\bar{y}) both equal 1
in the above equation. It seems that
Many thanks!
Cool! The derivation is very clear!
The notation is OK, but I found $E[y]=1$ and $E[\overline{y}]=1$ quite confusing, which made me wonder whether there are any tricks in Equation (2). :sweat_smile:
By the way, could you further explain how to get the equation in your first comment?
As for I just quoted the conclusion in the ICLR paper: "The derivation of the variance term is based on the facts that $\frac{\log \overline{y}_{ce}}{\mathbb{E}_\mathcal{D}[\log \hat{y}_{ce}]}$ is a constant and ...".
Thanks for your responses!
Hi, I have read your excellent work several times. The bias-variance idea is very interesting!
However, due to my limited knowledge of the bias/variance decomposition, I found the variance term in Equation (2) hard to understand. E.g., how to prove that
I have referred to Heskes's paper, and it seems that the derivation relies on the normalization constant Z in Equation (1). That is easy to prove if Z in Equation (1) is a constant. But I still could not find the relationship between this term and Equation (2).
Could you please provide a detailed derivation of the variance term in Equation (2)? Thanks in advance.
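The role of the normalization constant Z can also be checked numerically. Below is a minimal sketch, assuming Heskes's definition of the average model as the normalized geometric mean of the individual models (so Z is a single constant shared by all classes). The names `models`, `target`, and `p_bar` are illustrative and are not taken from the paper or its code; the random distributions merely stand in for predictions from models trained on different datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

num_models, num_classes = 5, 4

# Hypothetical target distribution and per-dataset model predictions.
target = rng.dirichlet(np.ones(num_classes))
models = rng.dirichlet(np.ones(num_classes), size=num_models)

# Average model: exp of the mean log-probability, normalized by Z.
unnormalized = np.exp(np.log(models).mean(axis=0))
Z = unnormalized.sum()
p_bar = unnormalized / Z

# Expected error, bias, and variance in the KL-based decomposition.
expected_error = np.mean([kl(target, p) for p in models])
bias = kl(target, p_bar)
variance = np.mean([kl(p_bar, p) for p in models])

print("expected error :", expected_error)
print("bias + variance:", bias + variance)
print("variance       :", variance)
print("-log Z         :", -np.log(Z))
```

In this sketch the variance equals `-log Z` only because `p_bar` sums to 1, which is the normalization constraint discussed in this thread.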