GuoQiushan / EGC

MIT License

Question about derivation of Eq.13 in the paper #4

Open jinhong-ni opened 10 months ago

jinhong-ni commented 10 months ago

Thanks for open-sourcing your great work. I'm slightly confused about the derivation of Eq. 13 in the paper, which derives the joint distribution as $$p_\theta(x,y)=\frac{\exp(f_\theta(x)[y])}{Z(\theta)}$$ from the classifier $p_\theta(y|x)=\frac{\exp(f_\theta(x)[y])}{\sum_{y'}\exp(f_\theta(x)[y'])}$ and the marginal $p_\theta(x)=\frac{\exp(-E_\theta(x))}{Z(\theta)}$. At first glance, this equation seems to hold only if $\sum_{y'}\exp(f_\theta(x)[y'])=\exp(-E_\theta(x))$. However, according to [1], this identity is actually what they derive from the joint distribution, i.e., they derive the energy-based generative model from the classifier.
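To make the condition above explicit, here is a sketch of the algebra in both directions. It uses only the three quantities quoted from the paper and from [1]; whether the paper intends $E_\theta(x)$ to be defined this way is an assumption, not something confirmed in this thread.

Starting from the classifier and the marginal, the product rule gives

$$p_\theta(x,y)=p_\theta(y|x)\,p_\theta(x)=\frac{\exp(f_\theta(x)[y])}{\sum_{y'}\exp(f_\theta(x)[y'])}\cdot\frac{\exp(-E_\theta(x))}{Z(\theta)},$$

which reduces to $\frac{\exp(f_\theta(x)[y])}{Z(\theta)}$ exactly when

$$E_\theta(x)=-\log\sum_{y'}\exp(f_\theta(x)[y']).$$

Going the other way, as in [1], one starts from the joint and marginalizes over $y$:

$$p_\theta(x)=\sum_{y}p_\theta(x,y)=\frac{\sum_{y}\exp(f_\theta(x)[y])}{Z(\theta)},\qquad p_\theta(y|x)=\frac{p_\theta(x,y)}{p_\theta(x)}=\frac{\exp(f_\theta(x)[y])}{\sum_{y'}\exp(f_\theta(x)[y'])}.$$

Here $Z(\theta)$ cancels in the conditional, and the same expression for $E_\theta(x)$ falls out of the marginal. So the two routes agree only under that log-sum-exp definition of the energy.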

I am a little confused about this formulation and would really appreciate it if you could provide a full derivation of the joint distribution. Thanks again for releasing your work, and thanks in advance for the clarification.

Reference

[1] Grathwohl, Will, et al. "Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One." arXiv:1912.03263 (2019).
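The identity in question can also be checked numerically on a toy problem. The sketch below is not taken from the EGC code base: it assumes a finite set of five inputs so that $Z(\theta)$ can be summed exactly, uses random numbers in place of the logits $f_\theta(x)[y]$, and assumes the energy is defined as $E_\theta(x)=-\log\sum_{y'}\exp(f_\theta(x)[y'])$ as in [1]. The helper names are hypothetical.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along `axis`."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def joint_marginal_conditional(logits):
    """logits[i, y] stands in for f_theta(x_i)[y] on a toy finite input set.

    Returns the joint p(x, y), the marginal p(x) built from the assumed
    energy E(x) = -logsumexp_y f(x)[y], and the softmax classifier p(y | x).
    """
    Z = np.exp(logits).sum()                 # partition function over all (x, y)
    joint = np.exp(logits) / Z               # p(x, y) = exp(f(x)[y]) / Z
    energy = -logsumexp(logits, axis=1)      # assumed definition of E(x)
    marginal = np.exp(-energy) / Z           # p(x) = exp(-E(x)) / Z
    conditional = np.exp(logits - logsumexp(logits, axis=1)[:, None])  # softmax
    return joint, marginal, conditional


rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3))             # 5 toy inputs, 3 classes
joint, marginal, conditional = joint_marginal_conditional(logits)

# Marginalizing the joint over y should reproduce exp(-E(x)) / Z.
print(np.allclose(joint.sum(axis=1), marginal))
# The product rule p(y | x) p(x) should reproduce the joint.
print(np.allclose(joint, conditional * marginal[:, None]))
```

If the energy is replaced by any other function of $x$, the second check should fail, which is the point raised in the question: the joint form only follows from the classifier and the marginal when the energy is tied to the logits in this way.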

zhangchbin commented 8 months ago

Could you provide some insights on how to derive the joint distribution? Thanks for your help! @GuoQiushan