cvignac / DiGress

code for the paper "DiGress: Discrete Denoising diffusion for graph generation"
MIT License

Approximation of regressor guidance #45

Open · haoming-codes opened this issue 1 year ago

haoming-codes commented 1 year ago

Hi Clement, in Sec. 5 of the paper, you mention that "$p_\eta$ cannot be evaluated for all possible values of $G^{t-1}$". Is this referring to the fact that $G^{t-1}$ is a probabilistic graph, while $p_\eta$ was trained on one-hot graphs?

Are there any other methods (alternatives to your proposed first-order approximation) for classifier-guided discrete diffusion that you are aware of? Is straight-through estimation a viable option? Thank you!

haoming-codes commented 1 year ago

I could also use your help in reading the guided sampling algorithm, Algorithm 3. Does the angle bracket $\langle \cdot, \cdot \rangle$ in the "Guidance distribution" step denote an inner product? If so, does `prob_X_unnormalized = p_eta_x * prob_X` implement this inner product, or does it implement the "Reverse process" step? Thank you!

cvignac commented 1 year ago

Hello, yes, the brackets denote an inner product. If you want to check the probability of one particular graph $G^{t-1}$, you need to compute this inner product (the probability is just a scalar). However, in the code we never look at one particular $G^{t-1}$; instead we work with the full probability tensor `p_eta_x * prob_X`.

For the first question: there is no reason to believe that $p_\eta$ factorizes as a product over nodes and edges, so you would need to evaluate it separately for every possible graph $G^{t-1}$, which is intractable. In contrast, the first-order approximation does factorize as a product, which makes it efficient to compute. I don't know whether other efficient mechanisms exist, but it would be an interesting thing to look into.
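
To make the factorization concrete, here is a minimal sketch of the first-order guidance step for the node tensor (the names `regressor`, `guidance_scale`, and the tensor shapes are illustrative assumptions, not the exact DiGress code):

```python
import torch

def guided_node_marginals(X_t, prob_X, regressor, y_target, guidance_scale=1.0):
    """X_t: one-hot noisy node tensor (bs, n, dx); prob_X: unguided
    reverse-process marginals p(x_i^{t-1} | G^t), same shape.
    `regressor` and `guidance_scale` are hypothetical stand-ins."""
    X_t = X_t.clone().requires_grad_(True)
    # Regressor predicts the target property from the noisy graph G^t.
    y_hat = regressor(X_t)                       # (bs, d_y)
    # Use -||y_hat - y||^2 as a stand-in for log p_eta(y | G^t).
    log_p = -((y_hat - y_target) ** 2).sum()
    # Gradient tensor: one entry per node and node type. This object
    # replaces evaluating p_eta on every candidate graph G^{t-1}.
    grad_X = torch.autograd.grad(log_p, X_t)[0]  # (bs, n, dx)
    # exp(<grad, G^{t-1}>) factorizes over nodes, so the guided
    # distribution can be formed per node: multiply and renormalize,
    # never enumerating the exponentially many full graphs.
    p_eta_x = torch.exp(guidance_scale * grad_X)
    prob_X_unnormalized = p_eta_x * prob_X
    return prob_X_unnormalized / prob_X_unnormalized.sum(dim=-1, keepdim=True)
```

The same per-element update would apply to the edge tensor; only the factorization makes this linear in the graph size rather than exponential.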

haoming-codes commented 1 year ago

Thank you Clement. Do you mean that `prob_X_unnormalized = p_eta_x * prob_X` implements both the inner product $\langle \nabla_{G^t} \log p_\eta(y \mid G^t), G^{t-1} \rangle$ and $p(G^{t-1} \mid G^t)\, p(y \mid G^{t-1})$, or do you mean that the inner product does not need to be implemented? I'm sorry if I missed something.

cvignac commented 1 year ago

It's not easy to explain briefly, but the inner product does not need to be implemented in practice. What we really care about is the tensor of gradients, which is what `p_eta_x` is built from: because the first-order term factorizes, multiplying `prob_X` elementwise by `p_eta_x` and renormalizing already yields the guided distribution, without ever forming the scalar inner product for any specific graph.
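
If it helps, a tiny numerical check makes this concrete: since $\exp(\langle \nabla, G^{t-1} \rangle)$ splits into a product over nodes, the per-node elementwise update reproduces the explicitly computed inner-product score up to a global normalizer (shapes and values below are illustrative only):

```python
import torch

torch.manual_seed(0)
n, dx = 3, 4
grad_X = torch.randn(n, dx)                          # gradient tensor at G^t
prob_X = torch.softmax(torch.randn(n, dx), dim=-1)   # unguided marginals

# Factorized (what the code does): per-node elementwise product + renorm.
guided = torch.exp(grad_X) * prob_X
guided = guided / guided.sum(dim=-1, keepdim=True)

# Explicit (what the math says): score one particular one-hot graph G.
G = torch.eye(dx)[torch.tensor([0, 2, 1])]           # node types 0, 2, 1
inner = (grad_X * G).sum()                           # <grad_X, G>
explicit = torch.exp(inner) * (prob_X * G).sum(dim=-1).prod()

factorized = (guided * G).sum(dim=-1).prod()
Z = (torch.exp(grad_X) * prob_X).sum(dim=-1).prod()  # global normalizer
print(torch.allclose(factorized, explicit / Z))      # True
```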