haoming-codes opened 1 year ago
I could also use your help in reading the guided sampling algorithm, Algorithm 3. Does the angle bracket $\langle \cdot, \cdot \rangle$ in the "Guidance distribution" step represent an inner product? If so, does `prob_X_unnormalized = p_eta_x * prob_X` implement this inner product, or does it implement the "Reverse process" step? Thank you!
Hello, the brackets denote an inner product, yes. If you want to check the probability of one particular graph $G^{t-1}$, you need to compute this inner product (the probability is just a scalar). However, in the code we don't look at one particular $G^{t-1}$, but instead consider the whole probability tensor `p_eta_x * prob_X`.
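To make this concrete, here is a minimal sketch (not the repository code; the shapes and the per-node reading are my assumptions based on this thread) of why the elementwise tensor product plays the role of the scalar inner product evaluated at every candidate value at once:

```python
# Assumed shapes:
#   prob_X:  (bs, n, dx)  reverse-process probabilities for node features
#   p_eta_x: (bs, n, dx)  guidance factor per node and per candidate class
import torch

bs, n, dx = 2, 4, 5                      # batch size, nodes, node classes
prob_X = torch.rand(bs, n, dx)
p_eta_x = torch.rand(bs, n, dx)

# One tensor of unnormalized guided probabilities for *all* candidate
# values of every node at once:
prob_X_unnormalized = p_eta_x * prob_X

# Checking one particular one-hot choice x_i^{t-1} = e_k instead reduces
# to a scalar: the (b, i, k) entry of the tensor above.
b, i, k = 0, 0, 3
e_k = torch.zeros(dx)
e_k[k] = 1.0
scalar = (p_eta_x[b, i] @ e_k) * (prob_X[b, i] @ e_k)
assert torch.allclose(scalar, prob_X_unnormalized[b, i, k])
```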
Thank you, Clement. Do you mean that `prob_X_unnormalized = p_eta_x * prob_X` implements both the inner product $\langle \nabla_{G^t} \log p_\eta(y \mid G^t),\, G^{t-1} \rangle$ and $p(G^{t-1} \mid G^t)\, p(y \mid G^{t-1})$, or do you mean that the inner product does not need to be implemented? I'm sorry if I missed something.
It's not very easy to explain, but the inner product does not need to be implemented in practice. What we really care about is the tensor of the gradients, which is computed using `p_eta_x`.
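As a rough illustration (not the actual DiGress code; `classifier`, its signature, and `lambda_guidance` are hypothetical names), the gradient tensor can be obtained with autograd and turned into a per-entry guidance factor, so that no explicit inner product ever has to be formed:

```python
# Hedged sketch: assumes a classifier/regressor `classifier(X_t, E_t, y)`
# that returns log p_eta(y | G^t). An analogous factor would be computed
# for the edge tensor E.
import torch

def guidance_factor(X_t, E_t, y, classifier, lambda_guidance=1.0):
    """Return exp(lambda * d log p_eta(y | G^t) / dX_t): one entry per node
    and per candidate class, replacing the scalar inner product."""
    X_t = X_t.detach().requires_grad_(True)
    log_p = classifier(X_t, E_t, y)                      # scalar per graph
    grad_X = torch.autograd.grad(log_p.sum(), X_t)[0]    # same shape as X_t
    return torch.exp(lambda_guidance * grad_X)           # plays the role of p_eta_x

# Guided reverse step for node features, under these assumptions:
#   prob_X_unnormalized = guidance_factor(...) * prob_X
#   prob_X_guided = prob_X_unnormalized / prob_X_unnormalized.sum(-1, keepdim=True)
```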
Hi Clement, in Sec. 5 of the paper, you mention that "$p_\eta$ cannot be evaluated for all possible values of $G^{t-1}$". Is this referring to the fact that $G^{t-1}$ is a probabilistic graph, while $p_\eta$ was trained on one-hot graphs?
Are there any other methods (alternatives to your proposed first-order approximation) for classifier-guided discrete diffusion that you are aware of? Is straight-through estimation a viable option? Thank you!
For the first question, you need to observe that there is no reason to believe that $p_\eta$ factorizes as a product over nodes and edges. It means that you would need to estimate it separately for every possible graph. In contrast, the first-order approximation factorizes as a product, which makes it efficient to compute. I don't know if there are other efficient mechanisms, but it's an interesting thing to look at.
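As a back-of-the-envelope illustration of that intractability (the sizes below are made up for the example), the number of candidate graphs $G^{t-1}$ grows exponentially, while the factorized approximation needs only one gradient evaluation per denoising step:

```python
# With dx node classes, de edge classes, and n nodes, there are
# dx**n * de**(n*(n-1)//2) possible labelled graphs to score exactly.
n, dx, de = 20, 5, 2                        # assumed sizes for illustration
num_graphs = dx**n * de**(n * (n - 1) // 2)
print(f"{num_graphs:.3e} candidate graphs vs. 1 gradient evaluation")
```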