Question about the theory in the paper

Frankie123421 commented 2 years ago

Hi Xu,

First of all, thanks for your nice work. I've read your paper, and I have some questions about the proof of the equivariance of the transition kernel. In detail, suppose $\mathcal{C}^t$ is roto-translation invariant, and (thus) $\mu_{\theta}(\mathcal{C}^t, \mathcal{G}, t)$ is roto-translation equivariant with the designed GNN; we need to prove that $p(\mathcal{C}^{t-1} | \mathcal{C}^{t}, \mathcal{G}, t)$ is equivariant. I wonder if it follows from this derivation:

$$\begin{aligned} p(R \mathcal{C}^{t-1} + g | R \mathcal{C}^{t} + g, \mathcal{G}, t) &= \mathcal{N}(R\mathcal{C}^{t-1} + g; \boldsymbol{\mu}_{\theta}(R\mathcal{C}^t + g, \mathcal{G}, t), \sigma_t^2 \mathbf{I}) \\ &= \frac{1}{(2 \pi)^{\frac{p}{2}}|\boldsymbol{\Sigma}|^{\frac{1}{2}}} e^{-\frac{1}{2}(R(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta))^T \boldsymbol{\Sigma}^{-1}(R(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta))} \\ &= \frac{1}{(2 \pi)^{\frac{p}{2}}|\boldsymbol{\Sigma}|^{\frac{1}{2}}} e^{-\frac{1}{2}(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta)^T \boldsymbol{\Sigma}^{-1}(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta)} \end{aligned}$$

where $\boldsymbol{\Sigma} = \sigma_t^2 \mathbf{I}$. I am not sure whether this is correct and hope you can clarify. Thanks.

MinkaiXu commented 2 years ago

The derivation you showed seems correct, but I think it misses a few minor points:

1. Since we only consider CoM-free systems, $\mathcal{C}^t$ and $\mu_{\theta}(\mathcal{C}^t, \mathcal{G}, t)$ are both translation-invariant.
2. And they should be both rotationally equivariant.

I feel like your derivation is actually based on the above statements. Ping me if you still have any questions.
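
The rotation part of the derivation can be checked numerically. The following is a minimal sketch, not code from the paper: `toy_mu` is a hypothetical rotation-equivariant stand-in for $\mu_\theta$ (it ignores $\mathcal{G}$ and $t$), and `random_rotation` and `kernel_logpdf` are helper names introduced only for this illustration.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random 3x3 rotation matrix (orthogonal, det = +1) via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the QR factor
    if np.linalg.det(q) < 0:      # flip one axis to get a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def toy_mu(C):
    """Hypothetical equivariant mean: scale each atom by a function of its norm."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    return C * np.tanh(norms)

def kernel_logpdf(C_prev, C_t, sigma):
    """log N(C_prev; toy_mu(C_t), sigma^2 I) for N x 3 arrays."""
    diff = C_prev - toy_mu(C_t)
    return (-0.5 * np.sum(diff**2) / sigma**2
            - 0.5 * diff.size * np.log(2 * np.pi * sigma**2))

rng = np.random.default_rng(0)
R = random_rotation(rng)
C_t = rng.normal(size=(5, 3))
C_prev = rng.normal(size=(5, 3))

# Rows are atom coordinates, so rotating every atom is C @ R.T.
lhs = kernel_logpdf(C_prev @ R.T, C_t @ R.T, sigma=0.3)
rhs = kernel_logpdf(C_prev, C_t, sigma=0.3)
print(np.isclose(lhs, rhs))
```

The two log-densities agree because the residual is rotated as a whole and the isotropic covariance does not see the rotation.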

Frankie123421 commented 2 years ago

Thanks for your kind and prompt reply. I have some new questions based on your response.

1. My current understanding is that if we ensure that the initial density $p(x_T)$ is rotation and translation invariant (by moving to the CoM-free system and keeping an isotropic Gaussian), and $p(x_{t-1}|x_{t})$ is rotation and translation invariant, then $p(x_{T-1}), p(x_{T-2}), \dots, p(x_0)$ will all be rotation and translation invariant, step by step. (Is this correct?)
2. What do you mean by "And they should be both equivariant"? Rotationally?
3. But if 1 is correct, then

$$ \mu_\theta(\mathcal{C}^t, \mathcal{G}, t)=\frac{1}{\sqrt{\alpha_t}}\left(\mathcal{C}^t-\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta(\mathcal{G}, \mathcal{C}^t, t)\right) $$

will be rotation and translation invariant no matter how $\epsilon_{\theta}$ is designed, since $R\mathcal{C}^t + g = \mathcal{C}^t$, so I suspect that 1 is not correct. And my original derivation is actually based on $\mu_\theta(R\mathcal{C}^t + g, \mathcal{G}, t) = R\mu_\theta(\mathcal{C}^t, \mathcal{G}, t) + g$.

I am quite confused now )-: . Really looking forward to your answer. Thanks in advance.
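
The step-by-step claim in point 1 can be written out as a change of variables. This is a sketch under two assumptions: the kernel satisfies $p(R x_{t-1} | R x_t) = p(x_{t-1} | x_t)$, and $p(x_t)$ is already rotation-invariant. The symbol $x_t'$ is only a substituted integration variable.

$$\begin{aligned} p(R x_{t-1}) &= \int p(R x_{t-1} \mid x_t)\, p(x_t)\, dx_t \\ &= \int p(R x_{t-1} \mid R x_t')\, p(R x_t')\, dx_t' \quad (x_t = R x_t',\ |\det R| = 1) \\ &= \int p(x_{t-1} \mid x_t')\, p(x_t')\, dx_t' \\ &= p(x_{t-1}) \end{aligned}$$

Applying this from $t = T$ downward gives the invariance of each marginal in turn.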

MinkaiXu commented 2 years ago

1. I think the statement is correct.
2. Yes, I have updated my reply :)
3. Sorry, but I'm also kind of confused about your statement...

Firstly, $R\mathcal{C}^t + g \neq \mathcal{C}^t$. In the CoM-free system, we only have $R\mathcal{C}^t + g = R\mathcal{C}^t$ (since we always move all conformations to zero CoM). Second, similarly, $\mu_\theta(R\mathcal{C}^t + g, \mathcal{G}, t) = R\mu_\theta(\mathcal{C}^t, \mathcal{G}, t)$. Besides, I didn't fully get the paradox in point 1. As I said, $\mathcal{C}^t$ is rotationally equivariant, and $\epsilon$ should also be equivariant to make $\mu$ equivariant.
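
To spell out why an equivariant $\epsilon_\theta$ makes $\mu_\theta$ equivariant, here is a short worked version using the expression for $\mu_\theta$ quoted earlier in this thread. It assumes only that $\epsilon_\theta(\mathcal{G}, R\mathcal{C}^t, t) = R\,\epsilon_\theta(\mathcal{G}, \mathcal{C}^t, t)$:

$$\begin{aligned} \mu_\theta(R\mathcal{C}^t, \mathcal{G}, t) &= \frac{1}{\sqrt{\alpha_t}}\left(R\mathcal{C}^t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(\mathcal{G}, R\mathcal{C}^t, t)\right) \\ &= \frac{1}{\sqrt{\alpha_t}}\left(R\mathcal{C}^t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, R\,\epsilon_\theta(\mathcal{G}, \mathcal{C}^t, t)\right) \\ &= R\,\mu_\theta(\mathcal{C}^t, \mathcal{G}, t) \end{aligned}$$

If $\epsilon_\theta$ were not equivariant, the second line would not follow, so the design of $\epsilon_\theta$ does matter.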

Frankie123421 commented 2 years ago

Thanks for your reply. I think the main point is that my statement in 1 says $\mathcal{C}^t$ is rotationally invariant, but as you said above, $\mathcal{C}^t$ is actually rotationally equivariant. I wonder why, and doesn't that mean 1 is wrong? My understanding is that because $\mathcal{C}^t$ is always isotropic Gaussian, it is rotationally invariant rather than equivariant. I must be misunderstanding something. Sorry for confusing you; I look forward to your answer.

MinkaiXu commented 2 years ago

$\mathcal{C}^t$ is data, not a distribution; e.g., it is an $N \times 3$ tensor. Under a rotation, the tensor rotates too. Invariance is a property of the Gaussian distribution, not of the data (tensor) itself.
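
The distinction can be seen in a few lines of NumPy. This is an illustrative sketch only: `standard_normal_logpdf` is a helper name made up here, and a standard isotropic Gaussian is assumed as the density.

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a rotation: orthogonalize a random matrix, then force det = +1.
A, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = A if np.linalg.det(A) > 0 else -A

def standard_normal_logpdf(C):
    """Log-density of a standard isotropic Gaussian evaluated at an N x 3 array."""
    return -0.5 * np.sum(C**2) - 0.5 * C.size * np.log(2 * np.pi)

C = rng.normal(size=(4, 3))   # the data: an N x 3 tensor
C_rot = C @ R.T               # the same data after rotating every atom

data_changed = not np.allclose(C, C_rot)
density_same = np.isclose(standard_normal_logpdf(C),
                          standard_normal_logpdf(C_rot))
print(data_changed, density_same)
```

The tensor changes under rotation (equivariance of the data), while the value the density assigns to it does not (invariance of the distribution).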

Frankie123421 commented 2 years ago

Thanks for your reply. I kind of got it now, but still have some questions. Before raising them, I would like to ask the following:

How can we ensure that $\mathcal{C}^{t-1}$ is also in the CoM-free system? Is it induced from the previous $\mathcal{C}^{t}$ (in the CoM-free system) and the equivariant Markov transition kernel? Since $\mu_\theta(R\mathcal{C}^t + g, \mathcal{G}, t) = R\mu_\theta(\mathcal{C}^t, \mathcal{G}, t)$, if $R\mathcal{C}^{t-1} + g \neq R\mathcal{C}^{t-1}$, we can't get $R \mathcal{C}^{t-1} + g - \boldsymbol{\mu}_{\theta}(R\mathcal{C}^t + g, \mathcal{G}, t) = R(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta)$ as in the previous derivation, and thus the kernel is not even equivariant, unless $R\mathcal{C}^{t-1} + g = R\mathcal{C}^{t-1}$ is known beforehand.

I am quite confused about the logic here.

MinkaiXu commented 2 years ago

To ensure $\mathcal{C}^t$ is in the CoM-free system, one can simply always move the CoM of any $\mathcal{C}$ to zero, making translation invariance an intrinsic property of $\mathcal{C}$.

Then yes, as I have explained before, we have:

Firstly, $R\mathcal{C}^t + g \neq \mathcal{C}^t$. By considering CoM free system, we can only have $R\mathcal{C}^t + g = R\mathcal{C}^t$ (since we always move all conformations to zero CoM). Second, similarly, $\mu_\theta(R\mathcal{C}^t + g, \mathcal{G}, t) = R\mu_\theta(\mathcal{C}^t, \mathcal{G}, t)$.

So the derivation is simply $(R \mathcal{C}^{t-1} + g) - \boldsymbol{\mu}_{\theta}(R\mathcal{C}^t + g, \mathcal{G}, t) = R(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta)$.
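
The role of re-centering in that identity can be checked numerically. The sketch below makes several assumptions: `center` implements "always move the CoM to zero", `toy_mu` is a hypothetical equivariant stand-in for $\mu_\theta$ that centers its input first, and $g$ is an arbitrary translation.

```python
import numpy as np

def center(C):
    """Move an N x 3 conformation to zero center of mass."""
    return C - C.mean(axis=0, keepdims=True)

def toy_mu(C):
    """Hypothetical equivariant mean acting on the centered input."""
    C = center(C)
    return C * np.tanh(np.linalg.norm(C, axis=1, keepdims=True))

rng = np.random.default_rng(2)
A, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = A if np.linalg.det(A) > 0 else -A   # proper rotation
g = rng.normal(size=(1, 3))             # translation applied to every atom

C_t = center(rng.normal(size=(6, 3)))
C_prev = center(rng.normal(size=(6, 3)))

# Rotate and translate, then move back to zero CoM: the "+ g" disappears.
moved_t = center(C_t @ R.T + g)
moved_prev = center(C_prev @ R.T + g)

residual_moved = moved_prev - toy_mu(moved_t)
residual_rotated = (C_prev - toy_mu(C_t)) @ R.T
print(np.allclose(residual_moved, residual_rotated))
```

Without the `center` call on the transformed inputs, the translation $g$ would remain in the residual and the two sides would differ.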

Frankie123421 commented 2 years ago

Thanks for your response.

1. Overall I see that $\mathcal{C}^{t-1}$ is indeed in the CoM-free system. What I was concerned about above is this: considering $\mathcal{C}^T$ and $\mathcal{C}^{T-1}$, we sample $\mathcal{C}^T$ from an isotropic Gaussian and move it to the CoM-free system. Is the next step $\mathcal{C}^{T-1}$ then guaranteed to be in the CoM-free system by the Markov transition kernel alone, without any other operation? My current understanding (with the help of your answer) is that it is not, and that this is actually achieved by sampling $\mathcal{C}^{T-1}$ from $p(\mathcal{C}^{T-1}|\mathcal{C}^T)$ and then moving it to the CoM-free system. (?)
2. How could I mathematically prove that the CoM-free system is translationally invariant?
3. Could the proof of $R \mathbf{x}^{l+1}, \mathbf{h}^{l+1}=\operatorname{GFN}\left(R \mathbf{x}^l, R \mathcal{C}+g, \mathbf{h}^l\right)$ be simplified by ignoring the term "+g", if we consider that $\mathcal{C}$ is in the CoM-free system as in the paper? (The proof is correct, though.)
4. For any $y \in U$, I don't understand how to obtain $p(y) = \hat{p}(y)$ from $\|y\|_2^2=\|Q y\|_2^2$ (1), even though I've seen your answer to the same question on OpenReview. (You said that $p(y) = \hat{p}(Qy) = \hat{p}(y)$ (2); I suspect it may be $p(y) = {p}(Qy)$, but overall I still don't see the connection between (1) and (2).)
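
One possible reading of the link between (1) and (2), offered as a sketch and not as the paper's proof: assume $\hat{p}$ is a standard isotropic Gaussian on the reduced coordinates $Qy$, and that $Q$ preserves norms on $U$. Neither assumption is confirmed in this thread. Then

$$ p(y) = \hat{p}(Qy) \propto \exp\left(-\tfrac{1}{2}\|Qy\|_2^2\right) = \exp\left(-\tfrac{1}{2}\|y\|_2^2\right), \quad y \in U $$

so under these assumptions the density on $U$ depends on $y$ only through $\|y\|_2$, which is where the norm equality (1) would enter.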

MinkaiXu commented 2 years ago

1. Yes, you can also view this as first moving the output of $\epsilon$ to zero mean, which can be regarded as part of the parameterization. Then $\mathcal{C}^{t-1}$ will naturally be CoM-free.
2. That should be the proof in Appendix A.5, for the CoM-free Gaussian?
3. Yes, definitely. But the network is indeed also translation-invariant, so I mention "+g" as well.
4. I remember $p(y) = \hat{p}(Qy)$ comes from the calculation of the corresponding Gaussians.
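
Answer 1 can be illustrated with a toy reverse step. This is a sketch under stated assumptions, not the repository's sampler: `toy_eps` is a hypothetical stand-in for $\epsilon_\theta$ (it is not equivariant and only demonstrates the zero-mean output), $\beta_t = 1 - \alpha_t$ is assumed as in standard DDPM, and the sampling noise is also assumed to be re-centered.

```python
import numpy as np

def center(X):
    """Subtract the mean over atoms so the result has zero CoM."""
    return X - X.mean(axis=0, keepdims=True)

def toy_eps(C, t):
    """Hypothetical noise predictor whose output is moved to zero mean."""
    raw = np.tanh(C) + 0.1 * t
    return center(raw)

def reverse_step(C_t, t, alpha_t, alpha_bar_t, sigma_t, rng):
    """One toy reverse step: mean from the formula in this thread, plus CoM-free noise."""
    beta_t = 1.0 - alpha_t  # assumption: standard DDPM relation
    mu = (C_t - beta_t / np.sqrt(1.0 - alpha_bar_t) * toy_eps(C_t, t)) / np.sqrt(alpha_t)
    z = center(rng.normal(size=C_t.shape))
    return mu + sigma_t * z

rng = np.random.default_rng(3)
C_T = center(rng.normal(size=(8, 3)))
C_prev = reverse_step(C_T, t=10, alpha_t=0.98, alpha_bar_t=0.5, sigma_t=0.1, rng=rng)

com = C_prev.mean(axis=0)
print(np.allclose(com, 0.0))
```

Every term in the update is a linear combination of zero-CoM arrays, so the result stays CoM-free without an extra projection after sampling.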

Frankie123421 commented 2 years ago

Thanks, but as for 2, I can't find any proof showing that if $x \in U$, then $x + g = x$. Besides, maybe it holds for any CoM-free system, not just for the CoM-free Gaussian?

MinkaiXu commented 2 years ago

I think maybe I should say that the CoM-free $x$ and $x+g$ should actually be $Qx$ and $Q(x+g)$. I.e., no matter how $x$ is moved, we will always first move it back to zero CoM. In this sense, we have $Qx = Q(x+g)$.
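
A short worked check of $Qx = Q(x+g)$, reading $Q$ here as the mean-subtraction operator on an $N \times 3$ array $x$. That is an assumption about notation and may differ from how $Q$ is defined in the paper's appendix. With $Q = I_N - \frac{1}{N}\mathbf{1}\mathbf{1}^T$ and a translation written as $x + \mathbf{1} g^T$ for $g \in \mathbb{R}^3$:

$$\begin{aligned} Q(x + \mathbf{1} g^T) &= Qx + Q\mathbf{1}\, g^T \\ &= Qx + \left(\mathbf{1} - \tfrac{1}{N}\mathbf{1}(\mathbf{1}^T\mathbf{1})\right) g^T \\ &= Qx + (\mathbf{1} - \mathbf{1})\, g^T = Qx \end{aligned}$$

The argument uses only $Q\mathbf{1} = 0$, so it holds for any configuration that is re-centered, not only for Gaussian samples.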

Frankie123421 commented 2 years ago

Got it. Thanks for taking the time to answer my questions. I really appreciate it!

MinkaiXu / GeoDiff
