Closed Frankie123421 closed 2 years ago
The derivation you showed seems correct, but I think a few minor points were missed:
I feel like your derivation is actually based on the above statements. Ping me if you still have any questions.
Thanks for your kind and prompt reply. I have some new questions based on your response.
$$ \mu_\theta(\mathcal{C}^t, \mathcal{G}, t)=\frac{1}{\sqrt{\alpha_t}}\left(\mathcal{C}^t-\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta(\mathcal{G}, \mathcal{C}^t, t)\right) $$
will be rotation and translation invariant no matter how $\epsilon_{\theta}$ is designed, since $R\mathcal{C}^t + g = \mathcal{C}^t$, so I suspect that 1 is not correct. My original derivation is actually based on $\mu_\theta(R\mathcal{C}^t + g, \mathcal{G}, t) = R\mu_\theta(\mathcal{C}^t, \mathcal{G}, t) + g$.
I am quite confused now )-: . Really looking forward to your answer. Thanks in advance.
Firstly, $R\mathcal{C}^t + g \neq \mathcal{C}^t$. By considering a CoM-free system, we can only have $R\mathcal{C}^t + g = R\mathcal{C}^t$ (since we always move all conformations to zero CoM). Second, similarly, $\mu_\theta(R\mathcal{C}^t + g, \mathcal{G}, t) = R\mu_\theta(\mathcal{C}^t, \mathcal{G}, t)$. Besides, I didn't fully get the paradox in point 1. As I said, $C^t$ is rotationally equivariant, and $\epsilon$ should also be equivariant to make $\mu$ equivariant.
Thanks for your reply. I think the main point is that my statement in 1 says $\mathcal{C}^t$ is rotationally invariant, but as you said above, $\mathcal{C}^t$ is actually rotationally equivariant. I wonder why, and doesn't that show that 1 is wrong? My understanding is that because $\mathcal{C}^t$ is always isotropic Gaussian, it is rotationally invariant rather than equivariant. I think I must be misunderstanding something. Sorry for confusing you, and I look forward to your answer.
$C^t$ is data, not a distribution, e.g., it is an $N \times 3$ tensor. With rotation, the tensor will also rotate. Invariance is a property of the Gaussian distribution, not of the data (tensor) itself.
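To make the data-versus-distribution distinction concrete, here is a minimal NumPy sketch. The helper names (`rotation_z`, `isotropic_gaussian_logpdf`), the 5-atom tensor, and the angle are all hypothetical choices for illustration, not anything from the paper's code. It rotates an $N \times 3$ tensor and checks that the tensor changes while its isotropic Gaussian log-density does not.

```python
import numpy as np

def rotation_z(angle):
    """Return a 3x3 rotation matrix about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def isotropic_gaussian_logpdf(x, sigma=1.0):
    """Log-density of N(0, sigma^2 I) evaluated at an (N, 3) tensor."""
    d = x.size
    return -0.5 * np.sum(x ** 2) / sigma ** 2 - 0.5 * d * np.log(2 * np.pi * sigma ** 2)

rng = np.random.default_rng(0)
C = rng.normal(size=(5, 3))   # hypothetical conformation: 5 atoms in 3D
R = rotation_z(0.7)           # arbitrary rotation angle, chosen for illustration
C_rot = C @ R.T               # rotate every atom (each row is one coordinate)

# The data itself is equivariant: the tensor changes under rotation.
data_changed = not np.allclose(C, C_rot)

# The distribution is invariant: the density assigned to both tensors is equal,
# because an orthogonal R preserves the squared norm.
density_unchanged = np.isclose(
    isotropic_gaussian_logpdf(C), isotropic_gaussian_logpdf(C_rot)
)
print(data_changed, density_unchanged)
```

The density only depends on the squared norm of the tensor, which is why rotating the sample leaves it unchanged even though the sample itself moves.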
Thanks for your reply. I kind of got it now, but still have some questions. Before raising them, I would like to carefully ask that:
How can we ensure that $\mathcal{C}^{t-1}$ is also in the CoM-free system? Is it induced from the previous $\mathcal{C}^{t}$ (in the CoM-free system) and the equivariant Markov transition kernel? Since $\mu_\theta(R\mathcal{C}^t + g, \mathcal{G}, t) = R\mu_\theta(\mathcal{C}^t, \mathcal{G}, t)$, if $R\mathcal{C}^{t-1} + g \neq R\mathcal{C}^{t-1}$, we can't get $R \mathcal{C}^{t-1} + g - \boldsymbol{\mu}_{\theta}(R\mathcal{C}^t + g, \mathcal{G}, t) = R(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta)$ as in the previous derivation, and thus the kernel is not even equivariant, unless $R\mathcal{C}^{t-1} + g = R\mathcal{C}^{t-1}$ is known in advance.
I am quite confused about the logic here.
To ensure that $C^t$ is a CoM-free system, one can simply always move the CoM of any $C$ to zero, making translation invariance an intrinsic property of $C$.
Then yes, as I have explained before, we have:
Firstly, $R\mathcal{C}^t + g \neq \mathcal{C}^t$. By considering a CoM-free system, we can only have $R\mathcal{C}^t + g = R\mathcal{C}^t$ (since we always move all conformations to zero CoM). Second, similarly, $\mu_\theta(R\mathcal{C}^t + g, \mathcal{G}, t) = R\mu_\theta(\mathcal{C}^t, \mathcal{G}, t)$.
So the derivation is simply $(R \mathcal{C}^{t-1} + g) - \boldsymbol{\mu}_{\theta}(R\mathcal{C}^t + g, \mathcal{G}, t) = R(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta)$.
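The identity above can be checked numerically. The sketch below is only an illustration under stated assumptions: `toy_eps` is a hypothetical stand-in for $\epsilon_\theta$ (a distance-weighted sum of relative positions, which is rotation equivariant and translation invariant by construction), the schedule values `alpha`, `beta`, `alpha_bar`, and `sigma` are made up, and the centre of mass is taken as the unweighted mean. The log-density is computed only up to an additive constant.

```python
import numpy as np

def center(x):
    """Move the centre of mass (unweighted mean) of an (N, 3) array to the origin."""
    return x - x.mean(axis=0, keepdims=True)

def toy_eps(C):
    """Hypothetical noise predictor: distance-weighted sum of relative positions."""
    diff = C[:, None, :] - C[None, :, :]                 # (N, N, 3) relative positions
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)  # (N, N, 1) pairwise distances
    return (diff * np.exp(-dist)).sum(axis=1)            # (N, 3)

def toy_mu(C_t, alpha=0.9, beta=0.1, alpha_bar=0.5):
    """Mean of the reverse kernel, with inputs always moved to zero CoM first."""
    C_t = center(C_t)
    return (C_t - beta / np.sqrt(1.0 - alpha_bar) * toy_eps(C_t)) / np.sqrt(alpha)

def log_kernel(C_prev, C_t, sigma=0.3):
    """Unnormalised log N(C_prev; mu(C_t), sigma^2 I) on zero-CoM inputs."""
    residual = center(C_prev) - toy_mu(C_t)
    return -0.5 * np.sum(residual ** 2) / sigma ** 2

rng = np.random.default_rng(1)
C_t, C_prev = rng.normal(size=(6, 3)), rng.normal(size=(6, 3))

# Random rotation from a QR decomposition, with the sign fixed so det(R) = +1.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] *= -1.0
g = rng.normal(size=(1, 3))  # arbitrary translation

# mu(R C^t + g) = R mu(C^t): the translation is absorbed by centering.
mean_is_equivariant = np.allclose(toy_mu(C_t @ R.T + g), toy_mu(C_t) @ R.T)

# p(R C^{t-1} + g | R C^t + g) = p(C^{t-1} | C^t).
kernel_is_invariant = np.isclose(
    log_kernel(C_prev @ R.T + g, C_t @ R.T + g), log_kernel(C_prev, C_t)
)
print(mean_is_equivariant, kernel_is_invariant)
```

Both checks rely on the same two facts used in the thread: centering removes $g$, and $R^\top R = \mathbf{I}$ leaves the squared norm of the residual unchanged.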
Thanks for your response. Overall I see that $\mathcal{C}^{t-1}$ is indeed in the CoM-free system. What I was actually concerned about above is this: considering $\mathcal{C}^T$ and $\mathcal{C}^{T-1}$, for example, we've sampled $\mathcal{C}^T$ from an isotropic Gaussian and moved it to the CoM-free system; is the next step $\mathcal{C}^{T-1}$ then naturally guaranteed to be in the CoM-free system by the Markov transition kernel, without any other operation? My current understanding (with the help of your answer) is that it is not, and that this is actually achieved by sampling $\mathcal{C}^{T-1}$ from $p(\mathcal{C}^{T-1}|\mathcal{C}^T)$ and then moving it to the CoM-free system. (?)
How could I mathematically prove that a CoM-free system is translationally invariant?
Could the proof of $R \mathbf{x}^{l+1}, \mathbf{h}^{l+1}=\operatorname{GFN}\left(R \mathbf{x}^l, R \mathcal{C}+g, \mathbf{h}^l\right)$ be simplified by ignoring the term "+g", if we consider that $\mathcal{C}$ is in the CoM-free system as in the paper? (The proof is correct, though.)
For any $y \in U$, I don't understand how to obtain $p(y) = \hat{p}(y)$ from $\|y\|_2^2=\|Q y\|_2^2$ (1), even though I've seen your answer to the same question on OpenReview. (You said that $p(y) = \hat{p}(Qy) = \hat{p}(y)$ (2). I suspect it may be $p(y) = p(Qy)$, but overall I still don't see the connection between (1) and (2).)
Thanks, but as for 2, I don't see any proof showing that if $x \in U$, then $x + g = x$. Besides, maybe it holds for any CoM-free system, not just for the CoM-free Gaussian?
Maybe I should say that the CoM-free $x$ and $x+g$ should actually be $Qx$ and $Q(x+g)$. I.e., no matter how $x$ is moved, we will always first move it back to zero CoM. In this sense, we have $Qx = Q(x+g)$.
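The claim $Qx = Q(x+g)$ follows in one line if $Q$ is read as "subtract the centre of mass": the mean of $x+g$ is the mean of $x$ plus $g$, so $(x+g) - \operatorname{mean}(x+g) = x - \operatorname{mean}(x)$. A minimal NumPy check is sketched below; `to_zero_com` is a hypothetical helper name, and the centre of mass is assumed to be the unweighted mean of the atom coordinates.

```python
import numpy as np

def to_zero_com(x):
    """Play the role of Q in the comment above: shift an (N, 3) array to zero CoM.

    Assumes equal atom weights, i.e. the CoM is the plain mean over atoms.
    """
    return x - x.mean(axis=0, keepdims=True)

rng = np.random.default_rng(2)
x = rng.normal(size=(4, 3))        # hypothetical 4-atom conformation
g = np.array([1.5, -2.0, 0.25])    # arbitrary translation applied to every atom

# Q(x + g) == Q(x): the translation is removed by centering.
same_after_centering = np.allclose(to_zero_com(x), to_zero_com(x + g))

# The centered conformation really has zero CoM.
com_is_zero = np.allclose(to_zero_com(x).mean(axis=0), 0.0)
print(same_after_centering, com_is_zero)
```

Nothing here depends on $x$ being Gaussian, which matches the observation in the question that the property holds for any conformation once it is moved to zero CoM.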
Got it. Thanks for taking the time to answer my questions. Really appreciate that!
Hi Xu,
First of all, thanks for your nice work. I've read your paper, and I have some questions about the proof of the equivariance of the transition kernel. In detail, suppose $\mathcal{C}^t$ is roto-translation invariant, and (thus) $\mu_{\theta}(\mathcal{C}^t, \mathcal{G}, t)$ is roto-translation equivariant with the designed GNN; we need to prove that $p(\mathcal{C}^{t-1} | \mathcal{C}^{t}, \mathcal{G}, t)$ is equivariant. I wonder if it is due to the following derivation:
$$\begin{aligned} p(R \mathcal{C}^{t-1} + g | R \mathcal{C}^{t} + g, \mathcal{G}, t) &= \mathcal{N}(R\mathcal{C}^{t-1} + g; \boldsymbol{\mu}_{\theta}(R\mathcal{C}^t + g, \mathcal{G}, t), \sigma_t^2 \mathbf{I}) \\ &= \frac{1}{(2 \pi)^{\frac{p}{2}}|\boldsymbol{\Sigma}|^{\frac{1}{2}}} e^{-\frac{1}{2}(R(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta))^T \boldsymbol{\Sigma}^{-1}(R(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta))} \\ &= \frac{1}{(2 \pi)^{\frac{p}{2}}|\boldsymbol{\Sigma}|^{\frac{1}{2}}} e^{-\frac{1}{2}(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta)^T \boldsymbol{\Sigma}^{-1}(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta)} \end{aligned}$$
where $\boldsymbol{\Sigma} = \sigma_t^2 \mathbf{I}$. I am not sure if it's correct and hope to receive your clarification. Thanks.