Open astral705 opened 1 week ago
Thank you for your interest in our paper! That is a good question. In our paper, we assume that $\lVert\Sigma_p\rVert=1$, which is why the normalization factor $\frac{1}{\lVert\Sigma_d\rVert}$ appears. However, the determinant of $\Sigma_p$, which sets the scale of the added noise, is chosen by the user, so we chose to drop the normalization factor for simplicity. Our paper provides: 1) the optimal Gaussian distribution when $\Sigma_p$ is fixed; and 2) the optimal $\Sigma_p$ when $\lVert\Sigma_p\rVert$ is fixed. I hope this helps!
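To make the simplification concrete, here is a minimal numpy sketch (the matrix norm is taken as the Frobenius norm purely for illustration, and the numbers are arbitrary placeholders) showing that the proposition's prior and the unnormalized one differ only by a global scalar, i.e. the overall noise scale:

```python
import numpy as np

# Hypothetical diagonal Sigma_d (per-dimension data variances); values are arbitrary.
sigma_d = np.diag([4.0, 1.0, 0.25])

# Proposition 1 form: Sigma_p* = Sigma_d / ||Sigma_d||, so that ||Sigma_p*|| = 1.
# (The Frobenius norm is assumed here purely for illustration.)
sigma_p_prop = sigma_d / np.linalg.norm(sigma_d)

# Simplified form: drop the 1 / ||Sigma_d|| factor and use Sigma_d directly.
sigma_p_code = sigma_d.copy()

# The two priors differ only by the global scalar ||Sigma_d||, i.e. the overall
# noise scale; the relative allocation of variance across dimensions is identical.
scale = np.linalg.norm(sigma_d)
print(np.allclose(sigma_p_code, scale * sigma_p_prop))  # True
```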
Thank you for your detailed explanation! It really helps clarify the reasoning. I have one more question: does the scale of the noise, determined by the determinant of $\Sigma_p$, ultimately affect the generated results? I'm curious whether adjusting this scale has any noticeable impact on the performance or quality of the generated outputs.
This is an interesting question. I have not explored it theoretically, but I believe the noise scale should not influence performance as long as the score function is learned perfectly and the SDE/ODE solver is exact.

In practice, however, it does affect performance, because we rely on numerical solvers to carry out the reverse diffusion process. For instance, if the noise scale is too large, the noise injected per step becomes very significant compared to the clean data, making it difficult for the model to denoise. We then need very small steps to keep the learned score function accurate and sufficiently "continuous," and to limit numerical error during denoising. There may be mathematical results that quantify the influence of the noise scale for fixed step sizes and other numerical settings.

This paper could be relevant and may provide some insight into the problem: Image generation with shortest path diffusion, https://proceedings.mlr.press/v202/das23a/das23a.pdf. I hope this answer helps, and I am happy to discuss this and other questions further. Thank you for raising them!
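To make the step-size intuition above concrete, here is a toy sketch (a generic VE-style Euler-Maruyama step, not our exact sampler; all quantities are illustrative placeholders) showing that the noise injected per step grows with the overall noise scale, so larger scales require a finer discretization:

```python
import numpy as np

# Toy illustration: in an Euler-Maruyama step of a VE-style reverse SDE, the
# injected noise has std g(t) * sqrt(dt). Scaling the overall noise level also
# scales this per-step perturbation, so a finer discretization is needed to keep
# it small relative to the data.

data_std = 1.0            # hypothetical scale of the clean data
n_steps = 1000
dt = 1.0 / n_steps

for noise_scale in (1.0, 10.0, 100.0):
    g = noise_scale       # crude stand-in for the diffusion coefficient g(t)
    per_step_noise_std = g * np.sqrt(dt)
    print(f"noise scale {noise_scale:7.1f}: "
          f"per-step noise / data std = {per_step_noise_std / data_std:.3f}")
```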
Thank you for sharing your work and providing both the paper and code. While reviewing the implementation, I noticed a potential mismatch between the theoretical formulation of Optimal Gaussian Diffusion (OGD) and the code.
In Proposition 1, Section 4.1 of your paper, you state that
$\Sigma_p^{*}(i,j) = \frac{1}{\lVert \Sigma_d \rVert} \Sigma_d(i, j)$
However, in the implementation in `joint_diffusion.py`, specifically `get_loss_opd`, it seems that the variance $\Sigma_p$ is set directly to $\Sigma_d(i, j)$ without the normalization factor $\frac{1}{\lVert \Sigma_d \rVert}$. Could you kindly clarify the rationale for this discrepancy? Is there an intentional simplification or assumption that was made in the code?