EnVision-Research / LucidDreamer

Official implementation of "LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching"
MIT License
749 stars 32 forks

Method Issue #15

studying910 closed 10 months ago

studying910 commented 10 months ago

Regarding the multi-step DDIM sampling: Section 3.2 states that Eqn. (13) is derived from Eqn. (11).

However, this is confusing, since Eqn. (11) seems incorrect.

The DDIM sampling seems to be:

$\frac{\tilde{x}_s}{\sqrt{\overline{\alpha}_s}}=\frac{x_t}{\sqrt{\overline{\alpha}_t}}+(\gamma(s)-\gamma(t)) \epsilon(x_t; y, \phi)$.

Since $\overline{\alpha}_0=1$, Eqn. (13) can then be derived from it.

Also, the notation for the sampled latents $\tilde{x}_s \dots$ is missing.
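
For reference, the substitution behind this step can be written out. This is only a sketch: it assumes the usual DDIM definition $\gamma(t)=\sqrt{1-\overline{\alpha}_t}/\sqrt{\overline{\alpha}_t}$, which is not restated in this thread, so that $\gamma(0)=0$ when $\overline{\alpha}_0=1$. Setting $s=0$ in the sampling rule above then gives:

$$\tilde{x}_0 = \frac{x_t}{\sqrt{\overline{\alpha}_t}} + \bigl(\gamma(0)-\gamma(t)\bigr)\,\epsilon(x_t; y, \phi) = \frac{x_t}{\sqrt{\overline{\alpha}_t}} - \gamma(t)\,\epsilon(x_t; y, \phi).$$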

AbnerVictor commented 10 months ago

Thank you for asking. We will look into this ASAP.

AbnerVictor commented 10 months ago

Thank you for asking; your derivation is correct.

Our original intention with Eqn. 11 was to express the DDIM denoising process in a form similar to Eqn. 9, which is the DDIM inversion. We made a mistake in the writing and will fix it ASAP.

studying910 commented 10 months ago

Thx for your answer :)

AbnerVictor commented 10 months ago

Thank you for pointing out the problem. The revised Eqn. 11 should be: $\tilde{x}_{t-\delta_T} = \sqrt{\bar{\alpha}_{t-\delta_T}}\left(\hat{x}_0^t + \gamma(t-\delta_T)\,\epsilon_\phi(x_t, t, y)\right)$,

where $\hat{x}_0^t = \frac{1}{\sqrt{\bar{\alpha}_t}} x_t - \gamma(t)\,\epsilon_\phi(x_t, t, y)$.
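
As a quick sanity check, the revised Eqn. 11 can be compared numerically against the sampling rule proposed at the top of this issue and against the textbook DDIM step. The sketch below is not taken from the LucidDreamer code base: the function names are hypothetical, scalars stand in for latents, a random number stands in for the network output $\epsilon_\phi(x_t, t, y)$, and $\gamma(t)=\sqrt{1-\bar{\alpha}_t}/\sqrt{\bar{\alpha}_t}$ is assumed.

```python
import math
import random


def gamma(alpha_bar: float) -> float:
    """Assumed definition: gamma = sqrt(1 - alpha_bar) / sqrt(alpha_bar)."""
    return math.sqrt(1.0 - alpha_bar) / math.sqrt(alpha_bar)


def step_issue_form(x_t: float, eps: float, ab_t: float, ab_s: float) -> float:
    """Rule from the issue: x_s/sqrt(ab_s) = x_t/sqrt(ab_t) + (gamma(s) - gamma(t)) * eps."""
    return math.sqrt(ab_s) * (x_t / math.sqrt(ab_t) + (gamma(ab_s) - gamma(ab_t)) * eps)


def step_revised_eqn11(x_t: float, eps: float, ab_t: float, ab_s: float) -> float:
    """Revised Eqn. 11: x_s = sqrt(ab_s) * (x0_hat + gamma(s) * eps)."""
    x0_hat = x_t / math.sqrt(ab_t) - gamma(ab_t) * eps
    return math.sqrt(ab_s) * (x0_hat + gamma(ab_s) * eps)


def step_standard_ddim(x_t: float, eps: float, ab_t: float, ab_s: float) -> float:
    """Deterministic DDIM step: sqrt(ab_s) * x0_hat + sqrt(1 - ab_s) * eps."""
    x0_hat = (x_t - math.sqrt(1.0 - ab_t) * eps) / math.sqrt(ab_t)
    return math.sqrt(ab_s) * x0_hat + math.sqrt(1.0 - ab_s) * eps


def max_disagreement(trials: int = 1000, seed: int = 0) -> float:
    """Largest absolute difference between the three forms over random inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        ab_t = rng.uniform(0.01, 0.98)   # alpha_bar at the current step t
        ab_s = rng.uniform(ab_t, 0.99)   # s < t, so alpha_bar_s >= alpha_bar_t
        x_t = rng.gauss(0.0, 1.0)        # scalar stand-in for the latent
        eps = rng.gauss(0.0, 1.0)        # stand-in for the predicted noise
        a = step_issue_form(x_t, eps, ab_t, ab_s)
        b = step_revised_eqn11(x_t, eps, ab_t, ab_s)
        c = step_standard_ddim(x_t, eps, ab_t, ab_s)
        worst = max(worst, abs(a - b), abs(a - c))
    return worst


if __name__ == "__main__":
    print("max disagreement:", max_disagreement())
```

The three forms should agree up to floating-point error. Passing `ab_s = 1.0` to `step_revised_eqn11` returns $\hat{x}_0^t$ directly, since `gamma(1.0)` is zero.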