exx8 / differential-diffusion


What is the injected term in the $z_{mix}$ formulation? #2

Closed: csmendoza closed this issue 4 months ago

csmendoza commented 9 months ago

From reviewing the paper, we wonder if the mask may have been reversed by mistake.

In the denoising loop, $z_{mix}$ combines the result of the previous denoising step, $z_{t+1}$, with $z_t'$, which has just been obtained from $z_{init}$ at the current noise level. It would seem that the "injected" part is $z_t'$: because it is computed from $z_{init}$ at the current noise level, it skips all the previous denoising steps and is "injected" only at this later step.

This seems consistent with the fact that the mask grows as $t$ decreases, and that the injected region shrinks correspondingly (eventually including only the parts with the smallest amount of change allowed).

However, the caption of Figure 3 in the paper states the opposite: "Top: $z_{t+1} \odot mask$, which corresponds to the injected fragments at each time-step. Bottom: $z'_t \odot (1 - mask)$, which corresponds to the fragments which are not injected at this time-step."

Can you confirm whether there is an error in the paper? Or are we misunderstanding something in claiming that the injected part is $z_t'$? If we are mistaken, do you know what we may be missing?

Thanks so much in advance for your help and clarification.

exx8 commented 9 months ago

I appreciate your interest in the paper. Should you have any additional inquiries or require further elucidation, please do not hesitate to reach out.

The injected component is: $z_t' \odot (1 - mask)$
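Putting the two fragments from the Figure 3 caption together, one mixing step can be sketched as $z_{mix} = z_{t+1} \odot mask + z_t' \odot (1 - mask)$. The snippet below is a minimal plain-Python sketch over a flattened latent, not the project's code: it assumes a binary mask and a DDPM-style forward noising $z_t' = \sqrt{\bar\alpha_t}\, z_{init} + \sqrt{1 - \bar\alpha_t}\, \epsilon$, and the names `add_noise`, `mix_latents` and the `alpha_bar` value are hypothetical.

```python
import math
import random
from typing import List

Latent = List[float]  # a flattened latent, one float per element


def add_noise(z_init: Latent, alpha_bar: float, rng: random.Random) -> Latent:
    """Hypothetical DDPM-style forward noising (an assumption, not the paper's code):
    sqrt(alpha_bar) * z_init + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, 1)."""
    return [
        math.sqrt(alpha_bar) * z + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
        for z in z_init
    ]


def mix_latents(z_prev: Latent, z_t_prime: Latent, mask: List[int]) -> Latent:
    """z_mix = z_{t+1} * mask + z'_t * (1 - mask), element-wise.

    Where mask == 0, the freshly noised z'_t is injected;
    where mask == 1, the previous denoising result z_{t+1} is kept."""
    return [p * m + q * (1 - m) for p, q, m in zip(z_prev, z_t_prime, mask)]


rng = random.Random(0)
z_init = [0.5, -0.2, 1.0, 0.3]
z_prev = [9.0, 9.0, 9.0, 9.0]  # stand-in for the previous denoising result z_{t+1}
mask = [1, 0, 1, 0]            # 1 = keep z_{t+1}, 0 = inject z'_t
z_t_prime = add_noise(z_init, alpha_bar=0.8, rng=rng)
z_mix = mix_latents(z_prev, z_t_prime, mask)
```

Only the positions where `mask` is 0 receive the injected term $z_t' \odot (1 - mask)$; the rest carry over the previous step's result unchanged.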

Regarding the phrasing of the Figure 3 caption, I can now see that it is open to misinterpretation. I will make the requisite revisions in the forthcoming version of the paper to ensure that it is precise and unambiguous. In the interim, I kindly request that you disregard the caption beneath Figure 3. Your feedback is greatly appreciated, and it helps a lot. Thank you.

csmendoza commented 9 months ago

I am very thankful for your prompt and conclusive response.

csmendoza commented 9 months ago

I suspect, given your clarifying response, that the mask definition in Algorithm 1 is then reversed with respect to $\mu$.

As it is currently written, for small values of $t$ (less room for change) the injection mask ($1 - mask$, as per the paper) includes only very low values of $\mu$. So low values of $\mu$ correspond to a small amount of change (i.e. regions that stay very close to $z_{init}$ and are allowed to join the denoising only late in the process). So in the paper, $\mu$ should be changed to $(1-\mu)$ if low values of $\mu$ are meant to represent more change.
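To make this argument concrete, here is a minimal sketch that assumes the current Algorithm 1 mask is $mask = \mu \ge t/k$, with $t$ running from $k$ down to 1. That definition is inferred from the reasoning above rather than quoted from the paper, and `mask_at` and `free_steps` are hypothetical helpers. It shows the mask growing as $t$ shrinks, and low-$\mu$ regions spending the fewest steps outside the injected region.

```python
from typing import List


def mask_at(mu: List[float], t: int, k: int) -> List[int]:
    """Assumed Algorithm 1 mask: 1 where mu >= t/k (region keeps z_{t+1}),
    0 where the injected z'_t overwrites it."""
    return [1 if m >= t / k else 0 for m in mu]


def free_steps(mu_value: float, k: int) -> int:
    """Number of steps t = k, ..., 1 at which a region with this mu is inside
    the mask, i.e. is NOT overwritten by the injected z'_t."""
    return sum(1 for t in range(k, 0, -1) if mu_value >= t / k)


k = 10
mu_map = [0.0, 0.3, 0.5, 1.0]
for t in (8, 5, 2):
    print(t, mask_at(mu_map, t, k))
# 8 [0, 0, 0, 1]
# 5 [0, 0, 1, 1]
# 2 [0, 1, 1, 1]
print([free_steps(m, k) for m in mu_map])
# [0, 3, 5, 10]
```

Under this assumed definition, a region with $\mu = 1$ denoises freely at every step, while a region with $\mu = 0$ is overwritten by $z_t'$ at every step. This is why low $\mu$ ends up meaning little change.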

exx8 commented 9 months ago

From an algorithmic perspective, there is no inherent distinction between denoting full change with ones or zeros on the change map; both representations are equivalent in terms of mathematical operations. However, I agree that it is essential to adhere to the convention established earlier in the context of this algorithm, which dictates that full change should be represented as black in the paper.
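The equivalence of the two encodings can be checked with a small sketch. It assumes, for illustration only, a threshold mask of the form $\mu \ge t/k$ for a map where white ($\mu = 1$) means full change, and hypothetical function names for the two conventions. Redrawing the same map with black as full change ($\nu = 1 - \mu$) and undoing the flip before thresholding yields identical masks at every step.

```python
from typing import List


def mask_white_full_change(mu: List[float], t: int, k: int) -> List[int]:
    """Hypothetical convention A: mu = 1 (white) means full change; mask = mu >= t/k."""
    return [1 if m >= t / k else 0 for m in mu]


def mask_black_full_change(nu: List[float], t: int, k: int) -> List[int]:
    """Hypothetical convention B: nu = 0 (black) means full change.
    Converting back with 1 - nu before thresholding gives the same mask."""
    return [1 if (1.0 - n) >= t / k else 0 for n in nu]


k = 8
mu = [0.0, 0.25, 0.5, 0.75, 1.0]
nu = [1.0 - m for m in mu]  # the same change map, drawn with black = full change
same_everywhere = all(
    mask_white_full_change(mu, t, k) == mask_black_full_change(nu, t, k)
    for t in range(k, 0, -1)
)
print(same_everywhere)  # True
```

Which encoding to use is therefore a presentation choice; what matters is that the map and the comparison in the mask definition follow the same convention.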

To adhere to this convention and express full change as black, the appropriate course of action is to modify the arrow at line 8 as follows:

$mask = \mu_s \le \frac{t}{k}$ (The operator is ⧀ with equals, which I cannot represent in GitHub)

This adjustment ensures that the mask is correctly aligned with the established convention, where full change is visually represented as black in the paper.

Thanks for the feedback. Your insights and attention to detail are greatly appreciated. If you have any more suggestions or thoughts in the future, please don't hesitate to share them. Your input is invaluable.

csmendoza commented 9 months ago

Thanks a lot. Happy to be of help.

exx8 commented 4 months ago

OK, I uploaded the new manuscript to the project site. https://differential-diffusion.github.io/paper.pdf Among other things, it addresses the issue you raised. It will be uploaded to arXiv this week. Thanks!