BlueAmulet opened this issue 2 years ago
I am also confused about this and ran into the same problem... I cannot sample if I set ddim_num_steps to 1000.
@rromb Could you please help us clarify this line? The DDIM sampling steps cannot be set to 1000 because of this. Similar issue in #44.
I faced the same issue. Has anyone found a solution / explanation to this?
It seems like this line's only purpose is to prevent timestep=0 from being used in the DDIM scheduler. This seems to be a problem because then, following https://github.com/CompVis/latent-diffusion/blob/a506df5756472e2ebaf9078affdde2c4f1502cd4/ldm/modules/diffusionmodules/util.py#L66, alpha_t and alpha_{t-1} are identical for t=1.
Using equation (12) in https://arxiv.org/pdf/2010.02502.pdf (DDIM paper), this results in x0 = x1 for this case. Since x1 is normalized differently than x0, this might be the problem that is referred to by the comment "add one to get the final alpha values right (the ones from first scale to data during sampling)" in https://github.com/CompVis/latent-diffusion/blob/a506df5756472e2ebaf9078affdde2c4f1502cd4/ldm/modules/diffusionmodules/util.py#L56C1-L56C1
My suggested fix would be to remove https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/diffusionmodules/util.py#L57
and to change https://github.com/CompVis/latent-diffusion/blob/a506df5756472e2ebaf9078affdde2c4f1502cd4/ldm/modules/diffusionmodules/util.py#L66 into
alphas_prev = np.asarray([1.] + alphacums[ddim_timesteps[:-1]].tolist())
as this follows the convention introduced in https://arxiv.org/pdf/2010.02502.pdf (DDIM paper), equation (12)
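For concreteness, here is a rough sketch of `make_ddim_sampling_parameters` with both changes applied (paraphrasing the repo function, verbose printing omitted, and assuming the `+ 1` offset in `make_ddim_timesteps` has been removed so that `ddim_timesteps` starts at 0):

```python
import numpy as np

def make_ddim_sampling_parameters(alphacums, ddim_timesteps, eta, verbose=True):
    # select alphas for computing the variance schedule
    alphas = alphacums[ddim_timesteps]

    # changed util.py#L66: prepend alpha_0 = 1. (as defined just below Eq. (12)
    # in the DDIM paper) instead of alphacums[0]
    alphas_prev = np.asarray([1.] + alphacums[ddim_timesteps[:-1]].tolist())

    # sigma_t for eta-parameterized DDIM, Eq. (16) in the DDIM paper
    sigmas = eta * np.sqrt((1 - alphas_prev) / (1 - alphas) * (1 - alphas / alphas_prev))
    return sigmas, alphas, alphas_prev
```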
Any thoughts on this?
Just checked: there are no visible qualitative differences in image synthesis between the original implementation and my suggested fix, but now you can sample with DDIM for 1000 steps.
Hello @malteprinzler
Did you manage to train a model with custom data? I have a few questions which I cannot seem to find answers to in the README.
@TorAP yes i did, what questions do you have?
Thank you! @malteprinzler
When I train an LDM using the example given in the README, what type of LDM am I training? I have followed the LSUN example and used the same autoencoder, only changing the dataset used for training.
I want to train an image-to-image LDM, with the final goal of being able to supply an image from domain A (non-defects) as input and get an image from domain B (defects) as output, ideally with some conditioning such as text prompts. Is this possible?
General understanding question: I see a lot of different modules within this repo, and I have a hard time figuring out which modules are the most essential and which should be left as boilerplate. Not sure if this deserves an answer, but do you have a take on this?
Please tell me if the questions above do not make sense. I apologize for my ignorance; this field is not within my expertise.
Best regards
@TorAP sorry, I just noticed that you're talking about LDMs. I have only trained ControlNets so far, i.e. just LDM finetuning, so I haven't worked much with this repository.
I see @malteprinzler, all good. The next step for me will be using a ControlNet, once I figure out the training steps of LDMs :-)
Did you have any success with training your own LDM @BlueAmulet, and do you have any input on my questions? :)
@malteprinzler Hello, I agree with you that "Using equation (12) in https://arxiv.org/pdf/2010.02502.pdf (DDIM paper) results in x0 = x1 for this case." But I wonder what your statement "Since x1 is normalized differently than x0" means. Could you please explain what "normalized" stands for in this context?
@haohang96 Maybe the term "normalized" was a bit misleading.
Referring to equation 4 from the ddim paper: https://arxiv.org/pdf/2010.02502.pdf
we get $x_1 = \sqrt{\alpha_1}\, x_0 + \sqrt{1-\alpha_1}\,\epsilon$,
which obviously $\neq x_0$.
However, note that, as outlined above, the implementation treats the prediction of x_1 as equal to the prediction of x_0.
Please read my comment just as "You shouldn't use the prediction for x_1 as the prediction for x_0"
Thanks for your reply :)
Do you mean that $x_1 = \sqrt{\alpha_1}\, x_0 + \sqrt{1-\alpha_1}\,\epsilon$, while the implementation treats $x_0 = \sqrt{\alpha_0}\, x_0 + \sqrt{1-\alpha_0}\,\epsilon$ (although x_0 can be sampled directly from real data, without adding noise)?
If we set $\alpha_1 = \alpha_0$, then $x_0 = x_1$, which is incorrect?
My point was more that, currently, the predictions for x_1 are effectively used as the final output of the denoising process, which is not correct.
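To make this concrete, here is a tiny, self-contained sketch of the deterministic DDIM update (eta = 0), paraphrasing the update in `ldm/models/diffusion/ddim.py`; the alpha values are placeholders chosen only to illustrate the degenerate last step:

```python
import torch

# toy setup: pretend we are at the final DDIM step, where the current schedule
# effectively uses the same alpha for "current" and "previous" (placeholder value)
a_t = torch.tensor(0.9999)      # alpha_cumprod at the current step
a_prev = a_t.clone()            # "previous" alpha, here identical to a_t
x = torch.randn(1, 4, 8, 8)     # current latent x_1
e_t = torch.randn_like(x)       # predicted noise eps_theta(x_1)

# deterministic DDIM update (sigma_t = 0), cf. Eq. (12) in the DDIM paper
pred_x0 = (x - (1. - a_t).sqrt() * e_t) / a_t.sqrt()  # the "predicted x0" term
dir_xt = (1. - a_prev).sqrt() * e_t                   # direction pointing to x_t
x_prev = a_prev.sqrt() * pred_x0 + dir_xt

# because a_prev == a_t, the step just reproduces x_1 instead of returning pred_x0
print(torch.allclose(x_prev, x, atol=1e-5))  # True
```

So the sample returned after the last step is (up to numerics) x_1 itself, not the predicted x_0.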
Hello @malteprinzler, could I elucidate the points you raised with the following derivation?
Referring to Eq. (12) in DDIM:
$$x_{t-1} = \sqrt{\alpha_{t-1}}\left(\frac{x_t - \sqrt{1-\alpha_t}\,\epsilon_\theta^{(t)}(x_t)}{\sqrt{\alpha_t}}\right) + \sqrt{1-\alpha_{t-1}-\sigma_t^2}\,\epsilon_\theta^{(t)}(x_t) + \sigma_t\epsilon_t$$
When the offset of 1 is not utilized, we get $\boldsymbol{\alpha_0=\alpha_1}$. Considering $\boldsymbol{\sigma_1=0}$, the equation simplifies as follows:
$$x_0 = \sqrt{\alpha_1}\left(\frac{x_1 - \sqrt{1-\alpha_1}\,\epsilon_\theta^{(1)}(x_1)}{\sqrt{\alpha_1}}\right) + \sqrt{1-\alpha_1}\,\epsilon_\theta^{(1)}(x_1) = x_1$$
In essence, this implies that the predictions for x_1 effectively serve as the final output of the denoising process, contradicting your earlier derivation: $x_0 \neq x_1$.
I think you are referring to equation 12 instead of 21
I think you are exactly deriving what I tried to express: with the current implementation, the final prediction, hence x_0, equals the prediction for x_1.

The problem I see is your assumption that alpha_0 and alpha_1 are identical. From how I understand the paper, all alpha_t, t=1...T are copied from the original diffusion model so that DDIM can be used without retraining the diffusion model. Note that, additionally, alpha_0 = 1 is defined in the DDIM paper just below equation 12; however, in the current implementation, alpha_0 is set to a value different from 1 (see my original comment above).

If you actually set alpha_0 to 1 in equation 12, you will see that the prediction of x_0 is identical to the term that is marked as "predicted x0". However, the current implementation effectively sets alpha_0 to alpha_1 != 1, which, as you correctly derived, leads to the prediction of x_0 = x_1. So practically, the last denoising step has no effect!

Note, however, that x_1 will not follow the distribution of GT images! It will follow (at least approximately) the distribution that x_1 followed during training of the diffusion model, hence a slightly noised version of the ground-truth distribution. In practice, this noise level is quite small, so you probably won't notice a big visual difference; it is just mathematically not quite correct.
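Written out: setting $\alpha_0 = 1$ and $\sigma_1 = 0$ in Eq. (12), the last step returns exactly the term marked as "predicted $x_0$",
$$x_0 = \sqrt{1}\cdot\frac{x_1 - \sqrt{1-\alpha_1}\,\epsilon_\theta^{(1)}(x_1)}{\sqrt{\alpha_1}} + \sqrt{1-1}\,\epsilon_\theta^{(1)}(x_1) = \frac{x_1 - \sqrt{1-\alpha_1}\,\epsilon_\theta^{(1)}(x_1)}{\sqrt{\alpha_1}},$$
whereas with $\alpha_0 = \alpha_1 \neq 1$ the two terms recombine to $x_1$ and the last step has no effect.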
Hope that helps.
Thanks very much for your response, I got it!
What is the significance of this line of code? https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/diffusionmodules/util.py#L57 It has a slight issue: if you want to run the model for the full 1000 timesteps, an array of timesteps [0, 999] is made, then shifted to [1, 1000], and then it crashes here with "IndexError: index 1000 is out of bounds for dimension 0 with size 1000": https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/diffusionmodules/util.py#L65
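For reference, a minimal sketch of how the crash arises (the uniform discretization mirrors `make_ddim_timesteps`; the `alphacums` array is just a stand-in for the model's cumulative alphas):

```python
import numpy as np

num_ddpm_timesteps = 1000   # length of the training (DDPM) schedule
num_ddim_timesteps = 1000   # requesting the full schedule for DDIM sampling

# uniform discretization, as in make_ddim_timesteps
c = num_ddpm_timesteps // num_ddim_timesteps                        # = 1
ddim_timesteps = np.asarray(list(range(0, num_ddpm_timesteps, c)))  # [0, 1, ..., 999]
steps_out = ddim_timesteps + 1                                      # [1, 2, ..., 1000]  (util.py#L57)

# stand-in for the DDPM's cumulative alphas: 1000 entries, valid indices 0..999
alphacums = np.linspace(0.9999, 0.0001, num_ddpm_timesteps)

alphas = alphacums[steps_out]  # IndexError: index 1000 is out of bounds (util.py#L65)
```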