explainingai-code / StableDiffusion-PyTorch

This repo implements a Stable Diffusion model in PyTorch with all the essential components.

Bug when saving Latent information? #10

Closed: jpmcarvalho closed this issue 6 months ago

jpmcarvalho commented 6 months ago

Hello, when you run infer_vqvae.py, you save the latent information (the encoded output) but do not clamp it (torch.clamp(encoded_output, -1., 1.)).

I also checked the path where the latents are read from the dataset: when use_latents is set to True, they are not clamped either.
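For reference, a minimal sketch of the save path being described (the encoder interface, the save_latents name, and the file layout are illustrative assumptions, not taken from infer_vqvae.py):

```python
import torch

@torch.no_grad()
def save_latents(vqvae, dataloader, save_path_fn):
    # Illustrative only: encode each image and save the latent to disk.
    vqvae.eval()
    for idx, im in enumerate(dataloader):
        encoded_output = vqvae.encode(im)
        # Point of the question: there is no
        # torch.clamp(encoded_output, -1., 1.) here, so the saved
        # latents can lie outside [-1, 1].
        torch.save(encoded_output.cpu(), save_path_fn(idx))
```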

Maybe it's a bug?

Thank you!

explainingai-code commented 6 months ago

Hello,

The clamping is essential when we are dealing with the pixel space, but not really needed when we are dealing with the latent space. When we decode the generated latent sample, we want the generated image to be in a valid pixel range, hence the clamping; in the latent space we don't necessarily need that. While it's not essential, the argument could be made that having the latents bounded (-1 to 1) using something like a tanh function would be better for the diffusion model than the current unbounded case. However, from what I could understand looking at the code of the official stable diffusion repo, they go with unbounded latent outputs - https://github.com/CompVis/stable-diffusion/blob/21f890f9da3cfbeaba8e2ac3c425ee9e998d5229/scripts/txt2img.py#L314 - and do clamping only for the pixel-space generated image.

jpmcarvalho commented 6 months ago

Thank you!