Closed: XsherryR closed this issue 5 months ago.
Hello, the VAE in Stable Diffusion 2-1 encodes the image into the latent space. It downsamples the image 8x spatially and changes the channel count, i.e. from 3x512x512 to 4x64x64.
For more details, you can refer to the paper.
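As a rough illustration of the shape change described above, here is a small sketch of the arithmetic. The function name `latent_shape` and its parameters are hypothetical and not part of SeeSR or diffusers. The defaults assume the 8x spatial downsampling and 4 latent channels implied by 3x512x512 -> 4x64x64.

```python
def latent_shape(batch, height, width, scale=8, latent_channels=4):
    """Return the expected latent shape for an image batch.

    Assumes the VAE reduces each spatial side by `scale` and emits
    `latent_channels` channels, independent of the 3 RGB input channels.
    """
    if height % scale or width % scale:
        raise ValueError("height and width should be divisible by the scale factor")
    return (batch, latent_channels, height // scale, width // scale)


# pixel_values of shape [2, 3, 512, 512] map to latents of shape [2, 4, 64, 64]
print(latent_shape(2, 512, 512))
```

The channel count changes because the latent is a learned representation and not an RGB image, so its 4 channels are not tied to the 3 input channels.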
Thanks for your reply! I would like to know whether the code in train_seesr.py uses EMA when saving the model parameters.
Hello, we do not use an EMA strategy.
Would you be able to provide instructions on how to adjust the code in train_seesr.py to incorporate EMA when saving parameters?
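The thread does not include an answer to this from the maintainers. For anyone with the same question, the following is only a minimal, framework-agnostic sketch of the general EMA technique, not the SeeSR authors' method. The class `EMA`, its methods, and the parameter dictionary are hypothetical. Plain floats are used so the example runs on its own. With tensors you would store detached clones of the trainable parameters and apply the same update after each optimizer step.

```python
class EMA:
    """Keep an exponential moving average ("shadow" copy) of named parameters."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        # Start the shadow copy from the current parameter values.
        self.shadow = dict(params)

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current value
        for name, value in params.items():
            self.shadow[name] = self.decay * self.shadow[name] + (1.0 - self.decay) * value

    def averaged(self):
        # Values to write out at checkpoint time in place of the raw weights.
        return dict(self.shadow)


# Toy usage: one "training step" moves the weight from 1.0 to 2.0.
params = {"w": 1.0}
ema = EMA(params, decay=0.9)
params["w"] = 2.0
ema.update(params)
print(ema.averaged())
```

In a training script, `update` would typically be called once per optimizer step and the averaged values saved alongside, or in place of, the raw weights. Where exactly that would fit in train_seesr.py is not stated in this thread.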
The input to vae.encode is 'pixel_values' with shape [2, 3, 512, 512], but the output 'latents' has shape [2, 4, 64, 64]. Why is the channel dimension different?