cswry / SeeSR

[CVPR2024] SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution
Apache License 2.0

Question about the output of vae.encode #7

Closed XsherryR closed 5 months ago

XsherryR commented 6 months ago

The input to vae.encode is 'pixel_values' [2, 3, 512, 512], however, the output 'latents' is [2, 4, 64, 64]. Why is the channel dimension different?

cswry commented 6 months ago

Hello, Stable Diffusion 2-1's VAE encodes the image into the latent space, downsampling it from 3x512x512 to 4x64x64 (the spatial dimensions shrink by 8x, and the latent has 4 channels).

For more details, you can refer to the paper.
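The shape change above is just the fixed geometry of the SD VAE: 4 latent channels and an 8x spatial downsampling factor. A minimal sketch of the arithmetic (the helper name is illustrative, not from the SeeSR code):

```python
def sd_latent_shape(batch, channels, height, width,
                    latent_channels=4, downsample_factor=8):
    """Predict the latent shape produced by an SD-style VAE encoder.

    Stable Diffusion's VAE maps 3-channel images to 4-channel latents
    and shrinks each spatial dimension by 8x (three stride-2 stages).
    """
    return (batch,
            latent_channels,
            height // downsample_factor,
            width // downsample_factor)

# A [2, 3, 512, 512] pixel batch becomes a [2, 4, 64, 64] latent batch.
print(sd_latent_shape(2, 3, 512, 512))  # → (2, 4, 64, 64)
```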

XsherryR commented 6 months ago

Thanks for your reply! I'd also like to know whether the code in train_seesr.py uses EMA to save the model parameters?

cswry commented 6 months ago

Hello, we do not use EMA strategy.

XsherryR commented 6 months ago

> Hello, we do not use EMA strategy.

Would you be able to provide instructions on how to adjust the code in train_seesr.py to incorporate EMA for parameter saving?

cswry commented 6 months ago

> Would you be able to provide instructions on how to adjust the code in train_seesr.py to incorporate EMA for parameter saving?

Our code is based on diffusers; if you want to add an EMA function, you may refer to this.
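For reference, diffusers ships an `EMAModel` helper in `diffusers.training_utils` that the official training scripts use. The core idea is small enough to sketch in plain Python (this is an illustrative standalone sketch, not the SeeSR or diffusers implementation):

```python
class EMA:
    """Keep an exponential moving average (shadow copy) of parameters.

    After each optimizer step, call update(); at checkpoint time, save
    self.shadow instead of (or alongside) the raw parameters.
    """

    def __init__(self, params, decay=0.999):
        self.decay = decay
        # Shadow copy, initialized from the current parameter values.
        self.shadow = {name: float(v) for name, v in params.items()}

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current
        for name, v in params.items():
            self.shadow[name] = (self.decay * self.shadow[name]
                                 + (1.0 - self.decay) * float(v))

# Toy usage: one training step moves w from 0.0 to 1.0; with decay=0.9
# the shadow value moves only 10% of the way.
ema = EMA({"w": 0.0}, decay=0.9)
ema.update({"w": 1.0})
print(ema.shadow["w"])  # → 0.1 (approximately)
```

In a real training loop you would store tensors rather than floats and call `update()` after `optimizer.step()`; diffusers' `EMAModel` wraps exactly this pattern for `nn.Module` parameters.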