Open Tord-Zhang opened 5 months ago
The scale factor of CV-VAE is the same as that of SD2.1: both are 0.18215. The scale factor is applied to the latents at the input and output of the UNet. It is not needed when you only encode and decode images and videos.
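To make the convention concrete, here is a minimal sketch of where the factor would go. The helper names `to_unet_space` and `from_unet_space` are hypothetical (not part of CV-VAE or diffusers), and the `decode` calls in the comments assume a diffusers-style interface that this thread does not confirm.

```python
# Value stated above for both CV-VAE and SD2.1.
SCALING_FACTOR = 0.18215


def to_unet_space(latent, scaling_factor=SCALING_FACTOR):
    """Scale raw VAE latents before they are fed to the UNet.

    Hypothetical helper; works on anything that supports `*`
    (floats, NumPy arrays, torch tensors).
    """
    return latent * scaling_factor


def from_unet_space(latent, scaling_factor=SCALING_FACTOR):
    """Undo the scaling on UNet outputs before they go to the VAE decoder."""
    return latent / scaling_factor


# Encode/decode only: the factor never appears.
#   latent = vae3d.encode(video).latent_dist.sample()
#   recon = vae3d.decode(latent)  # decode interface assumed, not confirmed here
#
# With a diffusion UNet in between:
#   z = to_unet_space(vae3d.encode(video).latent_dist.sample())
#   ... train on or denoise z with the UNet ...
#   recon = vae3d.decode(from_unet_space(z))  # decode interface assumed
```

Because the two helpers are inverses, skipping both of them in a pure reconstruction pipeline gives the same result as applying both.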
When using the encoder, is there no need to apply a scaling factor, as is done with the SD 1.5 VAE?
latent = vae3d.encode(video).latent_dist.sample()
In SD 1.5, it should be:
latent = vae3d.encode(video).latent_dist.sample().mul_(scaling_factor)