AILab-CVC / CV-VAE

[NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models
https://ailab-cvc.github.io/cvvae/index.html

SD3 and SVD, latent 16 and 4 #14

Open abcdvzz opened 2 weeks ago

abcdvzz commented 2 weeks ago

Thank you for releasing the SD3 version. Reconstruction quality is much better with 16 latent channels. As you mentioned in other issues, having 16 latent channels is crucial for reconstruction.

This makes me wonder whether SVD can be used with this 16-channel latent, since its latent diffusion model expects a latent input with 4 channels rather than 16.

How did you solve this problem? Did you retrain the whole SVD, including the latent diffusion model, after changing the number of latent input channels from 4 to 16?
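
To make the mismatch concrete, here is a minimal shape-only sketch in plain Python. It is not taken from the CV-VAE or SVD code: the helper names `conv_in_weight_shape` and `accepts_latent`, the 320 output channels, the 3x3 kernel, and the 64x64 latent size are all illustrative assumptions. It only shows that an input convolution built for a 4-channel latent cannot read a 16-channel latent without its weights being changed.

```python
# Shape-only illustration; every name and number here is hypothetical.

def conv_in_weight_shape(latent_channels, out_channels=320, kernel_size=3):
    """Weight shape of a 2D conv that reads the latent: (out, in, k, k)."""
    return (out_channels, latent_channels, kernel_size, kernel_size)


def accepts_latent(weight_shape, latent_shape):
    """True if a (batch, channels, height, width) latent matches the conv's in-channels."""
    return weight_shape[1] == latent_shape[1]


# Input layer sized for a 4-channel latent (the case described above).
w4 = conv_in_weight_shape(4)

latent4 = (1, 4, 64, 64)    # latent from a 4-channel VAE
latent16 = (1, 16, 64, 64)  # latent from a 16-channel VAE

print(accepts_latent(w4, latent4))   # channel counts agree
print(accepts_latent(w4, latent16))  # channel counts differ
print(conv_in_weight_shape(16))      # shape a 16-channel input layer would need
```

At a minimum, the input layer (and the matching output layer) would need a different weight shape, which is why I am asking whether the diffusion model was retrained.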