Stability-AI / generative-models

Generative Models by Stability AI
MIT License

The vae encoder of the first_stage_model #345

Open forgetable233 opened 2 months ago

forgetable233 commented 2 months ago

I'm using the sv3d_p model. I noticed that the VAE encoder of the first_stage_model is not included in the checkpoint. Which VAE encoder was used for the first_stage_model during training?

JiuTongBro commented 2 months ago

Same question.

pengc02 commented 1 month ago

Hi guys, I'm also looking into this. It seems that SV3D uses the same encoder and decoder as SVD, and SVD's encoder is released on Hugging Face. You can refer to https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/tree/main/vae for the checkpoint, https://github.com/huggingface/diffusers/blob/v0.24.0-release/src/diffusers/models/autoencoder_kl_temporal_decoder.py for the model code, and https://github.com/huggingface/diffusers/blob/v0.24.0-release/src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py for how to use it.
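
For anyone who wants to try this, here is a minimal sketch of how the pointers above might fit together. It assumes `diffusers` (0.24 or later) and `torch` are installed, and that the SVD weights can be downloaded (the Hugging Face repo may require accepting the model license first). The helper names `load_svd_vae`, `encode_image` and `latent_shape` are made up for illustration; whether the resulting latents match what SV3D saw during training is an assumption that has not been confirmed by the authors.

```python
def load_svd_vae(repo_id="stabilityai/stable-video-diffusion-img2vid-xt"):
    """Load the SVD autoencoder (encoder + temporal decoder) from the `vae` subfolder."""
    # Imported lazily so the rest of this sketch works without torch/diffusers.
    import torch
    from diffusers import AutoencoderKLTemporalDecoder

    return AutoencoderKLTemporalDecoder.from_pretrained(
        repo_id, subfolder="vae", torch_dtype=torch.float16
    )


def encode_image(vae, image):
    """Encode a batch of images to latents.

    `image` is expected to be a (B, 3, H, W) tensor scaled to [-1, 1].
    The mode of the posterior is used instead of a sample, mirroring how
    the diffusers SVD pipeline encodes its conditioning image.
    """
    return vae.encode(image).latent_dist.mode()


def latent_shape(height, width, latent_channels=4, downsample=8):
    """Expected per-image latent shape for the SVD VAE (4 channels, 8x downsampling)."""
    return (latent_channels, height // downsample, width // downsample)


# Example usage (downloads weights, so it is left commented out):
# import torch
# vae = load_svd_vae().to("cuda")
# image = torch.rand(1, 3, 576, 576, device="cuda", dtype=torch.float16) * 2 - 1
# with torch.no_grad():
#     latents = encode_image(vae, image)
# print(latents.shape[1:], latent_shape(576, 576))
```

A 576x576 input should give a 4x72x72 latent per image. This only covers the encoder side; the decoder in this class is the temporal one from SVD, so decoding needs the extra `num_frames` argument shown in the pipeline code linked above.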