THUDM / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Apache License 2.0

size mismatch for model.diffusion_model.mixins.patch_embed.proj.weight: copying a param with shape torch.Size([3072, 256]) from checkpoint, the shape in current model is torch.Size([3072, 128]). #513

Open echoanran opened 2 hours ago

echoanran commented 2 hours ago

I followed https://github.com/THUDM/CogVideo/tree/main/sat/README.md to run `bash inference.sh` and hit the size-mismatch error in the title. The weights were downloaded from https://huggingface.co/THUDM/CogVideoX1.5-5B-SAT. What can I do to fix this? Thanks for your help!

[Screenshot, 2024-11-18 16:25:20]

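To narrow down an error like this one, it can help to list every parameter whose shape differs between the checkpoint and the model being built. The following is a minimal sketch, not part of the CogVideo or SAT code: `find_shape_mismatches` is a hypothetical helper, and the two dictionaries are filled in by hand from the error message above. With PyTorch, such a dictionary can be built from a state dict as `{k: tuple(v.shape) for k, v in state_dict.items()}`; where the state dict sits inside the SAT checkpoint file is not shown here.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return (name, checkpoint_shape, model_shape) for shared parameters whose shapes differ."""
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Only compare parameters that exist on both sides.
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches.append((name, tuple(ckpt_shape), tuple(model_shape)))
    return mismatches


# Shapes copied from the error message in this issue.
checkpoint_shapes = {"model.diffusion_model.mixins.patch_embed.proj.weight": (3072, 256)}
model_shapes = {"model.diffusion_model.mixins.patch_embed.proj.weight": (3072, 128)}

for name, ckpt, model in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

Here the checkpoint's input width is twice what the model expects, which suggests that the model config and the downloaded weights describe different variants. That reading is an assumption and should be checked against the config passed to `inference.sh`.
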
nitinmukesh commented 2 hours ago

You will have to build diffusers from source, including the PR; see https://github.com/THUDM/CogVideo/issues/510

echoanran commented 1 hour ago

I've built diffusers from the cogvideox1.1-5b branch, and the installed version is diffusers-0.32.0.dev0, but the problem persists. Is there anything else I need to modify?

nitinmukesh commented 30 minutes ago

Compare your `pip list` output with this one. I had the same issue, but it was resolved after installing it today (Windows 11): https://github.com/THUDM/CogVideo/issues/509#issuecomment-2482129330
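Comparing two `pip list` outputs by eye is error-prone, so here is a minimal sketch that does it with the Python standard library only. The helper names (`parse_pip_list`, `diff_versions`, `installed_versions`) are hypothetical, and the reference text in the example is a placeholder: paste the actual `pip list` output from the linked comment in its place.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a package name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pip_list(text):
    """Parse `pip list` output (Package/Version columns) into {name: version}."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        versions[normalize(parts[0])] = parts[1]
    return versions


def diff_versions(reference, installed):
    """Return {name: (reference_version, installed_version)} for every difference."""
    diffs = {}
    for name, ref_version in reference.items():
        have = installed.get(name)  # None means the package is not installed
        if have != ref_version:
            diffs[name] = (ref_version, have)
    return diffs


def installed_versions():
    """Collect versions of all distributions visible to the current interpreter."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            result[normalize(name)] = dist.version
    return result


if __name__ == "__main__":
    # Placeholder reference; replace with the real `pip list` output to compare against.
    reference_text = """
    Package    Version
    ---------- -----------
    diffusers  0.32.0.dev0
    """
    diffs = diff_versions(parse_pip_list(reference_text), installed_versions())
    for name, (want, have) in sorted(diffs.items()):
        print(f"{name}: reference={want} installed={have}")
```

Run it inside the same environment that `inference.sh` uses; any line it prints is a package whose installed version differs from the reference, or is missing.
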