THUDM / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Apache License 2.0

Do you guys have a plan for a 10B model? #247

Closed: codeplay0000 closed this issue 1 month ago

codeplay0000 commented 2 months ago

Thanks for your awesome work! Do you have plans to train a 10B or even larger model?

zRzRzRzRzRzRzR commented 2 months ago

Not at the moment. Currently, we are all working on video generation models that are under 10B in size.

codeplay0000 commented 1 month ago

Will it be larger than 5B?

zRzRzRzRzRzRzR commented 1 month ago

Still the 5B model. Currently I am rushing to finish preparing the I2V (image-to-video) model for its open-source release.

codeplay0000 commented 1 month ago

Really excited and looking forward to the I2V model release. By the way, what approach did you choose to implement the I2V model? I've tried the mask-inpainting approach (like Open-Sora's I2V), and it looks like it works: https://github.com/user-attachments/assets/d0af3128-6672-4fa5-92a6-42c8a67a25c0 https://github.com/user-attachments/assets/24a36081-3010-418f-abc3-ddc3e75126c9
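The mask-inpainting idea mentioned above can be sketched roughly as follows: a boolean mask over the frame axis marks which latent frames are given (here, only the first frame from the input image), and at each denoising step those frames are overwritten with the clean conditioning latents, so the model only has to generate the remaining frames. This is a minimal illustrative sketch assuming latents of shape (C, T, H, W); the helper names `build_frame_mask` and `inpaint_condition` are hypothetical, and Open-Sora's actual implementation may differ (for example, in how timesteps are assigned to the conditioned frames).

```python
import numpy as np

def build_frame_mask(num_frames: int, cond_frames: int) -> np.ndarray:
    """Hypothetical helper: True for frames that are given as conditioning."""
    mask = np.zeros(num_frames, dtype=bool)
    mask[:cond_frames] = True
    return mask

def inpaint_condition(noisy_latents: np.ndarray,
                      cond_latents: np.ndarray,
                      frame_mask: np.ndarray) -> np.ndarray:
    """Replace the masked frames of the noisy latents with clean conditioning latents.

    noisy_latents, cond_latents: arrays of shape (C, T, H, W)
    frame_mask: boolean array of shape (T,)
    """
    # Reshape the mask to (1, T, 1, 1) so it broadcasts over channels and space.
    m = frame_mask[None, :, None, None]
    return np.where(m, cond_latents, noisy_latents)

# Usage: condition an 8-frame latent video on its first frame.
C, T, H, W = 4, 8, 2, 2
rng = np.random.default_rng(0)
noisy = rng.standard_normal((C, T, H, W))
cond = np.zeros((C, T, H, W))
cond[:, :1] = np.ones((C, 1, H, W))  # stand-in for the encoded input image
mask = build_frame_mask(T, cond_frames=1)
x = inpaint_condition(noisy, cond, mask)  # would be applied at every denoising step
```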

zRzRzRzRzRzRzR commented 1 month ago

You can look at solutions like SVD (Stable Video Diffusion; not the same approach), where the input channels are doubled to process the image embeddings.
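The channel-doubling idea can be sketched as follows: the latent of the input image is repeated along the time axis and concatenated with the noisy video latents along the channel axis, so the denoiser's first layer sees 2C input channels instead of C. To our understanding, SVD's pipeline repeats the image latent across frames in this way, but this is only an illustrative sketch. The zero-initialized widening of the input projection is an assumed (though common) fine-tuning trick that makes the widened layer initially behave like the pretrained one; the names `concat_image_condition`, `widen_input_projection`, and `project` are hypothetical, and a 1x1 channel projection stands in for the model's real first conv or patch-embedding layer.

```python
import numpy as np

def concat_image_condition(noisy_latents: np.ndarray, image_latent: np.ndarray) -> np.ndarray:
    """noisy_latents: (C, T, H, W); image_latent: (C, H, W). Returns (2C, T, H, W)."""
    T = noisy_latents.shape[1]
    # Repeat the image latent along the time axis: (C, H, W) -> (C, T, H, W).
    repeated = np.repeat(image_latent[:, None], T, axis=1)
    # Concatenate along channels, doubling the input channel count.
    return np.concatenate([noisy_latents, repeated], axis=0)

def widen_input_projection(weight: np.ndarray) -> np.ndarray:
    """Assumed trick: append zero-initialized columns for the C new channels.

    weight: (out_channels, C) -> (out_channels, 2C)
    """
    return np.concatenate([weight, np.zeros_like(weight)], axis=1)

def project(weight: np.ndarray, x: np.ndarray) -> np.ndarray:
    """1x1 projection over the channel axis: (out, Cin) x (Cin, T, H, W) -> (out, T, H, W)."""
    return np.einsum("oc,cthw->othw", weight, x)

# Usage with toy shapes.
rng = np.random.default_rng(0)
C, T, H, W, OUT = 4, 3, 2, 2, 8
noisy = rng.standard_normal((C, T, H, W))
image_latent = rng.standard_normal((C, H, W))
w = rng.standard_normal((OUT, C))  # stand-in for a pretrained input layer

x = concat_image_condition(noisy, image_latent)  # shape (8, 3, 2, 2)
w2 = widen_input_projection(w)                   # shape (8, 8)
# With zero-initialized extra columns, the widened layer initially matches the original one.
same_at_init = np.allclose(project(w2, x), project(w, noisy))
```

During fine-tuning the extra columns move away from zero, letting the model learn to use the image channels while starting from the pretrained text-to-video behavior.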