camenduru / text-to-video-model

🐣 Please follow me for new updates https://twitter.com/camenduru
🔥 Please join our discord server https://discord.gg/k5BwmmvJJU
🥳 Please join my patreon community https://patreon.com/camenduru

Potat 1️⃣ (Prototype Model)

Open-Source 1024x576 Text To Video Model 🥳
Trained with https://lambdalabs.com ❤ 1xA100 (40GB)
2197 clips, 68388 tagged frames (salesforce/blip2-opt-6.7b-coco)
train_steps: 10000
System RAM: ~8.5 GB, VRAM: ~11 GB, Model Size: ~4.1 GB
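
As a rough sanity check on the dataset figures above, the average number of tagged frames per clip can be worked out directly. This is only arithmetic on the two quoted numbers; the variable names are illustrative.

```python
# Dataset figures quoted above.
clips = 2197
tagged_frames = 68388

# Average number of tagged frames per clip.
frames_per_clip = tagged_frames / clips
print(f"{frames_per_clip:.1f} tagged frames per clip on average")
```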

🦒 Colab

Colab Info
Open In Colab test

📦 Model

https://huggingface.co/camenduru/potat1
https://huggingface.co/vdo/potat1-5000/tree/main
https://huggingface.co/vdo/potat1-10000/tree/main
https://huggingface.co/vdo/potat1-10000-base-text-encoder/tree/main
https://huggingface.co/vdo/potat1-15000/tree/main
https://huggingface.co/vdo/potat1-20000/tree/main
https://huggingface.co/vdo/potat1-25000/tree/main
https://huggingface.co/vdo/potat1-30000/tree/main
https://huggingface.co/vdo/potat1-35000/tree/main
https://huggingface.co/vdo/potat1-40000/tree/main
https://huggingface.co/vdo/potat1-45000/tree/main
https://huggingface.co/vdo/potat1-50000/tree/main
https://huggingface.co/vdo/potat1-50000-base-text-encoder/tree/main (same model as https://huggingface.co/camenduru/potat1)
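
The checkpoint repos listed above follow a regular naming pattern, so a small helper can build the repo IDs. The sketch below is not part of this repository: `checkpoint_repo` and `load_pipeline` are hypothetical names, and `load_pipeline` assumes the checkpoints are stored in a format that the `diffusers` library's `DiffusionPipeline.from_pretrained` can load and that a CUDA GPU is available. Check each model card before relying on it.

```python
# Training steps for which a checkpoint repo is listed above.
CHECKPOINT_STEPS = tuple(range(5000, 50001, 5000))
# Steps that also have a "-base-text-encoder" variant in the list above.
BASE_TEXT_ENCODER_STEPS = (10000, 50000)


def checkpoint_repo(step, base_text_encoder=False):
    """Return the Hugging Face repo ID for a listed checkpoint (hypothetical helper)."""
    if step not in CHECKPOINT_STEPS:
        raise ValueError(f"no checkpoint listed for step {step}")
    if base_text_encoder and step not in BASE_TEXT_ENCODER_STEPS:
        raise ValueError(f"no base-text-encoder variant listed for step {step}")
    suffix = "-base-text-encoder" if base_text_encoder else ""
    return f"vdo/potat1-{step}{suffix}"


def load_pipeline(repo_id="camenduru/potat1"):
    """Load a checkpoint with diffusers (assumption: the repo is diffusers-compatible)."""
    # Imported here so the helper above works without torch/diffusers installed.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
    return pipe.to("cuda")


if __name__ == "__main__":
    # Print the URL of every listed checkpoint.
    for step in CHECKPOINT_STEPS:
        print(f"https://huggingface.co/{checkpoint_repo(step)}")
```

Comparing the same prompt across several steps is one way to see how training progressed; `load_pipeline(checkpoint_repo(25000))` would fetch the 25000-step checkpoint under the assumptions stated above.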

🧪 Examples

Prompt: Octopus under the ocean.

https://github.com/camenduru/text-to-video-model/assets/54370274/679fd523-a4e0-4c65-8deb-2a6829c8f26c

https://github.com/camenduru/text-to-video-model/assets/54370274/e59edfe0-41b0-46ff-ad32-b5cfd755361f

📋 Tutorial

https://github.com/camenduru/Text-To-Video-Finetuning-colab

📦 Base Model

https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis
https://www.modelscope.cn/models/damo/text-to-video-synthesis

📦 Dataset & Config

https://huggingface.co/camenduru/potat1_dataset/tree/main
https://github.com/microsoft/XPretrain/tree/main/hd-vila-100m (HD-VILA-100M Dataset)
http://toflow.csail.mit.edu/ (Vimeo-90k Dataset)
https://github.com/m-bain/webvid
https://github.com/ExponentialML/Video-BLIP2-Preprocessor
https://github.com/Breakthrough/PySceneDetect

🍱 Finetuning

https://github.com/guoyww/animatediff
https://github.com/showlab/Tune-A-Video
https://github.com/ExponentialML/Text-To-Video-Finetuning
https://www.modelscope.cn/models/damo/text-to-video-synthesis

Thanks to damo-vilab, ExponentialML, kabachuha, @DiffusersLib, @LambdaAPI, @cerspense, @CiaraRowles1, @p1atdev_art

Thanks to Orellius ❤ (important bug report)