Stability-AI / generative-models

Generative Models by Stability AI
MIT License

How to get the text-to-video model #143

Open WuTao-CS opened 1 year ago

WuTao-CS commented 1 year ago

Exciting work! May I ask where the text-to-video model mentioned and used in the paper can be obtained? I only saw a waitlist for access to an upcoming website. Is there any plan to open-source it?

crapthings commented 1 year ago

mkdir checkpoints
cd checkpoints
wget https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/resolve/main/svd.safetensors
WuTao-CS commented 1 year ago

> mkdir checkpoints
> cd checkpoints
> wget https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/resolve/main/svd.safetensors?download=true

Thank you, but this is the image-to-video model; I'm asking about the text-to-video model.

Fearblade66 commented 1 year ago

Text-to-video isn't out yet.

crapthings commented 1 year ago

> mkdir checkpoints
> cd checkpoints
> wget huggingface.co/stabilityai/stable-video-diffusion-img2vid/resolve/main/svd.safetensors?download=true
>
> Thank you, but this is the image-to-video model; I'm asking about the text-to-video model.

I think it's easy to combine diffusers and image-to-video to do this.

CyberTimon commented 1 year ago

Yes, I think it's easy to create such a pipeline:

  1. Generate an image using a good fine-tuned SD 1.5 model.
  2. Use it as the reference image for the image-to-video model.

Maybe that's how they do it in the demo video.
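For anyone who wants to try it, here is a minimal sketch of that two-step pipeline with diffusers. It assumes a recent diffusers release that ships StableVideoDiffusionPipeline; the model IDs, prompt, resolution and sampling parameters are just examples.

```python
import torch
from diffusers import StableDiffusionPipeline, StableVideoDiffusionPipeline
from diffusers.utils import export_to_video

# 1. Text-to-image with a (fine-tuned) SD 1.5 checkpoint -- model ID is an example
t2i = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = t2i("a red fox running through snow, cinematic lighting").images[0]
image = image.resize((1024, 576))  # SVD expects 1024x576 conditioning frames

# 2. Image-to-video with Stable Video Diffusion, conditioned on the generated image
i2v = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid", torch_dtype=torch.float16
).to("cuda")
frames = i2v(image, decode_chunk_size=4).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```

Note that the text prompt only controls the first frame; the motion itself is whatever SVD predicts from that image.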

vicitooo commented 1 year ago

So has anyone managed to run it? Even image-to-video?

dgparker commented 1 year ago

The paper they released doesn't indicate that there will be a text-to-video model. It seems the intention is to combine image-to-video models with traditional text-to-image models to generate the initial frame.

From the paper:

Finally, many recent works tackle the task of image-to-video synthesis, where the start frame is already given and the model has to generate the consecutive frames [30, 93, 108]. Importantly, as shown in our work (see Figure 1) when combined with off-the-shelf text-to-image models, image-to-video models can be used to obtain a full text-(to-image)-to-video pipeline.

gutzcha commented 1 year ago

> So has anyone managed to run it? Even image-to-video?

Yup... it works. After you install the package and prepare the env following the instructions, you need to download the model as mentioned by @crapthings:

mkdir checkpoints
cd checkpoints
wget https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/resolve/main/svd.safetensors

Then run the Streamlit demo, changing the port to whatever you want:

streamlit run scripts/demo/sampling.py --server.port <your_port>

If you are running this on a remote machine, make sure to tunnel the port.

Then navigate your browser to: localhost:<your_port>/

Example: run streamlit run scripts/demo/sampling.py --server.port 8888 and navigate to localhost:8888/
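For the tunneling step, a typical SSH port forward looks like this (the username, hostname and port are placeholders):

ssh -L 8888:localhost:8888 user@remote-host

After that, localhost:8888 on your local machine points at the Streamlit server running on the remote one.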

mayank64ce commented 9 months ago

@CyberTimon but how would you control what happens in the video?

CyberTimon commented 9 months ago

Hey @mayank64ce, I'm sorry, but I can't tell you; I'm not that experienced with Stable Video Diffusion and the like.

Mercurise commented 8 months ago

From my side, the technical paper is quite misleading regarding the text-to-video part. By default, one would assume the code is aligned with what is claimed in the paper, but unfortunately that's currently not the case.