THUDM / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Apache License 2.0
8.4k stars · 803 forks

CogVideoX-5B may generate empty videos in some prompts #214

Closed whh258 closed 1 month ago

whh258 commented 2 months ago

System Info

CogVideoX-5B

Information

Reproduction

Thanks to the team for open-sourcing this work. The quality of the videos generated by CogVideoX-5B is really good!

However, when using the Hugging Face diffusers library with the default parameters, we encountered an issue where the generated video was empty, for example with prompt="Yellow curtains swaying near a blue sofa" or "Blue ink drops into water and disperses".

We don't know what caused this, but when we reduced the guidance scale parameter and weakened the text condition, the generated videos returned to normal. Can you provide an explanation, or a way to avoid this when running a large number of prompts?

Expected behavior

Provide an explanation or a solution to avoid empty videos when running a large number of prompts.
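As a stopgap when batch-running prompts, a cheap post-hoc check can flag near-empty clips so they can be retried with a different seed or a lower guidance scale. A minimal sketch; the threshold value and the NumPy frame layout are my assumptions, not part of the CogVideoX or diffusers API:

```python
import numpy as np

def is_empty_video(frames, threshold=8.0):
    # Stack the frames (H x W x C uint8 arrays, e.g. converted from
    # pipe(...).frames[0]) into one float array.
    arr = np.stack([np.asarray(f, dtype=np.float32) for f in frames])
    # A blank clip has both a tiny mean intensity and almost no variation.
    return bool(arr.mean() < threshold and arr.std() < threshold)

# Illustrative check: an all-black clip is flagged, a mid-gray one is not.
black = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(4)]
gray = [np.full((64, 64, 3), 128, dtype=np.uint8) for _ in range(4)]
print(is_empty_video(black), is_empty_video(gray))  # True False
```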

zRzRzRzRzRzRzR commented 2 months ago

There is a big problem: your prompt is too short. Please read our README carefully. The model needs long prompts as input, which requires you to use a large language model such as GPT-4 or GLM-4 to polish the prompt and feed in the long version. Otherwise, you are giving the model input it has not been trained on.

yunkchen commented 2 months ago

> There is a big problem: your prompt is too short. Please read our README carefully. The model needs long prompts as input, which requires you to use a large language model such as GPT-4 or GLM-4 to polish the prompt and feed in the long version. Otherwise, you are giving the model input it has not been trained on.

Our prompt: A person wears a white t-shirt and beige pants, holding a donut with pink icing and sprinkles. They bring the donut close to their mouth in several frames. The pink background contrasts with their white and beige clothing and the red-toned donut.

Using the demo code from the Hugging Face model page:

```python
video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]
```

tin2tin commented 2 months ago

I experienced the same problem with prompts like these:

bopan3 commented 1 month ago

I recommend increasing `num_inference_steps` to 100; this works in my case.