Closed whh258 closed 1 month ago
There is a big problem: your prompt is too short. Please read our README carefully. The model expects long prompts as input, so you should use a large language model such as GPT-4 or GLM-4 to expand and polish your prompt first. Short prompts fall outside what the model was trained on.
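As a rough illustration of the advice above, a pre-flight check could flag prompts that are likely too short before they reach the pipeline. This is only a sketch: the 30-word threshold and the helper name `is_probably_too_short` are assumptions made for illustration and do not come from the README or the model authors.

```python
# Hypothetical threshold; the thread does not give a minimum prompt length.
MIN_WORDS = 30


def is_probably_too_short(prompt: str, min_words: int = MIN_WORDS) -> bool:
    """Return True when the prompt has fewer whitespace-separated words than min_words."""
    return len(prompt.split()) < min_words


# One of the short prompts reported in this issue (copied verbatim).
short_prompt = "Yellow curtains swaging near a blue sofa"

# The longer prompt quoted by the maintainers in this thread.
long_prompt = (
    "A person wears a white t-shirt and beige pants, holding a donut with pink "
    "icing and sprinkles. They bring the donut close to their mouth in several "
    "frames. The pink background contrasts with their white and beige clothing "
    "and the red-toned donut."
)

for p in (short_prompt, long_prompt):
    if is_probably_too_short(p):
        print("Consider expanding this prompt with an LLM first:", p)
```

A word count is a crude proxy. Flagged prompts would still need to be rewritten into detailed descriptions, for example with GPT-4 or GLM-4 as suggested above.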
Our prompt: A person wears a white t-shirt and beige pants, holding a donut with pink icing and sprinkles. They bring the donut close to their mouth in several frames. The pink background contrasts with their white and beige clothing and the red-toned donut.
Using the demo code from the Hugging Face model page:

    video = pipe(
        prompt=prompt,
        num_videos_per_prompt=1,
        num_inference_steps=50,
        num_frames=49,
        guidance_scale=6,
        generator=torch.Generator(device="cuda").manual_seed(42),
    ).frames[0]
I experienced the same problem with prompts like these:
I recommend increasing num_inference_steps to 100; this worked in my case.
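A minimal sketch of that suggestion, assuming the other settings stay as in the demo call quoted in this thread. The helper `with_more_steps` is a hypothetical name used only for illustration, and `pipe` and `prompt` are assumed to be set up as in the demo code.

```python
# Sampling settings copied from the demo call quoted in this thread.
DEMO_KWARGS = {
    "num_videos_per_prompt": 1,
    "num_inference_steps": 50,
    "num_frames": 49,
    "guidance_scale": 6,
}


def with_more_steps(kwargs: dict, steps: int = 100) -> dict:
    """Return a copy of kwargs with num_inference_steps raised; the input is not mutated."""
    updated = dict(kwargs)
    updated["num_inference_steps"] = steps
    return updated


retry_kwargs = with_more_steps(DEMO_KWARGS)

# Assumed usage, with `pipe` and `prompt` defined as in the demo code:
# video = pipe(prompt=prompt, **retry_kwargs).frames[0]
```

Doubling the step count roughly doubles sampling time, so it may make sense to apply it only to prompts that fail with the default settings.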
System Info
CogVideoX-5B
Information
Reproduction
Thanks to the team for open-sourcing this work. The video quality generated by CogVideoX-5B is really good!
However, when using the Hugging Face diffusers library with default parameters, we ran into an issue where the generated video was empty, for example with prompt="Yellow curtains swaging near a blue sofa" or "Blue ink drops into water and dispersions".
We don't know what causes this, but when we reduced the guidance scale parameter and the text condition, the generated video returned to normal. Can you provide an explanation, or a solution that avoids this when running a large number of prompts?
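The workaround described above could be automated for a batch of prompts roughly as follows. This is a sketch under stated assumptions, not a fix from the maintainers: the fallback values 4.0 and 2.0 are hypothetical (the report does not say how far the guidance scale was reduced), and `generate` and `is_empty` are caller-supplied callables, since the thread does not define how an "empty" video is detected.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def generate_with_fallback(
    prompt: str,
    generate: Callable[[str, float], Any],
    is_empty: Callable[[Any], bool],
    guidance_scales: Sequence[float] = (6.0, 4.0, 2.0),  # 4.0 and 2.0 are hypothetical
) -> Tuple[Optional[Any], Optional[float]]:
    """Try each guidance scale in order and return the first non-empty result.

    Returns (video, scale) on success, or (None, None) if every attempt is empty.
    """
    for scale in guidance_scales:
        video = generate(prompt, scale)
        if not is_empty(video):
            return video, scale
    return None, None


# Assumed wiring to the pipeline from the demo code (not executed here):
# generate = lambda p, s: pipe(
#     prompt=p, guidance_scale=s, num_inference_steps=50, num_frames=49
# ).frames[0]
```

Passing the generation and emptiness checks in as callables keeps the retry logic independent of the pipeline, so it can be exercised with stand-in functions before spending GPU time.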
Expected behavior
Provide an explanation, or a solution that avoids empty videos when running a large number of prompts.