Open DZY-irene opened 1 day ago
Hmm, num_frames needs to be changed to 161, with export_to_video(video, output_path, fps=16). Also, have you checked whether the 5-second videos come out normal?
I want to confirm the settings. For a 5-second video, is it num_frames=81 with export_to_video(video, output_path, fps=16)? And for a 10-second video, num_frames=161 with export_to_video(video, output_path, fps=16)? I found that CogVideoX1.5's frame rates are all 16 fps, but the Hugging Face demo passes fps=8 to export_to_video.
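For reference, here is a minimal sketch of the arithmetic behind these settings. The helper names are hypothetical, and the "fps * seconds + 1" rule is only inferred from the 81 and 161 values quoted in this thread, not taken from the CogVideoX documentation.

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip written with the given frame rate."""
    return num_frames / fps


def frames_for_duration(seconds: int, fps: int) -> int:
    """Frame count under the assumed 'fps * seconds + 1' convention.

    Assumption: inferred from num_frames=81 (5 s) and 161 (10 s) at 16 fps.
    """
    return seconds * fps + 1


if __name__ == "__main__":
    # The same 81 frames play for about 5 s at 16 fps but about 10 s at 8 fps.
    for num_frames, fps in [(81, 16), (161, 16), (81, 8)]:
        duration = clip_duration_seconds(num_frames, fps)
        print(f"num_frames={num_frames}, fps={fps}: {duration:.4f} s")
```

If this convention holds, exporting an 81-frame clip at fps=8 would not change the generated frames, only stretch playback to roughly twice the length.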
prompt: A focused individual sits at a sleek, modern desk in a dimly lit room, illuminated by the soft glow of a high-resolution computer screen. They wear a cozy, oversized sweater and glasses, reflecting the screen's light. The room is filled with the quiet hum of technology, with a minimalist setup including a mechanical keyboard and a wireless mouse. The person’s fingers dance swiftly across the keys, their face showing intense concentration. Behind them, a bookshelf filled with colorful books and a potted plant adds a touch of warmth to the tech-centric space. The scene captures the blend of human focus and digital interaction.
Setting 81 frames and 16fps for 5-sec video output:
https://github.com/user-attachments/assets/2ca4bf5a-5cc7-4c62-be11-b077f7018ec0
Setting 161 frames and 16fps for 10-sec video output:
https://github.com/user-attachments/assets/59f59a19-81f3-4c3e-9007-ccada6652d84
By the way, when I use the SAT version for 5-second video sampling, everything works and there are no black or white videos. I suspect the problem is specific to the diffusers version.
System Info / 系統信息
None
Information / 问题信息
Reproduction / 复现过程
Thank you for your wonderful work! I'm using VBench's GPT-enhanced prompts for sampling, and I've noticed that occasionally a couple of videos come out entirely black or white. In other cases, the video stays black for a long stretch and objects appear only in the last second.
prompt: A focused individual sits at a sleek, modern desk in a dimly lit room, illuminated by the soft glow of a high-resolution computer screen. They wear a cozy, oversized sweater and glasses, reflecting the screen's light. The room is filled with the quiet hum of technology, with a minimalist setup including a mechanical keyboard and a wireless mouse. The person’s fingers dance swiftly across the keys, their face showing intense concentration. Behind them, a bookshelf filled with colorful books and a potted plant adds a touch of warmth to the tech-centric space. The scene captures the blend of human focus and digital interaction.
https://github.com/user-attachments/assets/9e1a915f-87fa-456c-90aa-b1d40bb83ace
prompt: A pristine white bathroom features a sleek, modern sink with a chrome faucet, set against a backdrop of glossy white tiles. The sink's surface is adorned with a neatly folded hand towel and a small potted plant, adding a touch of greenery. Adjacent to the sink, a contemporary toilet with a soft-close lid and a minimalist design stands out. The toilet's clean lines and the subtle sheen of its ceramic surface reflect the ambient light. The scene captures the essence of a serene, well-maintained bathroom, emphasizing cleanliness and modern aesthetics.
https://github.com/user-attachments/assets/ad79cbc5-7c06-4016-82f9-50cf0e65a709
prompt: A sleek black cat with piercing green eyes prowls gracefully through a dimly lit, mysterious alleyway, its fur glistening under the soft glow of a distant streetlamp. The cat pauses, ears perked, as it senses movement, its silhouette casting an elongated shadow on the cobblestone path. It then leaps effortlessly onto a nearby windowsill, where it sits, tail flicking, and gazes intently into the darkness. The scene transitions to a close-up of the cat's face, highlighting its sharp, alert features and the subtle twitch of its whiskers, capturing the essence of its enigmatic and nocturnal nature.
https://github.com/user-attachments/assets/f2f15612-eca7-4b72-976e-38b4222c493b
Here is my code:
The "test_human.txt" is test_human.txt
"test_human_longer.txt" is test_human_longer.txt
The prompts in these files are the ones most likely to produce a black video.
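To make the failure easier to count across many prompts, here is a rough sketch of a check for mostly black or white output. It is not part of my sampling code. It assumes the generated frames are available as a float array of shape (num_frames, height, width, channels) with values in [0, 1]; the function names and the 0.02 / 0.98 thresholds are hypothetical and would need tuning.

```python
import numpy as np


def blank_frame_mask(frames, low=0.02, high=0.98):
    """Return a boolean array marking frames that look almost black or white.

    Assumption: `frames` has shape (num_frames, H, W, C) with values in [0, 1].
    A frame is flagged when its mean brightness is <= low or >= high.
    """
    frames = np.asarray(frames, dtype=np.float32)
    mean_per_frame = frames.reshape(frames.shape[0], -1).mean(axis=1)
    return (mean_per_frame <= low) | (mean_per_frame >= high)


def blank_fraction(frames, low=0.02, high=0.98):
    """Fraction of frames flagged as almost black or almost white."""
    return float(blank_frame_mask(frames, low, high).mean())


if __name__ == "__main__":
    # Synthetic clip: three black frames followed by one mid-gray frame,
    # mimicking "black for a long time, objects appear at the end".
    clip = np.zeros((4, 8, 8, 3), dtype=np.float32)
    clip[3] = 0.5
    print(blank_frame_mask(clip))
    print(blank_fraction(clip))
```

A clip with a high blank fraction could then be logged together with its prompt and seed, which might help narrow down which prompts trigger the problem.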
Expected behavior / 期待表现
To figure out why this is happening.