I2V model running in fp16 produces only noise

realisticdreamer114514 commented 2 months ago

Environment: diffusers 0.30.3 with CUDA 12.4 and Python 3.11, in a conda environment on Windows.

In cli_demo.py, add these memory optimizations:

pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

Then reduce the number of frames to 41 (about 5 seconds of video).

Run cli_demo.py:

python cli_demo.py --prompt "A female basketball player is standing in a basketball court, her body leaning forward towards the camera. She is wearing a vibrant blue basketball jersey with the number 3 prominently displayed. The person's head is tilted back, and her hands are clasped together in front of her legs. She is reaching for the viewpoint and waving at it closely. The court beneath her is a rich brown color with green wall in the background and a basketball hoop stands against the wall." --model_path "D:\CogVideoX-5b-I2V" --generate_type "i2v" --output_path ./output.mp4 --image_or_video_path "D:\test\process\2.png" --dtype float16
Result: the video contains only colored noise.

Expected behavior: a normal video is produced.
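
For reference, here is a rough sketch of the frame-count arithmetic behind "41 frames for 5 seconds". It assumes the demo exports video at 8 fps and that the CogVideoX VAE compresses time by a factor of 4. Neither value is stated in this thread, so treat both as assumptions. The helper names are hypothetical.

```python
# Assumed values, not confirmed in this thread.
FPS = 8                   # assumed export frame rate of the demo
TEMPORAL_COMPRESSION = 4  # assumed temporal compression factor of the VAE


def clip_seconds(num_frames: int, fps: int = FPS) -> float:
    """Duration of a clip with num_frames frames played at fps."""
    return num_frames / fps


def latent_frames(num_frames: int, factor: int = TEMPORAL_COMPRESSION) -> int:
    """Latent frame count, assuming the first frame is encoded on its own
    and every following group of `factor` frames becomes one latent frame."""
    return (num_frames - 1) // factor + 1


for n in (41, 49):
    print(n, "frames:", clip_seconds(n), "s,", latent_frames(n), "latent frames")
```

Under these assumptions, 41 frames gives a slightly shorter clip and fewer latent frames than the default of 49.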

kijai commented 2 months ago

The I2V model will generally do exactly that if anything but 49 frames is used.

realisticdreamer114514 commented 2 months ago

> The I2V model will generally do exactly that if anything but 49 frames is used.

That was the cause. Thanks for pointing it out.
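
If a shorter clip is still wanted, one possible workaround is to generate with 49 frames and trim the result afterwards. The sketch below is pure Python and assumes the pipeline output is a plain sequence of frames. `check_i2v_num_frames` and `trim_frames` are hypothetical helpers, not part of cli_demo.py or diffusers, and this thread does not verify whether a trimmed clip looks acceptable.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")

# Frame count the I2V model expects, according to the reply in this thread.
I2V_REQUIRED_FRAMES = 49


def check_i2v_num_frames(num_frames: int) -> None:
    """Fail early instead of spending a full run on a noise-only video."""
    if num_frames != I2V_REQUIRED_FRAMES:
        raise ValueError(
            f"I2V expects {I2V_REQUIRED_FRAMES} frames, got {num_frames}"
        )


def trim_frames(frames: Sequence[Frame], keep: int) -> List[Frame]:
    """Keep only the first `keep` frames of an already generated clip."""
    if not 0 < keep <= len(frames):
        raise ValueError(f"keep must be between 1 and {len(frames)}")
    return list(frames[:keep])


# Placeholder frames stand in for the real pipeline output.
check_i2v_num_frames(49)
generated = list(range(49))
shortened = trim_frames(generated, 41)
print(len(shortened))
```

The shortened list would then be passed to the video export step in place of the full output.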
