What GPU are you using? It shouldn't be this slow. Also, the video should be 6 seconds long; can you calculate how long the average step took?
The GPU details are as below: ![image](https://github.com/user-attachments/assets/1a92da51-ebdd-42c6-90e8-2d42413ae2d6)
Yes, the video is 6 seconds long.
This speed is clearly wrong. However, for hardware like yours, I suggest the following configuration; it will significantly increase the speed.
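For reference, the usual low-VRAM recipe for CogVideoX in diffusers looks roughly like this sketch; the exact plan referenced above is not shown in the thread, so these settings are an assumption (they do match the code the reporter posts next):

```python
# A sketch of the standard diffusers low-VRAM recipe for CogVideoX; the
# exact plan referenced above is not shown, so treat these settings as
# an assumption rather than the author's exact suggestion.
import torch
from diffusers import CogVideoXImageToVideoPipeline

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
)
pipe.enable_sequential_cpu_offload()  # trade speed for much lower VRAM use
pipe.vae.enable_slicing()             # decode the VAE one slice at a time
pipe.vae.enable_tiling()              # decode the VAE in tiles
```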
Hi @zRzRzRzRzRzRzR,
I tried your suggestion, but now it is taking 14 minutes for a 6-second video. Below is the code I'm using:
```python
import random

import torch
from diffusers import CogVideoXImageToVideoPipeline, CogVideoXTransformer3DModel
from diffusers.utils import load_image

pipe_image = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V",
    transformer=CogVideoXTransformer3DModel.from_pretrained(
        "THUDM/CogVideoX-5b-I2V", subfolder="transformer", torch_dtype=torch.bfloat16
    ),
    torch_dtype=torch.bfloat16,
)
pipe_image.enable_sequential_cpu_offload()

seed = random.randint(0, 2**8 - 1)
prompt = "A worker talking to his supervisor at a construction site. High quality, masterpiece, best quality, highres, ultra-detailed, fantastic."
negative_prompt = "The video is not of a high quality, it has a low resolution. Strange motion trajectory. Flickering, blurriness, face restore. Deformation, anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured."

# Load the conditioning image and resize it to the 720x480 the model expects.
img_path = "images/image_3.png"
image = load_image(img_path).resize((720, 480))

video_pt = pipe_image(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    num_videos_per_prompt=1,
    use_dynamic_cfg=True,
    output_type="pt",
    guidance_scale=7.0,
    num_frames=49,
    generator=torch.Generator(device="cuda").manual_seed(seed),
).frames
```
Please let me know if I'm doing anything wrong.
This code is correct; I did not see any errors:
```python
video_pt = pipe_image(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    num_videos_per_prompt=1,
    use_dynamic_cfg=True,
    output_type="pt",
    guidance_scale=7.0,
    num_frames=49,
    generator=torch.Generator(device="cuda").manual_seed(seed),
).frames[0]
```
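To measure the average step time directly, here is a minimal sketch; it assumes this pipeline supports diffusers' standard `callback_on_step_end` hook:

```python
import time

step_times = []
last = [None]  # wall-clock time at the previous step boundary

def timing_callback(pipe, step_index, timestep, callback_kwargs):
    # Record the gap between consecutive denoising steps.
    now = time.perf_counter()
    if last[0] is not None:
        step_times.append(now - last[0])
    last[0] = now
    return callback_kwargs

video_pt = pipe_image(
    image=image,
    prompt=prompt,
    num_inference_steps=50,
    callback_on_step_end=timing_callback,
).frames[0]

print(f"average step time: {sum(step_times) / len(step_times):.2f} s")
```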
Did this step take 14 minutes? Our speed test only measures this step.
This is clearly not A6000-level performance; even a T4 is faster than this.
Yes, surprisingly, it is taking 14 minutes.
Hi @zRzRzRzRzRzRzR,
How long does it take you to generate a 6-second video?
On an A100 it takes about 180 seconds with the 5B model (roughly 3.6 s per step at 50 steps).
Can you please share the code? I want to test it on the A6000.
I used a 3090 with the default cli_demo; it takes 12 minutes for a 6-second video and uses very little VRAM. Is this the correct speed? @zRzRzRzRzRzRzR
Same for me. On I2V it takes about 10 minutes on an RTX 4090, and only about 3 GB of VRAM is used. I added the following code:

```python
pipe_image.enable_sequential_cpu_offload()
pipe_image.vae.enable_tiling()
```
It takes time, but since there is plenty of VRAM available, it seems performance could be further improved by increasing the resolution and length. Please continue with the development. Also, would it be difficult to show the video while it is still being generated?
If generation takes a long time, it is a problem that you cannot predict the result until the video is complete. It would be good to be able to see intermediate results, even at a low resolution and low frame rate.
On a 4090, you can completely remove `pipe_image.enable_sequential_cpu_offload()` and just use `pipe.to("cuda")`; that should work. Currently, there is indeed no way to visualize the intermediate results.
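Concretely, the change is just the following (a sketch against the snippet above):

```python
# Sequential CPU offload streams each submodule to the GPU on demand,
# which saves VRAM but is slow; per the reply above, a 4090 can hold
# the whole pipeline, so keep it resident on the GPU instead.
# pipe_image.enable_sequential_cpu_offload()  # remove this line
pipe_image.to("cuda")
```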
@zRzRzRzRzRzRzR
I'm using the torch/CUDA versions below; is this correct?

```sh
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```
This should be fine; PyTorch 2.4.0 also ships builds compiled against CUDA 12.1.
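As a quick sanity check that the installed build actually sees the GPU:

```python
import torch

# Print the torch version, the CUDA version it was built against,
# and whether a CUDA device is actually visible.
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
```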
@zRzRzRzRzRzRzR
Can you please share the code you are running on the A100?
https://github.com/THUDM/CogVideo/blob/main/inference/cli_demo.py
Follow this, remove `pipe_image.enable_sequential_cpu_offload()`, and use `pipe.to("cuda")` instead.
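Put together, a minimal sketch of that fast path (the prompt and image path below are placeholders, not the exact cli_demo contents):

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
)
pipe.to("cuda")  # keep everything on the GPU; no CPU offload

image = load_image("images/image_3.png")  # placeholder path
video = pipe(
    image=image,
    prompt="A worker talking to his supervisor at a construction site.",  # placeholder
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=7.0,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```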
@zRzRzRzRzRzRzR I am using the above code, and as you can see, it is taking 8-9 minutes for a 6-second video.
Hello! Any progress here? Same problem.
I think the main reason is that you should add `pipe = pipe.to("cuda")` when copying the code from Colab.
Hi @xijiu9,
Check this code: https://github.com/THUDM/CogVideo/issues/316#issue-2537904293.
I have added `.to("cuda")`, but it is still taking a very long time on Windows.
Hi,
I am facing an issue with slow model loading and with the time taken to generate a video from an image. Currently it takes 8 minutes for an 8-second video; I have 48 GB of VRAM, but it is still very slow.
Please let me know if there is any way to solve this.
This is the code I'm using.
Thanks in advance.