How can the 1.5 I2V model support 768*1360? - Githubissues

THUDM / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Apache License 2.0

9.37k stars 883 forks

How can the 1.5 I2V model support 768*1360? #521

Closed gongleii closed 4 days ago

gongleii commented 1 week ago

/CogVideo/inference# python cli_demo.py --width 768 --height 1360

This fails, while 1360*768 works fine.

Traceback (most recent call last):
  File "/CogVideo/inference/cli_demo.py", line 183, in <module>
    generate_video(
  File "/CogVideo/inference/cli_demo.py", line 115, in generate_video
    video_generate = pipe(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/CogVideo/diffusers/pipelines/cogvideo/pipeline_cogvideox_image2video.py", line 795, in __call__
    self._prepare_rotary_positional_embeddings(height, width, latents.size(1), device)
  File "/CogVideo/diffusers/pipelines/cogvideo/pipeline_cogvideox_image2video.py", line 564, in _prepare_rotary_positional_embeddings
    freqs_cos, freqs_sin = get_3d_rotary_pos_embed(
  File "/CogVideo/diffusers/models/embeddings.py", line 608, in get_3d_rotary_pos_embed
    cos = combine_time_height_width(t_cos, h_cos, w_cos)
  File "/CogVideo/diffusers/models/embeddings.py", line 591, in combine_time_height_width
    freqs = torch.cat(
RuntimeError: Sizes of tensors must match except in dimension 3. Expected size 85 but got size 48 for tensor number 1 in the list.
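The two sizes in the error message match the requested resolution divided by 16: 1360 / 16 = 85 and 768 / 16 = 48. Below is a minimal sketch of that arithmetic, assuming a spatial VAE downscale factor of 8 and a transformer patch size of 2 (neither value is stated in this thread). `rope_grid` is a hypothetical helper name, not a diffusers function.

```python
# Assumed values, not stated in this thread: the VAE downsamples each
# spatial side by 8 and the transformer groups latents into 2x2 patches.
VAE_SCALE_FACTOR_SPATIAL = 8
PATCH_SIZE = 2


def rope_grid(height: int, width: int) -> tuple[int, int]:
    """Return (grid_height, grid_width) in patches for a pixel resolution."""
    step = VAE_SCALE_FACTOR_SPATIAL * PATCH_SIZE  # 16 pixels per patch side
    if height % step or width % step:
        raise ValueError(f"height and width must be multiples of {step}")
    return height // step, width // step


landscape = rope_grid(height=768, width=1360)
portrait = rope_grid(height=1360, width=768)
print(landscape)  # (48, 85)
print(portrait)   # (85, 48)
```

If some part of the embedding code sizes its height and width tables for the landscape grid only, a portrait request would produce exactly this 85 versus 48 mismatch. That reading is a guess from the error message, not something confirmed in the thread.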

gongleii commented 1 week ago

The diffusers I am using is: https://github.com/zRzRzRzRzRzRzR/diffusers/tree/cogvideox1.1-5b Try installing diffusers from here, as the PR has not been merged yet.

nitinmukesh commented 1 week ago

I think the model was trained on a --width 1360 --height 768 dataset. You are trying the reverse (portrait).

CheWang1110 commented 1 week ago

Same problem. Can cogvideox-1.5-5b only generate landscape videos?

nitinmukesh commented 1 week ago

@gongleii @CheWang1110

If you want support for vertical resolution, you can request it in this thread: https://github.com/THUDM/CogVideo/issues/194

jinqiupeter commented 6 days ago

But --width 480 --height 720 works fine.

zRzRzRzRzRzRzR commented 6 days ago

This is now supported. Please install the latest diffusers library from source and use my latest commit on HF.

MikeAiJF commented 6 days ago

Hello, when I run app.py I get AttributeError: 'CogVideoXTransformer3DModel' object has no attribute 'ofs_embedding'. How can I fix this error? Any help would be appreciated.

LettleCreator commented 6 days ago

> Hello, when I run app.py I get AttributeError: 'CogVideoXTransformer3DModel' object has no attribute 'ofs_embedding'. How can I fix this error? Any help would be appreciated.

Update diffusers with this command: pip install git+https://github.com/huggingface/diffusers.git
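Before retrying, it can help to confirm which diffusers build is actually installed. The sketch below is one way to check with the standard library only. The (0, 32) threshold is an assumption taken from the version number quoted later in this thread, and `parse_release`, `is_at_least`, and `installed_diffusers_version` are hypothetical helper names.

```python
import re
from importlib import metadata


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric part, e.g. '0.32.0.dev0' -> (0, 32, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_at_least(version: str, minimum: tuple[int, ...]) -> bool:
    """Compare the numeric release against a minimum such as (0, 32)."""
    return parse_release(version) >= minimum


def installed_diffusers_version():
    """Return the installed diffusers version string, or None if absent."""
    try:
        return metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_diffusers_version()
    if found is None:
        print("diffusers is not installed")
    else:
        # Assumed threshold: 0.32 is the version quoted in this thread.
        print(found, "ok" if is_at_least(found, (0, 32)) else "older than 0.32")
```

A version number alone does not prove that a given source build contains the `ofs_embedding` change, so this is only a first sanity check.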

MikeAiJF commented 6 days ago

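> Hello, when I run app.py I get AttributeError: 'CogVideoXTransformer3DModel' object has no attribute 'ofs_embedding'. How can I fix this error? Any help would be appreciated.

> Update diffusers with this command: pip install git+https://github.com/huggingface/diffusers.git

I have solved that error, thank you for your answer.

What does this error mean?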

zRzRzRzRzRzRzR commented 6 days ago

Are you using I2V? That requires an input image.

MikeAiJF commented 6 days ago

> Are you using I2V? That requires an input image.

T2V

MikeAiJF commented 6 days ago

zRzRzRzRzRzRzR commented 6 days ago

But did you use the I2V model? The T2V model doesn't have this ofs. Are you sure you installed the latest diffusers source code?

MikeAiJF commented 6 days ago

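> But did you use the I2V model? The T2V model doesn't have this ofs. Are you sure you installed the latest diffusers source code?

Yes, I am using the latest 1.5x-5b-I2V model, and yes, I installed the latest version, 0.32.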

zRzRzRzRzRzRzR commented 6 days ago

The I2V model cannot perform T2V. The model for T2V is this one: https://huggingface.co/THUDM/CogVideoX1.5-5B
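The distinction above can be sketched as a small lookup from task to checkpoint. `THUDM/CogVideoX1.5-5B` comes from the link above; the I2V repository id and the helper name `select_checkpoint` are assumptions for illustration. `CogVideoXPipeline` and `CogVideoXImageToVideoPipeline` are diffusers pipeline classes, named here only as strings so the sketch runs without diffusers installed.

```python
from typing import Optional

# Task -> (Hugging Face repo id, diffusers pipeline class name).
CHECKPOINTS = {
    "t2v": ("THUDM/CogVideoX1.5-5B", "CogVideoXPipeline"),
    # Assumed repo id for the image-to-video weights; not given in this thread.
    "i2v": ("THUDM/CogVideoX1.5-5B-I2V", "CogVideoXImageToVideoPipeline"),
}


def select_checkpoint(task: str, image_path: Optional[str] = None) -> tuple[str, str]:
    """Pick the checkpoint for a task and reject mismatched inputs."""
    key = task.lower()
    if key not in CHECKPOINTS:
        raise ValueError(f"unknown task {task!r}; expected 't2v' or 'i2v'")
    if key == "i2v" and image_path is None:
        raise ValueError("i2v needs an input image")
    if key == "t2v" and image_path is not None:
        raise ValueError("t2v takes only a prompt; use 'i2v' for an image")
    return CHECKPOINTS[key]


print(select_checkpoint("t2v"))
print(select_checkpoint("i2v", image_path="input.jpg"))
```

Failing early on a mismatched task and input is a design choice of this sketch; it surfaces the mix-up before any weights are downloaded.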

MikeAiJF commented 5 days ago

Hello, one more question: can text-to-image also use the newly released model, or can it only use 2b and 5b?

zRzRzRzRzRzRzR commented 5 days ago

This model is for text-to-video. Text-to-image is cogview.

MikeAiJF commented 5 days ago

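> This model is for text-to-video. Text-to-image is cogview.

Sorry, one more question: is it true that the current gradio cannot generate with the latest model?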

zRzRzRzRzRzRzR commented 4 days ago

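No. The gradio demo is the old one, and a new one has not been built. The response time for this model usually exceeds gradio's limit, so the connection may drop.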