AILab-CVC / FreeNoise

[ICLR 2024] Code for FreeNoise based on VideoCrafter
http://haonanqiu.com/projects/FreeNoise.html
Apache License 2.0

RuntimeError: Mask shape should match input. mask: [77, 77] input: [77, 16, 1, 1] #17

Closed: changsn closed this issue 1 month ago

changsn commented 1 month ago

Thank you for your great work! I ran into an error while trying to reproduce your results. Could you help me and give me some ideas? Thank you in advance for your help!

@CoLVDM Inference: 2024-08-15-03-25-40
Global seed set to 123
AE working on z of shape (1, 4, 64, 64) = 16384 dimensions.
model checkpoint loaded.
[rank:0] 2/2 samples loaded.
[rank:0] batch-1 (1)x3 ...
/usr/local/conda/envs/freenoise/lib/python3.8/site-packages/torch/nn/modules/activation.py:1144: UserWarning: Converting mask without torch.bool dtype to bool; this will negatively affect performance. Prefer to use a boolean mask directly. (Triggered internally at ../aten/src/ATen/native/transformers/attention.cpp:150.)
  return torch._native_multi_head_attention(
Traceback (most recent call last):
  File "scripts/evaluation/inference_freenoise.py", line 147, in <module>
    run_inference(args, gpu_num, rank)
  File "scripts/evaluation/inference_freenoise.py", line 117, in run_inference
    text_emb = model.get_learned_conditioning(prompts)
  File "/mnt/workspace/video_generation/FreeNoise/scripts/evaluation/../../lvdm/models/ddpm3d.py", line 448, in get_learned_conditioning
    c = self.cond_stage_model.encode(c)
  File "/mnt/workspace/video_generation/FreeNoise/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 235, in encode
    return self(text)
  File "/usr/local/conda/envs/freenoise/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/workspace/video_generation/FreeNoise/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 212, in forward
    z = self.encode_with_transformer(tokens.to(self.device))
  File "/mnt/workspace/video_generation/FreeNoise/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 219, in encode_with_transformer
    x = self.text_transformer_forward(x, attn_mask=self.model.attn_mask)
  File "/mnt/workspace/video_generation/FreeNoise/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 231, in text_transformer_forward
    x = r(x, attn_mask=attn_mask)
  File "/usr/local/conda/envs/freenoise/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/conda/envs/freenoise/lib/python3.8/site-packages/open_clip/transformer.py", line 263, in forward
    x = q_x + self.ls_1(self.attention(q_x=self.ln_1(q_x), k_x=k_x, v_x=v_x, attn_mask=attn_mask))
  File "/usr/local/conda/envs/freenoise/lib/python3.8/site-packages/open_clip/transformer.py", line 250, in attention
    return self.attn(
  File "/usr/local/conda/envs/freenoise/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/conda/envs/freenoise/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1144, in forward
    return torch._native_multi_head_attention(
RuntimeError: Mask shape should match input. mask: [77, 77] input: [77, 16, 1, 1]

arthur-qiu commented 1 month ago

Hi, did you change any parameters? Could you clone a fresh copy of the code and run it again to see whether the problem persists?

changsn commented 1 month ago

Hi, your code does not support the latest open_clip release, which is why this error was raised. I fixed it by installing open_clip_torch 2.22.0:

pip install open_clip_torch==2.22.0
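To catch this version mismatch before inference starts, a small pre-flight check along the following lines could help. It is a hypothetical helper, not part of FreeNoise: the name `check_open_clip` and the `KNOWN_GOOD` constant are assumptions, and 2.22.0 is simply the version reported to work in this thread, not an officially documented requirement. It uses only the standard library (`importlib.metadata`, available from Python 3.8, the version used in the log above).

```python
# Hypothetical pre-flight check (not part of FreeNoise): verify that the
# installed open_clip_torch matches the version reported to work in this thread.
from importlib.metadata import PackageNotFoundError, version

KNOWN_GOOD = "2.22.0"  # assumption: the version that resolved the mask-shape error


def check_open_clip(expected: str = KNOWN_GOOD) -> bool:
    """Return True if open_clip_torch==expected is installed; otherwise print a hint."""
    try:
        installed = version("open_clip_torch")
    except PackageNotFoundError:
        print(f"open_clip_torch is not installed; try: pip install open_clip_torch=={expected}")
        return False
    if installed != expected:
        print(
            f"open_clip_torch {installed} found, but {expected} was reported to work; "
            f"try: pip install open_clip_torch=={expected}"
        )
        return False
    return True


if __name__ == "__main__":
    check_open_clip()
```

Running this before `scripts/evaluation/inference_freenoise.py` would print the pin hint up front instead of failing deep inside the text encoder. An exact-match comparison is used deliberately, since only 2.22.0 was confirmed here and the newest release is known to break.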