Open qiuchen001 opened 4 months ago
I have encountered the same problem. It seems to be caused by the installed transformers version.
Did you solve the error? I encountered the same error while debugging the Video-LLaVA code.
You can copy the following code into the corresponding file of the transformers library to solve the problem:
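    from typing import Optional

    import torch


    def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
        # Expand a [bsz, src_len] attention mask to [bsz, 1, tgt_len, src_len].
        bsz, src_len = mask.size()
        tgt_len = tgt_len if tgt_len is not None else src_len

        expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)

        # Positions to ignore (mask == 0) become the most negative value of dtype.
        inverted_mask = 1.0 - expanded_mask
        return inverted_mask.masked_fill(inverted_mask.to(torch.bool), torch.finfo(dtype).min)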
Do not run `pip install -U transformers`; it upgrades transformers to the latest release. Run `pip install transformers==4.31.0` instead.
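To confirm which version actually ended up in the environment, a quick check along these lines may help. This is a minimal sketch using only the standard library; the helper names (`parse_version`, `matches_expected`, `installed_transformers_version`) are hypothetical, and the expected version is simply the one suggested in this thread.

```python
from importlib import metadata
from typing import Optional, Tuple

EXPECTED = "4.31.0"  # version suggested in this thread


def parse_version(version: str) -> Tuple[int, ...]:
    # Keep the leading numeric part of each dot-separated component,
    # e.g. "4.31.0rc1" -> (4, 31, 0); stop at the first non-numeric component.
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def matches_expected(installed: str, expected: str = EXPECTED) -> bool:
    # Compare major.minor.patch only.
    return parse_version(installed)[:3] == parse_version(expected)[:3]


def installed_transformers_version() -> Optional[str]:
    # Returns None when transformers is not installed in this environment.
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif matches_expected(found):
        print(f"transformers {found} matches {EXPECTED}")
    else:
        print(f"transformers {found} found; this thread suggests {EXPECTED}")
```

Running it inside the same conda environment used for inference shows whether an earlier `pip install -U transformers` replaced the pinned version.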
Scenario: CLI inference
command: CUDA_VISIBLE_DEVICES=0 python3 -m videollava.serve.cli --model-path "/root/Video-LLaVA-7B" --file "/root/videos/8132-207209040_small.mp4" --load-4bit
issues:
[2024-07-21 04:02:21,967] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/root/.conda/envs/video-llava/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/root/.conda/envs/video-llava/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/root/Video-LLaVA/videollava/__init__.py", line 1, in <module>
    from .model import LlavaLlamaForCausalLM
  File "/root/Video-LLaVA/videollava/model/__init__.py", line 1, in <module>
    from .language_model.llava_llama import LlavaLlamaForCausalLM, LlavaConfig
  File "/root/Video-LLaVA/videollava/model/language_model/llava_llama.py", line 26, in <module>
    from ..llava_arch import LlavaMetaModel, LlavaMetaForCausalLM
  File "/root/Video-LLaVA/videollava/model/llava_arch.py", line 21, in <module>
    from .multimodal_encoder.builder import build_image_tower, build_video_tower
  File "/root/Video-LLaVA/videollava/model/multimodal_encoder/builder.py", line 3, in <module>
    from .languagebind import LanguageBindImageTower, LanguageBindVideoTower
  File "/root/Video-LLaVA/videollava/model/multimodal_encoder/languagebind/__init__.py", line 6, in <module>
    from .image.modeling_image import LanguageBindImage
  File "/root/Video-LLaVA/videollava/model/multimodal_encoder/languagebind/image/modeling_image.py", line 11, in <module>
    from transformers.models.clip.modeling_clip import CLIPMLP, CLIPAttention, CLIPTextEmbeddings, CLIPVisionEmbeddings, \
ImportError: cannot import name '_expand_mask' from 'transformers.models.clip.modeling_clip' (/root/.conda/envs/video-llava/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py)
I've already installed the required packages:
AND
pip install -U transformers