DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0
871 stars · 60 forks

CUDA out of memory issue for simple inference #24

Open SIGMIND opened 4 months ago

SIGMIND commented 4 months ago

I can successfully run the gradio server (`python videollama2/serve/gradio_web_server_adhoc.py`) and run inference locally on my 12GB GPU, but I can't run the simple inference script provided in the README for the same model. It fails with:

```
Traceback (most recent call last):
  File "inference.py", line 83, in <module>
    inference()
  File "inference.py", line 37, in inference
    tokenizer, model, processor, context_len = load_pretrained_model(model_path, None, model_name)
  File "/home/sigmind/VideoLLaMA2/VideoLLaMA2/videollama2/model/builder.py", line 170, in load_pretrained_model
    vision_tower.to(device=device, dtype=torch.float16)
  File "/home/sigmind/VideoLLaMA2/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/home/sigmind/VideoLLaMA2/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/sigmind/VideoLLaMA2/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/sigmind/VideoLLaMA2/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/home/sigmind/VideoLLaMA2/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/home/sigmind/VideoLLaMA2/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU
```

If the memory were really that limited, it couldn't even run the adhoc gradio service and infer. What could be wrong here?
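A failed allocation of only 20.00 MiB usually means the card was already nearly full when the loader ran (for example, if the gradio server or another process was still holding memory). A low-risk first check is to print the free memory right before calling `load_pretrained_model`; here is a minimal sketch using standard PyTorch CUDA APIs:

```python
import torch

# torch.cuda.mem_get_info() returns (free_bytes, total_bytes) for the current device.
free, total = torch.cuda.mem_get_info()
print(f"GPU memory: {free / 1024**3:.2f} GiB free of {total / 1024**3:.2f} GiB")

# How much this Python process has already claimed through PyTorch.
print(f"Allocated by this process: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"Reserved by this process:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")
```

If `free` is already close to zero before the model loads, the fix is to free the GPU (e.g. stop the other process) rather than to change the inference script.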

BUAACY commented 1 month ago

The inference process requires a minimum of 17 GB of GPU memory.
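For GPUs below that threshold, a common workaround (not specific to this repo, and assuming the checkpoint loads through Hugging Face `transformers`) is 8-bit or 4-bit quantized loading via `bitsandbytes`. The sketch below is illustrative, not the repo's documented path; check whether `load_pretrained_model` in `videollama2/model/builder.py` already exposes equivalent `load_8bit`/`load_4bit` options before wiring this in by hand:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization cuts weight memory to roughly a quarter of fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Model id below is illustrative; substitute the checkpoint you actually load.
# A custom architecture like this may also need trust_remote_code=True or the
# repo's own loader instead of AutoModelForCausalLM.
model = AutoModelForCausalLM.from_pretrained(
    "DAMO-NLP-SG/VideoLLaMA2-7B",
    quantization_config=bnb_config,
    device_map="auto",  # lets accelerate place/offload layers that don't fit
)
```

Quantization trades some output quality for memory, so results may differ slightly from the fp16 numbers reported in the paper.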

LiangMeng89 commented 3 days ago

> *(quotes SIGMIND's original report above)*

Hello, I'm a PhD student at ZJU. I also use VideoLLaMA2 in my own research. We've created a WeChat group to discuss VideoLLaMA2 issues and help each other; would you like to join us? Please contact me: WeChat: LiangMeng19357260600, phone: +86 19357260600, e-mail: liangmeng89@zju.edu.cn.