Open · SIGMIND opened 4 months ago
The inference process requires a minimum of 17GB GPU
I can successfully run the gradio server

```
python videollama2/serve/gradio_web_server_adhoc.py
```

and run inference locally on my 12GB GPU. But I can't run the simple inference script provided in the README for the same model. It fails with:

```
Traceback (most recent call last):
  File "inference.py", line 83, in <module>
    inference()
  File "inference.py", line 37, in inference
    tokenizer, model, processor, context_len = load_pretrained_model(model_path, None, model_name)
  File "/home/sigmind/VideoLLaMA2/VideoLLaMA2/videollama2/model/builder.py", line 170, in load_pretrained_model
    vision_tower.to(device=device, dtype=torch.float16)
  File "/home/sigmind/VideoLLaMA2/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/home/sigmind/VideoLLaMA2/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/sigmind/VideoLLaMA2/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/sigmind/VideoLLaMA2/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/home/sigmind/VideoLLaMA2/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/home/sigmind/VideoLLaMA2/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU
```

If GPU memory were truly the limit, the ad-hoc Gradio service shouldn't be able to load the model and run inference either. What could be wrong here?
Hello, I'm a PhD student from ZJU. I also use VideoLLaMA2 in my own research. We've created a WeChat group to discuss VideoLLaMA2 issues and help each other; would you like to join us? Please contact me: WeChat: LiangMeng19357260600, phone: +86 19357260600, e-mail: liangmeng89@zju.edu.cn.