Vision-CAIR / MiniGPT-4

Open-sourced code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
BSD 3-Clause "New" or "Revised" License

Low GPU utilization on 4090, Windows 10 #125

Open Korner83 opened 1 year ago

Korner83 commented 1 year ago

Thanks for the great project, I really like it.

Is it just me, or is it normal to have only around 35-45% GPU utilization while it's generating the reply? All my models are running from M.2 SSD drives, and memory should be fine with 64 GB RAM. VRAM usage usually stays below 20 GB, so I was wondering if there is a way to use the full potential of my card.

xiaoqian-shen commented 1 year ago

Yes, our model should fit in 24 GB for Vicuna-13B and 12 GB for Vicuna-7B. You can set low_resource to False to fully utilize your GPU.
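For context, here is a minimal sketch of what a low_resource-style flag usually toggles in this kind of setup; the exact MiniGPT-4 loading code may differ, and the model path and helper name below are illustrative. The trade-off is 8-bit quantized loading (low VRAM, lower GPU utilization) versus full fp16 on the GPU (more VRAM, faster generation).

```python
# Illustrative sketch only -- the actual MiniGPT-4 loading code may differ.
import torch
from transformers import LlamaForCausalLM

def load_llm(model_path: str, low_resource: bool):
    if low_resource:
        # 8-bit weights squeeze a 13B model into ~24 GB but run slower,
        # which shows up as low GPU utilization during generation.
        return LlamaForCausalLM.from_pretrained(
            model_path, load_in_8bit=True, device_map={"": 0}
        )
    # fp16 weights need more VRAM but let the GPU run at full speed.
    return LlamaForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.float16
    ).to("cuda:0")
```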

Korner83 commented 1 year ago

Thanks for the reply!

I have tried it, but it looks like it gives me an out-of-memory error. Do you have any suggestion on how to change max_split_size_mb? Where can I find this parameter? Shouldn't I also change vit_precision: "fp16"?

This is the error I get:

Human: Take a look at this image and describe what you notice. ###Assistant:

Load BLIP2-LLM Checkpoint: c:/Users/polga/MiniGPT-4/model/pretrained_minigpt4.pth
Traceback (most recent call last):
  File "c:\Users\polga\MiniGPT-4\demo.py", line 60, in <module>
    model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))
  File "C:\Users\polga\.conda\envs\minigpt4\lib\site-packages\torch\nn\modules\module.py", line 989, in to
    return self._apply(convert)
  File "C:\Users\polga\.conda\envs\minigpt4\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  File "C:\Users\polga\.conda\envs\minigpt4\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  File "C:\Users\polga\.conda\envs\minigpt4\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "C:\Users\polga\.conda\envs\minigpt4\lib\site-packages\torch\nn\modules\module.py", line 664, in _apply
    param_applied = fn(param)
  File "C:\Users\polga\.conda\envs\minigpt4\lib\site-packages\torch\nn\modules\module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 23.99 GiB total capacity; 22.85 GiB already allocated; 0 bytes free; 23.03 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
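Regarding the max_split_size_mb question: it is a general PyTorch allocator setting, not a MiniGPT-4 config key. One way to experiment with it is to set the PYTORCH_CUDA_ALLOC_CONF environment variable before the CUDA allocator is initialized, for example at the very top of demo.py (or equivalently in the shell before launching the demo). The value below is only an illustrative starting point.

```python
# Set the allocator option before torch allocates any CUDA memory
# (e.g. at the very top of demo.py). The 128 MiB value is just an example.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import (and any CUDA use) must come after the variable is set
```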

silent780 commented 1 year ago

Same issue with my 4090, and the chat results are really bad; I don't know why.

Korner83 commented 1 year ago

If I use the low_resource = True parameter it works totally fine in my case; just keep the beam search number at 1, otherwise it might give you a wrong reply because it needs more VRAM. I just wanted higher GPU utilization, because currently it uses only about 40% of my GPU, which seems strange.
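For reference, a rough sketch of why the beam width matters for memory; parameter names follow the Hugging Face generate() API, and the demo's slider is wired to something similar, though the exact MiniGPT-4 call may differ.

```python
# Rough sketch (parameter names assumed from the Hugging Face generate() API):
# beams > 1 keep several candidate sequences alive at once, which multiplies
# activation memory and can push a 24 GB card over the edge.
def generate_reply(llama_model, input_embeds, num_beams: int = 1):
    return llama_model.generate(
        inputs_embeds=input_embeds,
        max_new_tokens=300,
        num_beams=num_beams,          # keep at 1 when running with low_resource
        do_sample=(num_beams == 1),   # sampling for beam=1, beam search otherwise
    )
```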

zxcvbn114514 commented 1 year ago

Me too. Mine is a 3090 and it is really slow. I hope it'll be solved soon.