DeSinc / SallyBot

AI Chatbot coded in Discord.net C#
MIT License

CUDA out of memory? #32

Closed: Teemu671 closed this issue 1 year ago

Teemu671 commented 1 year ago
output = shared.model.generate(**generate_params)[0]

File "D:\bot\SallyBot-main\one-click-installers-main\installer_files\env\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context return func(*args, kwargs) File "D:\bot\SallyBot-main\one-click-installers-main\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1518, in generate return self.greedy_search( File "D:\bot\SallyBot-main\one-click-installers-main\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2335, in greedy_search outputs = self( File "D:\bot\SallyBot-main\one-click-installers-main\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "D:\bot\SallyBot-main\one-click-installers-main\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 687, in forward outputs = self.model( File "D:\bot\SallyBot-main\one-click-installers-main\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "D:\bot\SallyBot-main\one-click-installers-main\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 577, in forward layer_outputs = decoder_layer( File "D:\bot\SallyBot-main\one-click-installers-main\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(*args, kwargs) File "D:\bot\SallyBot-main\one-click-installers-main\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 292, in forward hidden_states, self_attn_weights, present_key_value = self.self_attn( File "D:\bot\SallyBot-main\one-click-installers-main\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "D:\bot\SallyBot-main\one-click-installers-main\text-generation-webui\modules\llama_attn_hijack.py", line 39, in xformers_forward query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2) File "D:\bot\SallyBot-main\one-click-installers-main\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "D:\bot\SallyBot-main\one-click-installers-main\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 360, in forward weight = torch.bitwise_right_shift(torch.unsqueeze(self.qweight, 1).expand(-1, 8, -1), self.wf1).to(torch.int8) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 8.00 GiB total capacity; 7.05 GiB already allocated; 0 bytes free; 7.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Output generated in 0.00 seconds (0.00 tokens/s, 0 tokens, context 181, seed 1930186351)

DeSinc commented 1 year ago

Late reply because Gmail sends GitHub emails to spam. lol

It looks like you'll need to pick a smaller model to run on this GPU. On a 3GB GPU you'll probably only be able to run the 3B-parameter models. A 6GB GPU is needed for any of the 7B models, and maybe even for the 6.7B Pygmalion model too. And for a 13B model, you'll probably need about 10GB of VRAM.
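
If it helps, here's a quick way to check how much VRAM is actually free before picking a model size. It only uses standard `torch.cuda` calls, and the thresholds just echo the rough numbers above rather than measured requirements:

```python
import torch

# How much memory is free on GPU 0 right now (bytes).
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
free_gib = free_bytes / 1024**3
total_gib = total_bytes / 1024**3
print(f"GPU 0: {free_gib:.1f} GiB free of {total_gib:.1f} GiB")

# Ballpark thresholds mirroring the sizes mentioned above, not exact requirements.
if free_gib >= 10:
    print("a 13B model should fit")
elif free_gib >= 6:
    print("a 7B (or ~6.7B) model should fit")
else:
    print("stick to ~3B models")
```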