Closed: ivand321 closed this issue 10 months ago.
Thank you for your report. This is a problem with CPU memory. It might be that you are using pinned or mmap memory, which restricts how new memory can be allocated; with this, it is easy to run out of CPU memory.
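A minimal sketch of that constraint (my illustration, not from the report, assuming PyTorch with CUDA available): pinned (page-locked) host memory comes from a much smaller pool than ordinary pageable RAM, so an allocation that succeeds pageable can fail pinned.

```python
import torch

n = 141_557_760 // 2  # number of fp16 elements in the failing 141557760-byte request

# Pageable CPU allocation: only needs the OS to find ~135 MiB of ordinary RAM.
pageable = torch.empty(n, dtype=torch.float16)

# Pinned (page-locked) allocation: drawn from a far more constrained pool, so
# the identical request can raise a RuntimeError once that pool is exhausted.
try:
    pinned = torch.empty(n, dtype=torch.float16, pin_memory=True)
except RuntimeError as err:
    print(f"pinned allocation failed: {err}")
```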
This does not seem to be a problem with bitsandbytes, because bitsandbytes only works on the GPU. The error lies with another library and how it manages CPU memory.
I would advise you to open an issue on the respective GitHub repository that manages the CPU part of the library.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Greetings. When I run the web UI, I get the following error:
Starting the web UI...
Warning: --cai-chat is deprecated. Use --chat instead.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA runtime path found: C:\ai\LLM\oobabooga-windows\installer_files\env\bin\cudart64_110.dll
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll...
Loading anon8231489123_vicuna-13b-GPTQ-4bit-128g...
Found the following quantized model: models\anon8231489123_vicuna-13b-GPTQ-4bit-128g\vicuna-13b-4bit-128g.safetensors
Traceback (most recent call last):
  File "C:\ai\LLM\oobabooga-windows\text-generation-webui\server.py", line 346, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\ai\LLM\oobabooga-windows\text-generation-webui\modules\models.py", line 103, in load_model
    model = load_quantized(model_name)
  File "C:\ai\LLM\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 136, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "C:\ai\LLM\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 32, in _load_quant
    model = AutoModelForCausalLM.from_config(config)
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py", line 411, in from_config
    return model_class._from_config(config, **kwargs)
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 1138, in _from_config
    model = cls(config, **kwargs)
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 614, in __init__
    self.model = LlamaModel(config)
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in __init__
    self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in <listcomp>
    self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 256, in __init__
    self.mlp = LlamaMLP(
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 152, in __init__
    self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 141557760 bytes.
1) It says I do not have enough memory, yet it only tried to allocate 141557760 bytes (about 0.14 GB). I have 16 GB of RAM and an RTX 3060, so that is approximately 0.875% of my RAM. Something does not add up (see the worked check after this list).
2) I used a few parameters in the web UI bat file, like --gpu-memory 3500MiB --cpu-memory 3000MiB (which constrain GPU and CPU usage), --load-in-8bit, --auto-devices, --cai-chat, --wbits 4, and --groupsize 128. None of them fixed the issue. BTW, I found these in https://github.com/oobabooga/text-generation-webui/wiki/Low-VRAM-guide.
3) I selected option a) NVIDIA. However, based on the line RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 141557760 bytes, I think it is running on the CPU, not the GPU. I am 100% certain that I selected option a) NVIDIA, which does not add up.
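A worked check of the numbers in (1) and (3), assuming the standard LLaMA-13B shapes (hidden_size=5120, intermediate_size=13824) and fp16 weights, which the GPTQ loading path typically selects via torch.set_default_dtype(torch.half) before from_config:

```python
# Back-of-the-envelope check (assumed LLaMA-13B shapes, fp16 weights).
hidden_size = 5120
intermediate_size = 13824
bytes_per_param = 2  # float16

# One down_proj = nn.Linear(intermediate_size, hidden_size) weight matrix:
down_proj_bytes = intermediate_size * hidden_size * bytes_per_param
print(down_proj_bytes)  # 141557760 -- exactly the failing allocation

# The whole ~13B-parameter model materialized in fp16 on the CPU:
total_bytes = 13_000_000_000 * bytes_per_param
print(f"{total_bytes / 2**30:.1f} GiB")  # ~24.2 GiB, well over 16 GB of RAM
```

So the failing 0.14 GB request is simply the first one the OS refuses after many earlier allocations have already consumed the available RAM and pagefile: from_config builds the full fp16 model skeleton in system memory before anything is moved to the GPU, which is also why the error comes from DefaultCPUAllocator even with NVIDIA selected.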
I have been working on this the whole day, and at this point I have no clue what to do. Keep in mind that I am pretty new to all this, and I have no idea if I am just being stupid. Any help would be highly appreciated.
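For reference, a sketch of the kind of change that avoids materializing the skeleton in RAM, assuming the accelerate package is installed (illustrative only, not a patch from this thread): accelerate's init_empty_weights context manager creates the modules on PyTorch's meta device, so from_config allocates no real CPU memory for the weights.

```python
# Illustrative sketch: build the model skeleton on the meta device so that
# from_config does not allocate ~24 GiB of real CPU memory up front.
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("models/anon8231489123_vicuna-13b-GPTQ-4bit-128g")

with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# The quantized weights would then be loaded into the skeleton afterwards,
# as the GPTQ loader already does from the .safetensors checkpoint.
```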