bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

RuntimeError #294

Closed ivand321 closed 10 months ago

ivand321 commented 1 year ago

Greetings. When I run the web UI, I get the following error:

```
Starting the web UI...
Warning: --cai-chat is deprecated. Use --chat instead.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to:
https://github.com/TimDettmers/bitsandbytes/issues

CUDA SETUP: CUDA runtime path found: C:\ai\LLM\oobabooga-windows\installer_files\env\bin\cudart64_110.dll
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll...
Loading anon8231489123_vicuna-13b-GPTQ-4bit-128g...
Found the following quantized model: models\anon8231489123_vicuna-13b-GPTQ-4bit-128g\vicuna-13b-4bit-128g.safetensors
Traceback (most recent call last):
  File "C:\ai\LLM\oobabooga-windows\text-generation-webui\server.py", line 346, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\ai\LLM\oobabooga-windows\text-generation-webui\modules\models.py", line 103, in load_model
    model = load_quantized(model_name)
  File "C:\ai\LLM\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 136, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "C:\ai\LLM\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 32, in _load_quant
    model = AutoModelForCausalLM.from_config(config)
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py", line 411, in from_config
    return model_class._from_config(config, **kwargs)
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 1138, in _from_config
    model = cls(config, **kwargs)
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 614, in __init__
    self.model = LlamaModel(config)
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in __init__
    self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in <listcomp>
    self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 256, in __init__
    self.mlp = LlamaMLP(
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 152, in __init__
    self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
  File "C:\ai\LLM\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 141557760 bytes.
```

1) It says I do not have enough memory, yet the allocation that failed is only 141557760 bytes (~0.14 GB). I have 16 GB of RAM and an RTX 3060, so that is roughly 0.9% of my total RAM. Something does not add up.
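For context, the 141557760-byte figure matches a single half-precision LLaMA-13B down_proj weight exactly, and the allocation that fails is simply the first one that no longer fits: the model skeleton is built tensor by tensor in CPU RAM, and a 13B model in fp16 needs roughly 24 GB of weights alone, well over 16 GB. A minimal sketch of the arithmetic, assuming LLaMA-13B's published dimensions (hidden_size=5120, intermediate_size=13824):

```python
# Back-of-the-envelope arithmetic, assuming LLaMA-13B dimensions
# (hidden_size=5120, intermediate_size=13824) and fp16 weights (2 bytes/param).

hidden_size = 5120
intermediate_size = 13824

# The single tensor that failed: one down_proj weight matrix.
down_proj_bytes = intermediate_size * hidden_size * 2
print(down_proj_bytes)  # 141557760 -- exactly the number in the traceback

# The whole 13B model, however, cannot be materialized in 16 GB of system RAM:
total_params = 13e9
print(total_params * 2 / 1024**3)  # ~24.2 GiB needed just for the fp16 weights
```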

2) I tried a few parameters in the web UI bat file, such as --gpu-memory 3500MiB --cpu-memory 3000MiB (which constrain GPU and CPU usage), --load-in-8bit, --auto-devices, --cai-chat, --wbits 4, and --groupsize 128. None of them fixed the issue. BTW, I found these in the Low-VRAM guide: https://github.com/oobabooga/text-generation-webui/wiki/Low-VRAM-guide.
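For reference, a hypothetical sketch of how those flags would combine on a single launch line (values copied from the post; note the startup warning above says --cai-chat is deprecated in favor of --chat, and --load-in-8bit is a bitsandbytes option that would not apply to a GPTQ 4-bit load, so both are adjusted here):

```
python server.py --chat --wbits 4 --groupsize 128 --auto-devices --gpu-memory 3500MiB --cpu-memory 3000MiB
```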

3) I selected option a) NVIDIA. However, based on the line "RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 141557760 bytes.", I think it is running on the CPU, not the GPU. I am 100% certain that I selected option a) NVIDIA, which does not add up.
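A note on point 3: the CPU allocator in the traceback does not necessarily mean the GPU was ignored. AutoModelForCausalLM.from_config first materializes the empty model in CPU RAM; weights only move to the GPU afterwards, so an out-of-RAM error surfaces on the CPU side even with a working CUDA setup. To rule out a CPU-only PyTorch install, a quick sanity check (a sketch, not part of the web UI) would be:

```python
import torch

# If this prints False, PyTorch was installed without CUDA support and
# everything really will run on the CPU regardless of the installer choice.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should report the RTX 3060
```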

I have been working on this the whole day, and at this point I have no clue what to do. Keep in mind I am pretty new to all this; I have no idea if I am just missing something obvious. Any help would be highly appreciated.

TimDettmers commented 1 year ago

Thank you for your report. This is a problem with CPU memory. It might be that you are using pinned or mmap'ed memory, which restricts how new memory can be allocated. With this, it is easy to run out of CPU memory.

This does not seem to be a problem with bitsandbytes, because bitsandbytes only works on the GPU. The error lies with another library and how it manages CPU memory.
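(One common way around this class of error, sketched below under the assumption that the accelerate package is available: build the model skeleton on the meta device so that construction allocates no CPU RAM, and let the quantized weights be loaded into place afterwards. This is illustrative, not the web UI's actual code path, and the model path is the one from the post.)

```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model skeleton on the "meta" device: parameters get shapes and
# dtypes but no backing storage, so no CPU RAM is consumed here.
config = AutoConfig.from_pretrained("models/anon8231489123_vicuna-13b-GPTQ-4bit-128g")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

print(model.model.layers[0].mlp.down_proj.weight.device)  # meta
```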

I would advise you to open an issue on the respective GitHub repository that manages the CPU part of the loading code.

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.