PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.
Apache License 2.0

Getting "not enough space in the buffer" error #434

Open amit2103 opened 1 year ago

amit2103 commented 1 year ago

Thanks for creating this awesome project. I was trying to play around with it using a couple of PDFs (mostly around 300 pages each).

I have a laptop RTX 3080 with 16GB VRAM, and I was using MODEL_ID = "TheBloke/Llama-2-7B-Chat-GGML" with MODEL_BASENAME = "llama-2-7b-chat.ggmlv3.q2_K.bin".

I repeatedly get the error:

```
GGML_ASSERT: C:\Users\amitp\AppData\Local\Temp\pip-install-tfpb1y6w\llama-cpp-python_c92100fd9be945f4b6d8e996dc576b27\vendor\llama.cpp\ggml-alloc.c:139: !"not enough space in the buffer"
```

What is the reason for this? I double-checked, and the VRAM of the GPU is available (at least 12GB free).
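In llama-cpp-python, the size of the buffers the GGML allocator requests grows with the context and batch sizes, so a common mitigation for this assert is to shrink them and/or offload fewer layers to the GPU. The sketch below is not a confirmed fix for this issue: the keyword names `n_ctx`, `n_batch`, and `n_gpu_layers` are standard llama-cpp-python parameters, but the specific values (and the idea of passing them where the model kwargs are built) are assumptions for illustration.

```python
# Hedged sketch: reduced model kwargs for llama-cpp-python.
# Values are illustrative assumptions, not tuned recommendations.
model_kwargs = {
    "n_ctx": 2048,       # smaller context window -> smaller KV/compute buffers
    "n_batch": 256,      # smaller prompt batch -> smaller scratch allocation
    "n_gpu_layers": 20,  # offload fewer layers if VRAM is tight or fragmented
}

# These would be passed through to the model constructor, e.g.:
#   llm = Llama(model_path=..., **model_kwargs)
print(model_kwargs)
```

If the error persists even with a small context, it usually points at the GPU-side allocation rather than system RAM, which matches the CPU-vs-CUDA observation below.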
aaltulea commented 1 year ago

I get this error too, on a desktop 4090 with 24GB VRAM + 32GB system memory, on Windows 11. Model: TheBloke/Llama-2-7B-Chat-GGML, on CUDA (default settings with the default document).

```
not enough space in the buffer (needed 156291200, largest block available 18104320)
```
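The two numbers in that message quantify the failure: the allocator needed one buffer far larger than the largest contiguous free block it could find. Converting them to MiB is plain arithmetic, independent of any project code:

```python
# Interpret the GGML allocator's error numbers from the message above.
needed = 156_291_200        # bytes the allocator requested
largest_block = 18_104_320  # largest contiguous free block it found

MIB = 1024 * 1024
needed_mib = needed / MIB                      # ~149.1 MiB requested
largest_mib = largest_block / MIB              # ~17.3 MiB available in one block
shortfall_mib = (needed - largest_block) / MIB # ~131.8 MiB short

print(f"needed        ~ {needed_mib:.1f} MiB")
print(f"largest block ~ {largest_mib:.1f} MiB")
print(f"shortfall     ~ {shortfall_mib:.1f} MiB")
```

A gap of this size suggests the buffer pool itself was sized too small (or badly fragmented) for the request, not that the GPU as a whole was out of memory.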

Edit: I suspect it is related to VRAM and not system RAM, because for the same prompt ("can a state declare its independence from the united states after being part of the united states"), running with --device_type cpu works (answer: "it is not possible for a state to declare its independence from the United States"), but --device_type cuda fails with the "not enough space in the buffer" error.

Edit 2: I used to run the commands from Windows Terminal; I have since switched to PyCharm and to the model Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g. I no longer have the issue.

omkar806 commented 1 year ago

Yes, I am having the same issue. I am using my own application hosted on Streamlit and I am running into the same error... so you switched to Vicuna and it works?