vmajor opened this issue 1 year ago (status: Open)
Standing by!
It seems to be a llama.cpp issue. I found it mentioned regarding starcoder models too. I think you can carry on :)
Update: the issue resolves on reboot, so there appears to be a memory leak somewhere in the code.
This error would likely also clear if I restarted WSL2, but that is messy for me because I have to remount my ext4 partitions, and I do not think that particular data point is especially significant.
But for a commercial application we can't afford behavior like this, right? This is happening for me on Groovy 1.3, and only on some operating systems such as RHEL. It is really frustrating and difficult to deal with.
Hi, I am using the GPT4All GPT-J 1.3 Groovy model, which has an Apache license.
Another update: my guanaco-65B-GGML-q6_K.bin model just failed with the same error, so it is not just 30B models that are affected.
ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 1143972864, available 1073741824)
Segmentation fault
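The numbers in that message say the scratch pool is 1 GiB while this evaluation needs roughly 1.07 GiB. A minimal workaround sketch, assuming the standard llama-cpp-python API: shrink n_ctx and n_batch so the per-evaluation scratch requirement stays under the pool size. The model path and parameter values below are placeholders, not the exact settings from this thread.

```python
from llama_cpp import Llama

# Hypothetical workaround sketch: reduce the context window and batch size so
# the scratch buffers ggml allocates per evaluation fit in the 1 GiB pool.
# Path and values are illustrative, not taken from the original report.
llm = Llama(
    model_path="./guanaco-65B-GGML-q6_K.bin",
    n_ctx=1024,    # smaller context window than the default 2048
    n_batch=256,   # fewer tokens evaluated per ggml graph
    n_threads=12,  # match the physical cores on the Ryzen 9 3900XT
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```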
In the hope of helping isolate the bug, I tried to reproduce the issue starting from version 0.1.55. The first release where I see the issue is 0.1.76 (0.1.75 was not tested; it is not available on PyPI), and I did not see it on 0.1.74. Could it be related to this change? https://github.com/abetlen/llama-cpp-python/compare/v0.1.74...v0.1.76#diff-9184e090a770a03ec97535fbef520d03252b635dafbed7fa99e59a5cca569fbcR200
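For anyone who wants to repeat this kind of check, here is a rough sketch of the bisection loop I have in mind: install each candidate release and run a short inference in a child process, treating a negative return code as a crash. The version list, model path, and prompt are placeholders, not my exact test.

```python
import subprocess
import sys

VERSIONS = ["0.1.74", "0.1.76"]              # releases to compare (placeholders)
MODEL = "./nous-hermes-13b.ggmlv3.q4_0.bin"  # any model that triggers the error

TEST = f"""
from llama_cpp import Llama
llm = Llama(model_path={MODEL!r}, n_ctx=2048)
for _ in range(5):  # several runs in a row, as needed to reproduce
    llm("Write one sentence about llamas.", max_tokens=64)
"""

for version in VERSIONS:
    subprocess.run(
        [sys.executable, "-m", "pip", "install", f"llama-cpp-python=={version}"],
        check=True,
    )
    result = subprocess.run([sys.executable, "-c", TEST])
    # A segfault shows up as a negative return code (e.g. -11 for SIGSEGV).
    status = "crashed" if result.returncode < 0 else "ok"
    print(f"llama-cpp-python {version}: {status} (returncode={result.returncode})")
```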
Environment
python -V
Python 3.10.12
uname -a
Linux Idan-PC 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Model: nous-hermes-13b.ggmlv3.q4_0.bin
Expected Behavior
This happens (so far) only with these models:
Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin
WizardLM-30B-Uncensored.ggmlv3.q8_0.bin
based-30b.ggmlv3.q8_0.bin
Larger 65B models work fine. It could be something related to how these models are made; I will also reach out to @ehartford.
llama-cpp-python 0.1.59 installed with OpenBLAS
CMAKE_ARGS="-DLLAMA_OPENBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
I was running my usual code on the CPU and restarting it to tweak the results when this error came up. I made no code changes other than to the context length, which I reduced because it was exceeding the 2048-token limit. A minimal sketch of the kind of repeated-inference loop that triggers it for me is below; the prompt and exact parameters are illustrative, not my actual script.
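```python
from llama_cpp import Llama

# Rough reproduction sketch (illustrative prompt and settings, not the exact script):
# load a 30B q8_0 model on the CPU and run inference several times in a row.
llm = Llama(
    model_path="./Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin",
    n_ctx=2048,
)

prompt = "Summarize the plot of Moby-Dick in three sentences."
for i in range(10):
    out = llm(prompt, max_tokens=256)
    print(i, len(out["choices"][0]["text"]))
# After several runs the scratch-pool error appears and the process segfaults.
```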
Current Behavior
Environment and Context
WSL2, Python 3.10.9
$ lscpu
AMD Ryzen 9 3900XT 12-Core Processor

$ uname -a
5.15.68.1-microsoft-standard-WSL2+ #2 SMP

To me it is 100% reproducible after several inference runs with Wizard-Vicuna-30B-Uncensored.ggmlv3.q8_0.bin.