shiipou closed this issue 1 year ago
The goal of this is to make a Twitch bot using the LLaMA language model and allow it to keep a certain amount of messages in memory.
I have another program (in TypeScript) that runs the llama.cpp `./main` binary and uses stdio to send messages to the AI/bot.
I use the 65B model for this bot, but the problem appears with any of the models, so the quickest one to try is 7B.
How much RAM do you have? Please check your free RAM and swap using top while running.
Did you try setting the context size to a larger value using the `-c` startup flag (e.g. `./llama.exe -m D:/models/alpaca/7B/ggml-model-q4_0.bin -t 18 -c 2048`)? This is not a fix, but will allow you to utilize larger prompts and response lengths before running into your issue.
While this helped me, gjmulder has a point that your issue might be different than my issue relating specifically to the program stopping when it fills context. I don't encounter seg faults, my program just closes.
> How much RAM do you have? Please check your free RAM and swap using top while running.
I have 64GB of RAM. I don't think the problem comes from this, because it appears at the same token count for the 7B and 65B models, which don't use anywhere near the same amount of RAM.
> Did you try setting the context size to a larger value using the `-c` startup flag (e.g. `./llama.exe -m D:/models/alpaca/7B/ggml-model-q4_0.bin -t 18 -c 2048`)? This is not a fix, but will allow you to utilize larger prompts and response lengths before running into your issue.
> While this helped me, gjmulder has a point that your issue might be different than my issue relating specifically to the program stopping when it fills context. I don't encounter seg faults, my program just closes.
It seems to work; changing the `-c` value allows me to use a longer prompt, so thank you so much! Do you know how to calculate the `-c` value for the prompt I want to use?
> It seems to work; changing the `-c` value allows me to use a longer prompt, so thank you so much! Do you know how to calculate the `-c` value for the prompt I want to use?
Other people will know more about the context limits, but as I understand it, the program will stop running once the context is full (something like `while ((size of prompt) + (size of embeddings) < n_ctx)`).
The default context is 512 from what I saw, so by setting it to 2048 you allow 4x the space for prompt + completions.
I saw an issue that mentioned they are working on a more dynamic method that will create a sort of sliding window for the context, but right now it just stops when the limit is reached.
Also, there are limitations on context based on memory and the model itself. I've seen people use 2k, and I'm used to 2k from ChatGPT, but I don't know the limitations of LLaMA, so I don't go above 2048 at this point.
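As for calculating the value: there is no exact formula short of running the tokenizer itself, but the numbers in this issue (3083 characters coming out to 933 tokens, roughly 3.3 characters per token) give a usable rule of thumb. Here is a minimal C++ sketch of that estimate; the helper name, the ratio, and the rounding are my own illustration, not anything from llama.cpp:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: roughly estimate how large -c needs to be for a given
// prompt. The only exact answer comes from running the actual tokenizer, but
// ~3.3 characters per token (the ratio observed in this issue) is a usable
// rule of thumb for English text.
int estimate_n_ctx(const std::string &prompt, int n_predict) {
    const double chars_per_token = 3.3;                    // observed ratio from this issue
    int prompt_tokens = (int)(prompt.size() / chars_per_token) + 1;
    int needed = prompt_tokens + n_predict;                // prompt + room for the reply
    return ((needed + 255) / 256) * 256;                   // round up for some headroom
}

int main() {
    std::string prompt(3083, 'x');                         // stand-in for prompt.md
    // With ~933 prompt tokens and 256 tokens of output this prints 1280,
    // i.e. the default -c 512 is far too small for this prompt.
    printf("suggested -c: %d\n", estimate_n_ctx(prompt, 256));
    return 0;
}
```

For the prompt in this issue plus 256 tokens of output, that lands around `-c 1280`, which is consistent with the default 512 being exhausted partway through the prompt.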
I still wonder why you get a segfault and I didn't. Hopefully it isn't a different issue... But glad it's working better for you!
Could it be made to do bounds checking and display some kind of informative error when the buffer is full, rather than just crashing with a mysterious segfault?
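To illustrate the kind of guard meant here (the names and structure below are made up for illustration, not the actual llama.cpp internals), a check like this before tokens are written into the context would turn the crash into a readable message:

```cpp
// Illustrative only -- not the real llama.cpp code. The idea is to compare the
// token count against the context size up front and fail with an informative
// error instead of writing past the buffer and segfaulting.
#include <cstdio>
#include <cstdlib>
#include <vector>

void check_context_fits(const std::vector<int> &prompt_tokens,
                        int n_predict, int n_ctx) {
    const int needed = (int)prompt_tokens.size() + n_predict;
    if (needed > n_ctx) {
        fprintf(stderr,
                "error: prompt (%zu tokens) + n_predict (%d) exceeds the "
                "context size (%d); rerun with a larger -c value\n",
                prompt_tokens.size(), n_predict, n_ctx);
        exit(1);
    }
}

int main() {
    std::vector<int> prompt_tokens(550);          // the 550-token prompt from this issue
    check_context_fits(prompt_tokens, 128, 512);  // prints the error and exits
    return 0;
}
```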
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
I want to be able to run my prompt using this command without any `Segmentation fault` error:
Where `prompt.md` contains 3083 characters (933 tokens).
Current Behavior
The command only outputs the first 1909 characters of the prompt in the console (550 tokens) and throws a `Segmentation fault` error.
This closes the program and doesn't let me execute my prompt.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
Models
Please provide the `sha256sum` of each of your `consolidated*.pth` and `ggml-model-XXX.bin` files to confirm that you have the correct model data files before logging an issue. Latest sha256 sums for your reference.
Failure Information (for bugs)
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
Run the command with the `-p` arguments.
Failure Logs
Please include any relevant log snippets or files. If it works under one configuration but not under another, please provide logs for both configurations and their corresponding outputs so it is easy to see where behavior changes.
Also, please try to avoid using screenshots if at all possible. Instead, copy/paste the console output and use Github's markdown to cleanly format your logs for easy readability. e.g.
I removed the full prompt because it's not the problem; you just need a 550-token prompt to make it appear.