Closed aleksusklim closed 6 months ago
Maybe the > character should be included too?
### Instruction:
> My favorite color is green. What's yours?
### Response:
My favorite color is blue.
Oh, at least it's blue!
### Instruction:
> My favorite color is green. What's yours?
### Response:
My favorite color is blue.
### Instruction:
> "sky", "water", "sapphire" - what is similar in these things?
### Response:
All of these things have a commonality of being related to the color blue. They are all shades or tones of blue.
And what if we…
…
### Instruction:
> Give me examples of things of my favorite color.
### Response:
Here are some examples of things that are typically associated with the color blue:
- Sky
- Water
- Sapphire
- Ocean
- Sea
- Blueberry
- Indigo
- Navy blue
- Cerulean
- Ultramarine
- Titanium
- Slate
- Lapis lazuli
- Cornflower blue
- Baby blue eyes
- Forest green
Well, I don't know. It is still unclear to me…
No-newlines version:
### Instruction:
> My favorite color is green. What's yours?
### Response:
Green is a great color! Is there anything specific you want to know about it?
It seems like it inserts the beginning-of-string token in front of ### Instruction: every time.
The true means add BOS.
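To make the difference concrete, here is a minimal Python sketch of what the model might see in each case. It is not llama.cpp code: the `<s>` placeholder, the template strings, and the `build_pieces` helper are all assumptions made up for illustration, and it models "BOS in front of every later instruction" only as described in this comment.

```python
BOS = "<s>"  # placeholder standing in for the beginning-of-string token

# Assumed template text; the exact whitespace used by -ins may differ.
INSTRUCTION_PREFIX = "\n\n### Instruction:\n\n"
RESPONSE_PREFIX = "\n\n### Response:\n\n"


def build_pieces(turns, bos_before_each_instruction):
    """Return the sequence of pieces the model would see.

    turns is a list of (instruction, response) pairs.
    """
    pieces = [BOS]  # one BOS at the very start of the context
    for index, (instruction, response) in enumerate(turns):
        # Hypothetical instruct-mode behaviour: an extra BOS before
        # every instruction after the first one.
        if bos_before_each_instruction and index > 0:
            pieces.append(BOS)
        pieces.extend([INSTRUCTION_PREFIX, instruction, RESPONSE_PREFIX, response])
    return pieces


turns = [
    ("My favorite color is green. What's yours?", "My favorite color is blue."),
    ("Give me examples of things of my favorite color.", ""),
]

as_document = build_pieces(turns, bos_before_each_instruction=False)
as_instruct = build_pieces(turns, bos_before_each_instruction=True)

print(as_document.count(BOS))  # 1
print(as_instruct.count(BOS))  # 2
```

If a model treats BOS as "a new document starts here", the second layout could plausibly make it pay less attention to earlier turns, which would match the behaviour reported in this thread.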
It probably depends on the model but I think it affects the outcome a lot.
I've noticed the same issue, and in my experience -ins sets the default n_keep to 2 even when --keep is set to -1 or 0. -p does not have this problem.
This issue was closed because it has been inactive for 14 days since being marked as stale.
I had a feeling that sometimes certain models are good at keeping context in chat/instruct conversation, but other models are bad.
To be sure that this is a property of the model and not of the chat mode of llama.cpp's main.exe, I tested the same Instruction template in koboldcpp.
To my surprise, it followed the conversation much better than my attempts in llama.cpp. I tried to find a clear example where -ins breaks, but it was hard. It is not that the model has no memory at all; it looks more like it does not "respect" it much, and is more likely to treat the next instruction as a separate, unrelated question.
Then I tried putting my dialogue into -f in.txt and continuing it there with main.exe. It worked well! But it did not yield the same results as chat/instruct mode, even with zero temperature and the same seed.
Example command (version llama-master-66874d4-bin-win-avx2-x64):
main.exe -t 8 -c 2048 --temp 0 --top-p 0 --top-k 1 --seed 1 -m WizardLM-7B-uncensored.ggml.q5_1.bin --ins
(It looks like -i, --keep -1, and -r "###" did not improve anything.)
Chat that shows the issue:
It seems to have forgotten the info from two prompts back (but it often remembers info from the directly preceding instruction).
When I tried
main.exe -t 8 -c 2048 --temp 0 --top-p 0 --top-k 1 --seed 1 -m WizardLM-7B-uncensored.ggml.q5_1.bin -f in.txt
it gave a completely different (and kinda incorrect) response:
Above was with newlines at the beginning of the document, without them it would be:
Here it is without linefeeds at all:
Anyway, if I craft my initial chat as a document:
It spits back a large list:
It quickly became incorrect, but at least the model understood the question and used the conversation history in its answer, which is often not the case in -ins mode.
However, Instruct chats clearly have memory too:
If I omit "of them" it becomes:
But so does the -f in.txt version; it outputs:
Still, the chat mode has a much stronger tendency not to respect the conversation history! I cannot prove it, especially because it differs between models. But if anybody could explain what exactly I should type in the RAW prompt to get the exact answers I get with -ins, then I could experiment further and conclude why this happens, and whether chat mode is buggy or not.
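For anyone who wants to run the same comparison, a small Python sketch like the one below could generate the in.txt variants (with or without leading newlines, with blank lines or single newlines between parts). The `render_prompt` helper and its parameters are hypothetical, and the whitespace layout is only a guess at the template; it is not confirmed to match what -ins feeds to the model.

```python
def render_prompt(turns, next_instruction, leading_newlines=True, blank_lines=True):
    """Render an Instruction/Response dialogue as one raw prompt string.

    turns: list of (instruction, response) pairs already in the history.
    next_instruction: the instruction the model should answer next.
    leading_newlines: start the document with the separator (assumed variant).
    blank_lines: use a blank line between parts, otherwise a single newline.
    """
    sep = "\n\n" if blank_lines else "\n"
    parts = []
    for instruction, response in turns:
        parts.append("### Instruction:" + sep + instruction)
        parts.append("### Response:" + sep + response)
    parts.append("### Instruction:" + sep + next_instruction)
    # End right after the response header so the model continues from there.
    parts.append("### Response:" + sep)
    text = sep.join(parts)
    return (sep + text) if leading_newlines else text


history = [
    ("My favorite color is green. What's yours?", "My favorite color is blue."),
]
question = "Give me examples of things of my favorite color."

# Print every whitespace variant so each can be saved to in.txt and compared.
for leading in (True, False):
    for blank in (True, False):
        print(repr(render_prompt(history, question, leading, blank)))
```

Saving each variant to in.txt and running it with the same --temp 0 and --seed settings would show how much of the difference comes from whitespace alone, as opposed to something -ins does beyond the visible text.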