antimatter15 / alpaca.cpp

Locally run an Instruction-Tuned Chat-Style LLM
MIT License

Modifying chat.cpp to implement a ChatGPT-like conversational context memory #173

Open MalekWahidi opened 1 year ago

MalekWahidi commented 1 year ago

Is there any trivial way to tweak the code in chat.cpp so that each submitted prompt is prepended with, say, the most recent few prompts and answers (or would that make the input sequence too large for the model)? That way alpaca would have some contextual awareness of the previous interactions in the conversation and could reply accordingly, as ChatGPT does. I would have liked to try it myself, but I don't have enough C++ expertise to attempt this. Any ideas or references?
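
For reference, the gist of such a change is a rolling buffer of past turns that gets flattened into each new prompt. Below is a minimal, illustrative C++ sketch of the idea, not the project's actual code; the "### Instruction / ### Response" framing and the size limits are assumptions:

    // Keep the last few (prompt, answer) pairs and prepend them to each new prompt.
    #include <deque>
    #include <iostream>
    #include <string>
    #include <utility>

    static const size_t MAX_TURNS = 4;            // past exchanges to keep
    static const size_t MAX_CONTEXT_CHARS = 1500; // crude guard against overflowing the context window

    static std::deque<std::pair<std::string, std::string>> history;

    // Flatten the remembered turns plus the new input into a single prompt string.
    std::string build_prompt(const std::string &user_input) {
        std::string prompt;
        for (const auto &turn : history)
            prompt += "### Instruction:\n" + turn.first + "\n### Response:\n" + turn.second + "\n";
        prompt += "### Instruction:\n" + user_input + "\n### Response:\n";
        return prompt;
    }

    // Record a finished exchange, evicting the oldest turns when over budget.
    void remember(const std::string &user_input, const std::string &answer) {
        history.emplace_back(user_input, answer);
        while (history.size() > MAX_TURNS)
            history.pop_front();
        while (!history.empty() && build_prompt("").size() > MAX_CONTEXT_CHARS)
            history.pop_front();
    }

    int main() {
        remember("What is the capital of France?", "Paris."); // pretend model output
        // The next prompt now carries the previous exchange as context.
        std::cout << build_prompt("And what is its population?");
        return 0;
    }

In chat.cpp, the output of build_prompt() would replace the raw user input fed to the existing tokenize-and-generate loop; counting actual tokens rather than characters would be the proper way to respect the model's context limit.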

SamuelTallet commented 1 year ago

If needed, I posted a prompt template that handles conversational context in this related issue. As for the C++ side, unfortunately I can't help.
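
(The exact template is in the linked issue; as a rough illustration only, such templates typically inline the running transcript ahead of the new question, along these lines:

    You are a helpful assistant. Continue the conversation below.

    User: <previous question>
    Assistant: <previous answer>
    User: <new question>
    Assistant:

The model then completes the text after the final "Assistant:" marker.)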

betolley commented 1 year ago

I can't tell: are you saying something needs to be written into the chat.exe app to handle context? I didn't know whether you were brainstorming or whether this was something already implemented.

Terristen commented 1 year ago

I'm confused by the original request. Are you saying it needs prior context? If so, I'm shocked, because I've been carrying on contiguous conversations in 30B with no issue with conversational context. Is it ChatGPT-level? Nope. But it's not missing.

MalekWahidi commented 1 year ago

> I'm confused by the original request. Are you saying it needs prior context? If so, I'm shocked, because I've been carrying on contiguous conversations in 30B with no issue with conversational context. Is it ChatGPT-level? Nope. But it's not missing.

Really? My first few chats with the 7B model revealed a lack of conversational context awareness. Maybe I should just experiment more with it.

Terristen commented 1 year ago

You can't really compare the conversational performance of 7B to 30B; 7B just isn't very good. It can answer some questions but doesn't have the capability to hold context well. 30B, with the same chat.cpp, is considerably better, though still not great. The issues you're having are probably more about the model than about the C++ code in this project; that's what I'm trying to say. Also, from personal experience, 13B is actually worse than 7B... just a fair warning.

SamuelTallet commented 1 year ago

@Terristen Which CLI args do you use, please?

Terristen commented 1 year ago

> @Terristen Which CLI args do you use, please?

I wish I could tell you right now, but I'm mid-build on a new, bigger/badder machine, so I don't have access to my latest .bat file. I definitely run -t 20, and I turn the temperature up to 1.35 if I recall correctly. As for the others, I haven't done much tuning, but I think I dial the repeat-last window up to something higher than the default. My ongoing issue is running out of memory after about 25 prompts, though up to that point it's pretty cogent.
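
(For concreteness, with the llama.cpp-style flags this fork inherits, such a setup would look roughly like the line below; the binary name, model path, and repeat-last value here are guesses, not Terristen's actual .bat contents:

    chat -m ggml-alpaca-30b-q4.bin -t 20 --temp 1.35 --repeat_last_n 256

Running out of memory after ~25 prompts is also consistent with the accumulated conversation eventually outgrowing the context the process can hold.)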

mirek190 commented 1 year ago

I'm using LLaMA 65B and it remembers our conversation perfectly. LLaMA 7B is too simple to remember a conversation. 65B is really good ....

My parameters are not fancy ...

main -m models/65B/ggml-model-q4_0.bin -t 16 -n 2048 --keep 48 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
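
(Briefly, for anyone copying this: -t sets the thread count, -n the number of tokens to generate, --keep the number of tokens from the initial prompt to retain when the context window fills up, -i enables interactive mode, -r "User:" hands control back whenever that reverse prompt appears, and -f loads the initial prompt from a file.)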

[Screenshot of a sample conversation, 2023-04-23, omitted]

alpaca.cpp is outdated anyway ... it has been merged into llama.cpp.