antimatter15 / alpaca.cpp

Locally run an Instruction-Tuned Chat-Style LLM
MIT License

Is chat/token interjection possible in alpaca.cpp? #116

Open shubham8550 opened 1 year ago

shubham8550 commented 1 year ago

I have a chatbot which keeps context (by processing the whole history), and as you know the bot gets slower because there's a lot to process every time.

So I heard this term: chat/token interjection.

"Chat/token interjection is when you interrupt the inference and interject, then start the inference process again based on the newly added tokens."

progressionnetwork commented 1 year ago

> I have a chatbot which keeps context (by processing the whole history), and as you know the bot gets slower because there's a lot to process every time.

How did you do this? Can you share your approach?

trevtravtrev commented 1 year ago

We could add context by feeding previous responses, hidden, into each new prompt. We could also make a hidden request to fold a summary of the previous response into the running chat-history summary before each new prompt is processed. This would get increasingly demanding with 1. the length of the conversation and 2. the size of the weights: 30B would be much, much slower and possibly unusable for most people. We could also set a token limit so it only remembers so much previous context, which ChatGPT itself does. A rough sketch of that token-limit idea follows below.
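A minimal sketch of the token-limit idea, assuming a crude 4-characters-per-token estimate; `build_bounded_prompt` and `approx_tokens` are hypothetical helpers, not part of alpaca.cpp. Only the most recent turns that fit under the budget are replayed; anything older falls off (or could instead be replaced by the running summary mentioned above).

```cpp
// Hypothetical sketch of a token-limited rolling context window.
#include <iostream>
#include <string>
#include <vector>

// Very rough token estimate (~4 characters per token); a real
// implementation would use the model's tokenizer.
size_t approx_tokens(const std::string& s) { return s.size() / 4 + 1; }

// Walk the history from newest to oldest and keep turns until the budget
// is spent, then rebuild the prompt in chronological order.
std::string build_bounded_prompt(const std::vector<std::string>& turns,
                                 const std::string& question,
                                 size_t max_ctx_tokens) {
    std::vector<std::string> kept;            // newest first
    size_t used = approx_tokens(question);
    for (auto it = turns.rbegin(); it != turns.rend(); ++it) {
        size_t cost = approx_tokens(*it);
        if (used + cost > max_ctx_tokens) break;   // older turns fall off
        kept.push_back(*it);
        used += cost;
    }
    std::string prompt;
    for (auto it = kept.rbegin(); it != kept.rend(); ++it) prompt += *it + "\n";
    prompt += question + "\n";
    return prompt;
}

int main() {
    std::vector<std::string> turns = {
        "User: hi\nBot: hello",
        "User: what's 2+2?\nBot: 4",
        "User: and times 3?\nBot: 12",
    };
    std::cout << build_bounded_prompt(turns, "User: thanks!\nBot:", 32);
}
```

A real implementation would count tokens with the model's tokenizer and reserve room in the budget for the generated response.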

progressionnetwork commented 1 year ago

> We could add context by feeding previous responses, hidden, into each new prompt. We could also make a hidden request to fold a summary of the previous response into the running chat-history summary before each new prompt is processed. This would get increasingly demanding with 1. the length of the conversation and 2. the size of the weights: 30B would be much, much slower and possibly unusable for most people. We could also set a token limit so it only remembers so much previous context, which ChatGPT itself does.

Maybe we only need a short summary, or even just some keywords from the previous prompts and answers, instead of the full dialog? BTW, does anyone know of any commits with this feature?

dan-dean commented 1 year ago

https://github.com/deep-diver/Alpaca-LoRA-Serve

This implements a functional context system and has a demo running on a cloud instance which shows promise. My local testing shows that alpaca.cpp can't remember anything, which makes me confused about the -c and --ctx_size params for alpaca.cpp, because they clearly don't work. Their implementation is targeted at GPUs with the VRAM capacity to run these models, unlike the CPU-based alpaca.cpp.