shubham8550 opened 1 year ago
I have a chatbot that keeps context by reprocessing the whole conversation history, and as you'd expect it gets slower over time because there is more to process on every turn.
How did you do this? Can you share your approach?
We could add context by invisibly feeding the previous responses into each new prompt. We could also make a hidden request that folds the latest response into a running summary of the chat history before each new prompt is processed. This would get increasingly demanding with 1. the length of the conversation and 2. the size of the weights: 30B would be much, much slower and possibly unusable for most people. We could also set a token limit so only a bounded amount of previous context is remembered, which is what ChatGPT itself does.
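A minimal sketch of the first idea, prepending recent turns under a token budget. `generate()` and the whitespace-based token count are hypothetical stand-ins for whatever model call and tokenizer the bot actually uses:

```python
# Sketch: keep context by prepending as many recent turns as fit a token budget.

def generate(prompt: str) -> str:
    # Hypothetical placeholder for the real model call (llama.cpp bindings, HTTP API, ...).
    raise NotImplementedError("plug in your model call here")

MAX_CONTEXT_TOKENS = 1500            # rough budget for prior turns
history: list[tuple[str, str]] = []  # (user, assistant) turns

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer; good enough to illustrate the cap.
    return len(text.split())

def build_prompt(user_msg: str) -> str:
    # Walk the history backwards, keeping the most recent turns that still fit.
    kept: list[str] = []
    budget = MAX_CONTEXT_TOKENS - count_tokens(user_msg)
    for u, a in reversed(history):
        turn = f"User: {u}\nAssistant: {a}"
        if count_tokens(turn) > budget:
            break
        kept.append(turn)
        budget -= count_tokens(turn)
    kept.reverse()
    return "\n".join(kept + [f"User: {user_msg}", "Assistant:"])

def chat(user_msg: str) -> str:
    reply = generate(build_prompt(user_msg))
    history.append((user_msg, reply))
    return reply
```

The obvious trade-off is the one described above: the prompt grows with the conversation until the token limit kicks in and older turns silently fall out of scope.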
Maybe we only need a short summary, or even some keywords, from the previous prompts and answers instead of the full dialog? By the way, does anyone know of any commits with this feature?
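A rough sketch of that summary-based variant, where a hidden second request folds the latest exchange into a short running summary, so only the summary plus the new question is sent each turn. Again, `generate()` is a hypothetical placeholder for the real model call:

```python
# Sketch: replace full chat history with a rolling summary maintained by the model itself.

def generate(prompt: str) -> str:
    # Hypothetical placeholder for the real model call.
    raise NotImplementedError("plug in your model call here")

summary = ""  # running summary of the conversation so far

def chat_with_summary(user_msg: str) -> str:
    global summary
    prompt = (
        f"Conversation summary so far: {summary or '(none)'}\n"
        f"User: {user_msg}\nAssistant:"
    )
    reply = generate(prompt)

    # Hidden follow-up request: compress the latest exchange into the summary.
    summary = generate(
        "Update this summary of a conversation in at most 3 sentences.\n"
        f"Summary: {summary or '(none)'}\n"
        f"New exchange:\nUser: {user_msg}\nAssistant: {reply}\n"
        "Updated summary:"
    ).strip()
    return reply
```

This keeps the prompt size roughly constant, at the cost of one extra (hidden) generation per turn and whatever detail the summary drops.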
https://github.com/deep-diver/Alpaca-LoRA-Serve
It implements a functional context system and has a demo running on a cloud instance that shows promise. My local testing shows that alpaca.cpp can't remember anything, which makes me confused about the -c and --ctx_size params for alpaca.cpp, because they clearly don't work for this. Their implementation is targeted at GPUs with the VRAM capacity to run these models, unlike the CPU-based alpaca.cpp.
I have a chatbot that keeps context by reprocessing the whole conversation history, and as you know the bot gets slower because there is more to process every time.
So I heard this term, chat/token interjection:
"chat/token interjection is when you interrupt the inference and interject. then start the inference process again based on the newly added tokens"