Description
Hi,
The chat works fine until the prompt plus answer token count exceeds the context size; at that point the answer is cut off in the middle of generation, without ever leaving the "session.ChatAsync" method. That much is expected, since I don't check my prompt size.
So I simply check the history in my code and remove some old messages whenever the total prompt size would exceed the context size, using "session.History.Messages.Remove()". But even though "context.Tokenize()" confirms that the token count is now within the limit, the answer is still cut off mid-generation, as if the trimming had no effect.
So I don't really know what's wrong.
Reproduction Steps
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using LLama;
using LLama.Common;

// NOTE: signalEvent, token, currentMessage and AdvancedSettings are fields defined elsewhere in the class.
private static uint contextSize = 260;

public void InitChat(string modelPath)
{
    Task.Run(async () =>
    {
        var parameters = new ModelParams(modelPath)
        {
            ContextSize = contextSize, // Limiting the context to reproduce the bug quickly, but it happens no matter the context size.
            GpuLayerCount = 5          // How many layers to offload to the GPU. Adjust according to your GPU memory.
        };
        using var model = LLamaWeights.LoadFromFile(parameters);
        using var context = model.CreateContext(parameters);
        var executor = new InteractiveExecutor(context);

        var chatHistory = new ChatHistory();
        chatHistory.AddMessage(AuthorRole.System, "You are a coding assistant.");
        ChatSession session = new(executor, chatHistory);

        var bannedwords = new List<string>() { "User:" };

        while (true)
        {
            // Wait until a new user message is available.
            signalEvent.Wait(token);

            InferenceParams inferenceParams = new InferenceParams()
            {
                MaxTokens = 56,
            };

            var x = new ChatHistory.Message(AuthorRole.User, currentMessage).ToString(); // (unused below)

            // Check that the future context won't exceed contextSize
            // (history + answer MaxTokens + current message length + 20 tokens of safety margin).
            while (context.Tokenize(session.HistoryTransform.HistoryToText(session.History)).Length
                   + AdvancedSettings.MaxTokens
                   + context.Tokenize(currentMessage).Length
                   + 20 > contextSize)
            {
                // Remove the first message that is not a system message to free some context.
                session.History.Messages.Remove(
                    session.History.Messages.FirstOrDefault(m => m.AuthorRole != AuthorRole.System));
            }

            string buffer = "";
            await foreach (var text in session.ChatAsync(new ChatHistory.Message(AuthorRole.User, currentMessage), inferenceParams))
            {
                buffer += text;
            }

            signalEvent.Reset();
        }
    });
}
Environment & Configuration
Operating system: Windows 11
.NET runtime version: .NET 8
LLamaSharp version: 0.11.2
CUDA version (if you are using the CUDA backend): not applicable, CPU backend 0.11.2
CPU & GPU device: AMD Ryzen 7 4800HS
Known Workarounds
I tried the same code with version 0.10 and, strangely, it works and continues the conversation as expected, removing old messages to keep the context size below the limit.
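For now I simply pin the packages to the 0.10 line. A minimal sketch of the downgrade (assuming version number 0.10.0 and the LLamaSharp.Backend.Cpu package, since I am on the CPU backend):

dotnet add package LLamaSharp --version 0.10.0
dotnet add package LLamaSharp.Backend.Cpu --version 0.10.0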