SciSharp / LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
https://scisharp.github.io/LLamaSharp
MIT License

[BUG]: Answer stop abruptly after contextsize, even with limiting prompt size #722

Open kikipoulet opened 1 month ago

kikipoulet commented 1 month ago

Description

Hi,

I have no problem using the chat until the prompt + answer token count exceeds the context size; at that point the answer is cut off mid-generation, without ever leaving the "session.ChatAsync" method. That much is expected, since I wasn't checking my prompt size.

So I now check the history in my code and remove some old messages whenever the total prompt size would exceed the context size, using "session.History.Messages.Remove()". But even though "context.Tokenize()" confirms that the token count is now fine, the answer is still interrupted mid-generation, as if what I did was useless.

So I don't really know what's wrong.
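To make the pruning logic easier to see in isolation, here is a minimal sketch of the loop I'm describing (simplified from the full reproduction below; `session`, `context` and `contextSize` are assumed to be set up as in my code, and the specific numbers are just the ones I happen to use):

```csharp
// Simplified sketch of the history pruning described above.
// Assumes `session`, `context` and `contextSize` exist as in the
// reproduction code; maxTokens and the safety margin are my values.
int maxTokens = 56;
int safetyMargin = 20;

int PromptTokens() =>
    context.Tokenize(session.HistoryTransform.HistoryToText(session.History)).Length;

while (PromptTokens() + maxTokens + safetyMargin > contextSize)
{
    // Drop the oldest non-system message to free context space.
    var oldest = session.History.Messages
        .FirstOrDefault(m => m.AuthorRole != AuthorRole.System);
    if (oldest == null) break; // nothing left to remove
    session.History.Messages.Remove(oldest);
}
```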

Reproduction Steps

  private static uint contextSize = 260;

  public void InitChat(string modelPath)
  {
      Task.Run(async () =>
      {
          var parameters = new ModelParams(modelPath)
          {
              // Limiting context to hit the bug quickly, but it happens no matter the context size.
              ContextSize = contextSize,
              GpuLayerCount = 5 // How many layers to offload to GPU. Adjust according to your GPU memory.
          };
          using var model = LLamaWeights.LoadFromFile(parameters);
          using var context = model.CreateContext(parameters);

          var executor = new InteractiveExecutor(context);

          var chatHistory = new ChatHistory();
          chatHistory.AddMessage(AuthorRole.System, "You are a coding assistant.");

          ChatSession session = new(executor, chatHistory);

          var bannedwords = new List<string>() { "User:" };

          while (true)
          {
              signalEvent.Wait(token);

              InferenceParams inferenceParams = new InferenceParams()
              {
                  MaxTokens = 56,
              };

              // Check that the future context won't exceed contextSize
              // (history + answer MaxTokens + current message length + 20 for safety).
              while (context.Tokenize(session.HistoryTransform.HistoryToText(session.History)).Length
                     + inferenceParams.MaxTokens
                     + context.Tokenize(currentMessage).Length
                     + 20 > contextSize)
              {
                  // Remove the first non-system message to free some context.
                  session.History.Messages.Remove(
                      session.History.Messages.FirstOrDefault(m => m.AuthorRole != AuthorRole.System));
              }

              string buffer = "";

              await foreach (var text in session.ChatAsync(new ChatHistory.Message(AuthorRole.User, currentMessage), inferenceParams))
              {
                  buffer += text;
              }

              signalEvent.Reset();
          }
      });
  }

Environment & Configuration

Known Workarounds

I tried the same code with version 0.10 and, oddly, it works: the conversation continues as expected, with old messages removed to keep the context size below the limit.
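For anyone hitting the same thing, pinning the NuGet package back to 0.10 is the only workaround I have for now. A csproj fragment, assuming the standard LLamaSharp package ids (the backend package id and the need to match its version are assumptions on my part):

```xml
<!-- Pin LLamaSharp to 0.10, where history trimming still works as expected. -->
<ItemGroup>
  <PackageReference Include="LLamaSharp" Version="0.10.0" />
  <!-- Backend package id/version assumed; keep it in sync with the main package. -->
  <PackageReference Include="LLamaSharp.Backend.Cpu" Version="0.10.0" />
</ItemGroup>
```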

AsakusaRinne commented 1 month ago

Hi, thank you for reporting this bug! I'll look into the problem later.