cpacker / MemGPT

Create LLM agents with long-term memory and custom tools 📚🦙
https://memgpt.readme.io
Apache License 2.0

[BUG] There is no `finish_reason: "length"` in `./memgpt/local_llm` #270

Closed · 52cs closed this issue 8 months ago

52cs commented 8 months ago

There is no `finish_reason` in `./memgpt/local_llm`, even though it is supported by the OpenAI API and relied on elsewhere in this repo:

used here: https://github.com/cpacker/MemGPT/blob/main/memgpt/agent.py#L148-L150

```python
async def get_ai_reply_async(
        # special case for 'length'
        if response.choices[0].finish_reason == "length":
            raise Exception("Finish reason was length (maximum context length)")
```

This exception ultimately triggers `summarize_messages_inplace` and then calls `step` again: https://github.com/cpacker/MemGPT/blob/main/memgpt/agent.py#L1074-L1080

```python
            # If we got a context alert, try trimming the messages length, then try again
            if "maximum context length" in str(e):
                # A separate API call to run a summarizer
                await self.summarize_messages_inplace()

                # Try step again
                return await self.step(user_message, first_message=first_message)
```
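
To make the flow concrete, here is a minimal, self-contained sketch of that pattern. Only the method names mirror the MemGPT code quoted above; the class, threshold, and stand-in bodies are invented for illustration:

```python
# Sketch (not the actual MemGPT code): a context-overflow error triggers a
# summarization pass, then step() is retried with the compacted history.
import asyncio

class AgentSketch:
    def __init__(self):
        self.messages = [f"msg-{i}" for i in range(10)]

    async def get_ai_reply_async(self):
        # Stand-in for the real LLM call: pretend long histories overflow.
        if len(self.messages) > 4:
            raise Exception("Finish reason was length (maximum context length)")
        return "assistant reply"

    async def summarize_messages_inplace(self):
        # Stand-in summarizer: collapse older messages into one summary entry.
        self.messages = ["[summary of earlier messages]"] + self.messages[-2:]

    async def step(self, user_message, first_message=False):
        try:
            return await self.get_ai_reply_async()
        except Exception as e:
            # On a context alert, compact the history and try again.
            if "maximum context length" in str(e):
                await self.summarize_messages_inplace()
                return await self.step(user_message, first_message=first_message)
            raise

print(asyncio.run(AgentSketch().step("hello")))
```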

But I think this is just a workaround. It is not perfect, though it does work. Ideally this decision should be made by the LLM itself; let's discuss that later.

Now, let's look at why issue https://github.com/cpacker/MemGPT/issues/252 occurs. `./memgpt/local_llm` does not reply with `finish_reason: "length"` when the request exceeds the maximum context length; instead, it raises an exception:

raise Exception(f"Request exceeds maximum context length (code={response.status_code}, msg={response.text}, URI={URI})")

which is not caught by `get_ai_reply_async` as we expected, but by `handle_ai_response`.

_`handle_ai_response` should record the function error message in the message history and feed it back to the LLM in the next step. We expect the LLM to make the right decision, but it does not. Why? It is probably a prompt engineering problem. Let's discuss it later._

Time for a conclusion. To solve https://github.com/cpacker/MemGPT/issues/252, there are two approaches (sketched below):

- Approach 1: quick fix. Make `./memgpt/local_llm` return `finish_reason: "length"` instead of raising an exception.
- Approach 2: prompt engineering. Make `./memgpt/local_llm` raise a more descriptive exception whose message is recorded in the message history and fed back to the LLM in the next round, so the LLM can decide what to do.
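
A minimal sketch of Approach 1, assuming a hypothetical wrapper around the local backend's HTTP API. The function name, error-detection heuristic, and response shape below are illustrative assumptions, not the actual `memgpt/local_llm` code:

```python
# Sketch of Approach 1: return an OpenAI-style choice with finish_reason="length"
# instead of raising, so the existing check in get_ai_reply_async can trigger
# summarization. Everything here is illustrative, not the real memgpt/local_llm code.
import requests

def local_completion_sketch(prompt: str, uri: str) -> dict:
    response = requests.post(uri, json={"prompt": prompt})
    if response.status_code != 200:
        body = response.text.lower()
        # Heuristic: treat backend complaints about context size as an overflow.
        if "context" in body and ("length" in body or "exceed" in body):
            return {
                "choices": [
                    {
                        "message": {"role": "assistant", "content": None},
                        "finish_reason": "length",  # lets the agent summarize and retry
                    }
                ]
            }
        raise Exception(
            f"Request failed (code={response.status_code}, msg={response.text}, URI={uri})"
        )
    return {
        "choices": [
            {
                "message": {"role": "assistant", "content": response.json().get("text", "")},
                "finish_reason": "stop",
            }
        ]
    }
```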

Finally, once Approach 2 works, we could remove the length catch in `get_ai_reply_async` and drop `summarize_messages_inplace`. Then again, who knows whether the compacted messages are good enough; only experiments can tell us.
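
For Approach 2, a rough sketch of the kind of feedback that could be written into the message history. The helper name, message format, and function name below are illustrative assumptions, not the actual MemGPT schema:

```python
# Sketch of Approach 2: surface a descriptive error to the model through the
# message history so it can decide what to do (e.g. summarize or archive older
# messages) on the next step. Not the actual handle_ai_response implementation.
def record_context_error_sketch(messages: list, error: Exception) -> list:
    messages.append(
        {
            "role": "function",
            "name": "send_message",  # hypothetical: whichever call failed
            "content": (
                "Error: the request exceeded the maximum context length. "
                f"Details: {error}. Consider summarizing or archiving older "
                "messages before retrying."
            ),
        }
    )
    return messages
```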

vivi commented 8 months ago

Hey, thanks for taking the time to look through the codebase.

The summarization should be transparent to the LLM because exceeding the context window is an unrecoverable error -- by definition of its context window limitations, there is nothing the LLM itself can do if it cannot summarize properly (unlike failing to run a function successfully).

52cs commented 8 months ago

hehe