danny-avila / LibreChat

Enhanced ChatGPT Clone: Features Anthropic, OpenAI, Assistants API, Azure, Groq, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, langchain, DALL-E-3, ChatGPT Plugins, OpenAI Functions, Secure Multi-User System, Presets, completely open-source for self-hosting. Actively in public development.
https://librechat.ai/
MIT License

[Enhancement] Add buffer to increase possible response tokens #1003

Closed by kuangxiaoye 11 months ago

kuangxiaoye commented 11 months ago

Contact Details

w258765@gmail.com

What happened?

A conversation with ChatGPT stops without any error being reported. The strange thing is that after it stops, clicking Continue resumes the conversation normally. Once this problem occurs, if I switch the model to gpt-3.5 the conversation continues fine, but when I switch back to gpt-4 the problem repeats.

Steps to Reproduce

  1. The problem does not appear in new conversations, only in conversations that already contain many messages.
  2. I examined the onmessage method and printed each return value to see why this happens; see the relevant log output section below.
  3. After a lot of debugging, I could not find what causes `finish_reason: length`. I hope I can get your help.

What browsers are you seeing the problem on?

No response

Relevant log output


I debugged the getCompletion file and found the following in the output of the onmessage method when the response is finally interrupted:

## GPT-3.5 conversation log (normal Q&A)

ask log
{
  text: '', // This is the content of my question. I deleted it manually for the sake of appearance.
  conversationId: 'c2e62814-f690-4ecf-931d-97ce6214dec3',
  endpointOption: {
    endpoint: 'openAI',
    chatGptLabel: '充值', // "Top-up"
    promptPrefix: '你是一个nodejs架构师,你精通express框架,并且在web编写方面非常在行。你需要与我一起完成一个充值界面的编写工作,为我的代码做出改正或提出建议', // "You are a Node.js architect, proficient in the Express framework and very skilled at web development. You will work with me on a top-up page, correcting my code or offering suggestions."
    modelOptions: {
      model: 'gpt-3.5-turbo-0613',
      temperature: 1,
      top_p: 1,
      presence_penalty: 0,
      frequency_penalty: 0
    }
  }
}
{
  '5ee067fa-45a4-4e7a-96ae-a5163c7b0d17': 189,
  '626fb256-9ed6-4db2-aaa9-aa43e4739517': 26,
  '6f3e61ff-3d37-471c-b8de-99c8a2518713': 442,
  'fce4bd31-c2fc-43f5-9f00-ef992506eb8d': 1925,
  instructions: 73
}
userMessage.tokenCount 1925
userMessage {
  messageId: 'fce4bd31-c2fc-43f5-9f00-ef992506eb8d',
  parentMessageId: '6f3e61ff-3d37-471c-b8de-99c8a2518713',
  conversationId: 'c2e62814-f690-4ecf-931d-97ce6214dec3',
  sender: 'User',
  text: '', // This is the content of my question. I deleted it manually for the sake of appearance.
  isCreatedByUser: true,
  tokenCount: 1925
}
promptTokens, completionTokens: 2655 1445

## GPT-4 conversation log (interrupted with finish_reason: length)

ask log
{
  text: '', // This is the content of my question. I deleted it manually for the sake of appearance.
  conversationId: 'c2e62814-f690-4ecf-931d-97ce6214dec3',
  endpointOption: {
    endpoint: 'openAI',
    chatGptLabel: '充值', // "Top-up"
    promptPrefix: '你是一个nodejs架构师,你精通express框架,并且在web编写方面非常在行。你需要与我一起完成一个充值界面的编写工作,为我的代码做出改正或提出建议', // "You are a Node.js architect, proficient in the Express framework and very skilled at web development. You will work with me on a top-up page, correcting my code or offering suggestions."
    modelOptions: {
      model: 'gpt-4-0613',
      temperature: 1,
      top_p: 1,
      presence_penalty: 0,
      frequency_penalty: 0
    }
  }
}
promptTokens, completionTokens: 8143 51
New value for sk-w5vaqiu5v6sxSPKqgoye6zBAPzuJZcwiMIxkc3F2NTrWEbzQ is 4089
{
  'e9335f1f-e566-4779-a80a-37d0d777f1ff': 758,
  '4480269c-7a50-4e1e-a224-70dac54c4245': 457,
  '1c457e26-ed2e-4a72-8d90-f59063d290db': 1689,
  '2b195c4b-8918-4b95-af8d-89cb836928fa': 724,
  '515ede24-7348-4fd0-a126-92eac12ff70a': 1860,
  '5ee067fa-45a4-4e7a-96ae-a5163c7b0d17': 189,
  '626fb256-9ed6-4db2-aaa9-aa43e4739517': 26,
  '6f3e61ff-3d37-471c-b8de-99c8a2518713': 442,
  'fce4bd31-c2fc-43f5-9f00-ef992506eb8d': 1924,
  instructions: 74
}
userMessage.tokenCount 1924
userMessage {
  messageId: 'fce4bd31-c2fc-43f5-9f00-ef992506eb8d',
  parentMessageId: '6f3e61ff-3d37-471c-b8de-99c8a2518713',
  conversationId: 'c2e62814-f690-4ecf-931d-97ce6214dec3',
  sender: 'User',
  text: '', // This is the content of my question. I deleted it manually for the sake of appearance.
  isCreatedByUser: true,
  tokenCount: 1924
}
promptTokens, completionTokens: 8143 51

### Screenshots

![Screenshot of the interrupted GPT-4 response](https://github.com/danny-avila/LibreChat/assets/43713316/12fd44ac-a8bd-484b-9638-56722a005d21)

### Code of Conduct

- [X] I agree to follow this project's Code of Conduct
danny-avila commented 11 months ago

Are you on the latest version? You might be a few versions behind, and I can only properly debug issues with the latest version (0.5.9 or HEAD of main branch). I'm asking because of the color of the gpt-4 icon.

Second, `finish_reason: length` is a legitimate finish reason returned by OpenAI, indicating the model has no more room in the context window to generate tokens. However, it's odd that gpt-3.5 doesn't run into this issue while gpt-4 does.
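For reference, `finish_reason` arrives on the final streamed chunk of a chat completion. A minimal sketch of checking for it (illustrative only, not LibreChat's actual handler):

```js
// Illustrative sketch (not LibreChat's actual handler): each parsed SSE chunk
// from the chat completions stream looks like
//   { choices: [{ delta: { content: '...' }, finish_reason: null }] }
// and the final chunk carries finish_reason ('stop', 'length', etc.).
function handleChunk(chunk) {
  const choice = chunk.choices?.[0];
  if (choice?.delta?.content) {
    process.stdout.write(choice.delta.content);
  }
  if (choice?.finish_reason === 'length') {
    // The model ran out of context-window room mid-response.
    console.warn('\n[response truncated: finish_reason=length]');
  }
}
```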

Also, I need more context from your logs. Uncomment the following line so it's active: https://github.com/danny-avila/LibreChat/blob/7abc5bc6707853844623765ce2dd84e9ee1fa192/api/server/routes/endpoints/openAI/initializeClient.js#L9

Then recreate the issue and post the logs.

kuangxiaoye commented 11 months ago

I merged the latest changes, but the problem still exists. The icons are different because I prefer the previous icons.

I have updated my logs above, posting the normal gpt-3.5 conversation log and the interrupted gpt-4 conversation log. I notice that they seem to load a different number of message IDs for exactly the same conversation context.

gpt3.5: {
  '5ee067fa-45a4-4e7a-96ae-a5163c7b0d17': 189,
  '626fb256-9ed6-4db2-aaa9-aa43e4739517': 26,
  '6f3e61ff-3d37-471c-b8de-99c8a2518713': 442,
  'fce4bd31-c2fc-43f5-9f00-ef992506eb8d': 1925,
  instructions: 73
}

gpt4: {
  'e9335f1f-e566-4779-a80a-37d0d777f1ff': 758,
  '4480269c-7a50-4e1e-a224-70dac54c4245': 457,
  '1c457e26-ed2e-4a72-8d90-f59063d290db': 1689,
  '2b195c4b-8918-4b95-af8d-89cb836928fa': 724,
  '515ede24-7348-4fd0-a126-92eac12ff70a': 1860,
  '5ee067fa-45a4-4e7a-96ae-a5163c7b0d17': 189,
  '626fb256-9ed6-4db2-aaa9-aa43e4739517': 26,
  '6f3e61ff-3d37-471c-b8de-99c8a2518713': 442,
  'fce4bd31-c2fc-43f5-9f00-ef992506eb8d': 1924,
  instructions: 74
}

Is this the cause of the problem?

kuangxiaoye commented 11 months ago

Oh, my God, I found the problem.

const maxTokensMap = {
  'gpt-4': 8191,
  'gpt-4-0613': 8191,
  'gpt-4-32k': 32767,
  'gpt-4-32k-0613': 32767,
  'gpt-3.5-turbo': 4095,
  'gpt-3.5-turbo-0613': 4095,
  'gpt-3.5-turbo-0301': 4095,
  'gpt-3.5-turbo-16k': 15999,
};

GPT-4's max tokens seems to be 4095, because the default gpt-4 currently served is a 4k model, and accounts with access to the gpt-4 8k model are very rare. Have you considered this problem?

After I changed it to the following, all conversations worked properly.

const maxTokensMap = {
  'gpt-4': 4095,
  'gpt-4-0613': 4095,
  'gpt-4-32k': 32767,
  'gpt-4-32k-0613': 32767,
  'gpt-3.5-turbo': 4095,
  'gpt-3.5-turbo-0613': 4095,
  'gpt-3.5-turbo-0301': 4095,
  'gpt-3.5-turbo-16k': 15999,
};

danny-avila commented 11 months ago

> I have updated my logs above, posting the normal gpt-3.5 conversation log and the interrupted gpt-4 conversation log. I notice that they seem to load a different number of message IDs for exactly the same conversation context.
>
> Is this the cause of the problem?

No, this is normal. GPT-4 loads more messages because it has a bigger context window.
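To make that concrete: each object in your logs maps message IDs (plus the instructions) to token counts, and the values sum exactly to the logged promptTokens for each model:

```js
// Token counts copied from the logs above; the sums match the logged
// promptTokens values (2655 for gpt-3.5, 8143 for gpt-4).
const sum = (xs) => xs.reduce((a, b) => a + b, 0);
const gpt35 = [189, 26, 442, 1925, 73];
const gpt4 = [758, 457, 1689, 724, 1860, 189, 26, 442, 1924, 74];
console.log(sum(gpt35)); // 2655
console.log(sum(gpt4));  // 8143
```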

> GPT-4's max tokens seems to be 4095, because the default gpt-4 currently served is a 4k model, and accounts with access to the gpt-4 8k model are very rare. Have you considered this problem?

This is false. If you're using the official OpenAI API, or a reverse proxy that is faithful to the official one, your original screenshot would not be possible and an error would be thrown.


What is happening: your context is at 8,143 tokens, leaving the LLM only 48 tokens to respond with (of the 8,191 total). This is normal, and it is why the response is interrupted and returns with `finish_reason: length`.

PR https://github.com/danny-avila/LibreChat/pull/973 will help mitigate this issue if you enable message summarization. Summarizing saves you a lot of tokens over the course of longer conversations, retains some of the older messages, and leaves more room for the LLM to respond.

Another way to mitigate this issue is to add an extra 'buffer' to the window, so that it prunes more messages as the context limit is reached.
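A minimal sketch of the buffer idea, assuming a simple oldest-first pruning strategy (illustrative, not the linked PR's actual implementation):

```js
// Illustrative sketch, not the PR's actual code: reserve `buffer` tokens for
// the response so pruning kicks in before the prompt fills the whole window,
// as happened above (8,143 of 8,191 tokens used by the prompt).
function pruneToFit(messages, { maxContext = 8191, buffer = 1000 } = {}) {
  const budget = maxContext - buffer;
  const pruned = [...messages];
  let promptTokens = pruned.reduce((acc, m) => acc + m.tokenCount, 0);
  // Drop the oldest messages until the prompt fits within the budget.
  while (pruned.length > 1 && promptTokens > budget) {
    promptTokens -= pruned.shift().tokenCount;
  }
  return { pruned, promptTokens, responseRoom: maxContext - promptTokens };
}
```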

kuangxiaoye commented 11 months ago

Thank you very much for your help

danny-avila commented 11 months ago

I will also add a buffer environment variable in the linked PR in case you don't want messages summarized. Either solution will prevent this issue altogether. Happy to help!

kuangxiaoye commented 11 months ago

I'm actually curious: this problem didn't exist in ancient versions of LibreChat, and I couldn't find any maxToken parameter being retrieved in my code. Why is the maxTokensMap parameter needed now?

I tried to have the same conversation in chatgpt-clone, and by comparison I was surprised to find that gpt-4's max token count there was about 4095.

danny-avila commented 11 months ago

> I'm actually curious: this problem didn't exist in ancient versions of LibreChat, and I couldn't find any maxToken parameter being retrieved in my code. Why is the maxTokensMap parameter needed now?
>
> I tried to have the same conversation in chatgpt-clone, and by comparison I was surprised to find that gpt-4's max token count there was about 4095.

In the old files, 4095 was the default for every model, no matter what. That is not desirable for larger-context models, since the bigger window is partly the point of using, for example, gpt-3.5-turbo-16k.
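So the per-model map replaces that one-size-fits-all constant; a lookup with the old value as a conservative fallback might look like this (illustrative sketch):

```js
// Illustrative sketch: per-model windows, with the old universal default
// (4095) as a fallback for unknown models.
const maxTokensMap = {
  'gpt-4': 8191,
  'gpt-4-32k': 32767,
  'gpt-3.5-turbo': 4095,
  'gpt-3.5-turbo-16k': 15999,
};
const getMaxTokens = (model) => maxTokensMap[model] ?? 4095;

console.log(getMaxTokens('gpt-3.5-turbo-16k')); // 15999
console.log(getMaxTokens('some-unknown-model')); // 4095
```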

kuangxiaoye commented 11 months ago

Thank you very much; you have now answered all my questions. Thank you for your work!

I hope I can contribute to this project, and that my questions can help others too!

kuangxiaoye commented 11 months ago

Sorry, here I am again.

I'm very curious why the OpenAIClient doesn't use @waylaidwanderer/chatgpt-api and instead builds the API request itself.

In fact, the old version of LibreChat used @waylaidwanderer/chatgpt-api in the ChatGPTClient file, and it did not have this problem of responses being stopped because the token window is fully loaded.

I also think token summarization may be a redundant feature that can be avoided entirely, because this problem did not exist before.

danny-avila commented 11 months ago

> I'm very curious why the OpenAIClient doesn't use @waylaidwanderer/chatgpt-api and instead builds the API request itself.

Because @waylaidwanderer/chatgpt-api is no longer maintained and lacks support for many things that have since been implemented in LibreChat.

> In fact, the old version of LibreChat used @waylaidwanderer/chatgpt-api in the ChatGPTClient file, and it did not have this problem of responses being stopped because the token window is fully loaded.

One simple reason for this, as I said above, is that @waylaidwanderer/chatgpt-api does not account for gpt-4's 8k context; it gives it a 4k context. So even if your prompt is 4095 tokens, GPT-4 in reality still has roughly 4,000 more context tokens for the response, so your messages are never 'stopped'. Even this simple detail is not maintained.
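Concretely, the old behavior worked out like this:

```js
// Why the old client never hit finish_reason: length on gpt-4:
const realWindow = 8191;  // gpt-4's actual context window
const assumedMax = 4095;  // old hard-coded limit applied to every model
// Prompts were pruned to at most assumedMax tokens, so the response always
// had at least this many tokens of room left:
console.log(realWindow - assumedMax); // 4096
```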

> I also think token summarization may be a redundant feature that can be avoided entirely, because this problem did not exist before.

Summarizing is a new feature, released after you closed this issue. I don't see how it's redundant, especially when longer conversations are capped at half the context, avoiding your issue altogether.

kuangxiaoye commented 11 months ago

Thank you very much. I see the summary function still has TODO items to be implemented, so now I just need to switch to the latest version and turn on the summary feature to solve the problem above, right?

danny-avila commented 11 months ago

> Thank you very much. I see the summary function still has TODO items to be implemented, so now I just need to switch to the latest version and turn on the summary feature to solve the problem above, right?

Yes, it will solve your issue at present.