kuangxiaoye closed this issue 11 months ago.
Are you on the latest version? You might be a few versions behind, and I can only properly debug issues with the latest version (0.5.9 or HEAD of main branch). I'm asking because of the color of the gpt-4 icon.
Second, finish_reason: length is a legitimate finish reason returned by OpenAI, indicating the model has no more room in the context window to generate tokens. However, something is off if gpt-3.5 doesn't run into this while gpt-4 does.
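For reference, here's where finish_reason appears in a Chat Completions response; the request details below are illustrative only, not LibreChat's actual client code:

```js
// Minimal sketch: call the official Chat Completions endpoint and inspect
// finish_reason on the first choice (requires Node 18+ for global fetch).
async function checkFinishReason() {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-4',
      messages: [{ role: 'user', content: 'Hello' }],
    }),
  });
  const data = await res.json();
  const { finish_reason } = data.choices[0];
  if (finish_reason === 'length') {
    // The model hit the context window (or max_tokens) before finishing.
    console.warn('Response was cut off: finish_reason === "length"');
  }
  return finish_reason;
}
```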
Also, I need more context from your logs. Uncomment the following line so it's active: https://github.com/danny-avila/LibreChat/blob/7abc5bc6707853844623765ce2dd84e9ee1fa192/api/server/routes/endpoints/openAI/initializeClient.js#L9
Then recreate the issue and post the logs.
I merged the latest changes, but the problem still exists. The icons are different because I like the previous icons.
I have updated my logs, uploading the normal gpt-3.5 conversation log and the interrupted gpt-4 conversation log respectively. I noticed that they seem to have loaded different numbers of conversation IDs for exactly the same context.
gpt-3.5:

```js
{
  '5ee067fa-45a4-4e7a-96ae-a5163c7b0d17': 189,
  '626fb256-9ed6-4db2-aaa9-aa43e4739517': 26,
  '6f3e61ff-3d37-471c-b8de-99c8a2518713': 442,
  'fce4bd31-c2fc-43f5-9f00-ef992506eb8d': 1925,
  instructions: 73
}
```

gpt-4:

```js
{
  'e9335f1f-e566-4779-a80a-37d0d777f1ff': 758,
  '4480269c-7a50-4e1e-a224-70dac54c4245': 457,
  '1c457e26-ed2e-4a72-8d90-f59063d290db': 1689,
  '2b195c4b-8918-4b95-af8d-89cb836928fa': 724,
  '515ede24-7348-4fd0-a126-92eac12ff70a': 1860,
  '5ee067fa-45a4-4e7a-96ae-a5163c7b0d17': 189,
  '626fb256-9ed6-4db2-aaa9-aa43e4739517': 26,
  '6f3e61ff-3d37-471c-b8de-99c8a2518713': 442,
  'fce4bd31-c2fc-43f5-9f00-ef992506eb8d': 1924,
  instructions: 74
}
```
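Summing the token counts from both logs (a quick check, with the values copied verbatim from above) shows how different the two contexts are:

```js
// Per-message token counts taken from the two logs above.
const gpt35 = [189, 26, 442, 1925, 73];
const gpt4 = [758, 457, 1689, 724, 1860, 189, 26, 442, 1924, 74];
const sum = (arr) => arr.reduce((a, b) => a + b, 0);
console.log(sum(gpt35)); // 2655 tokens of context
console.log(sum(gpt4));  // 8143 tokens of context
```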
Is this the cause of the problem?
Oh, my God, I found the problem.
```js
const maxTokensMap = {
  'gpt-4': 8191,
  'gpt-4-0613': 8191,
  'gpt-4-32k': 32767,
  'gpt-4-32k-0613': 32767,
  'gpt-3.5-turbo': 4095,
  'gpt-3.5-turbo-0613': 4095,
  'gpt-3.5-turbo-0301': 4095,
  'gpt-3.5-turbo-16k': 15999,
};
```
gpt-4's max tokens seems to be 4095, because currently gpt-4 is the default 4k model and accounts with gpt-4 8k model permissions are very rare. Have you not considered this problem?
After I changed it to the following, all conversations worked properly.
```js
const maxTokensMap = {
  'gpt-4': 4095,
  'gpt-4-0613': 4095,
  'gpt-4-32k': 32767,
  'gpt-4-32k-0613': 32767,
  'gpt-3.5-turbo': 4095,
  'gpt-3.5-turbo-0613': 4095,
  'gpt-3.5-turbo-0301': 4095,
  'gpt-3.5-turbo-16k': 15999,
};
```
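Presumably the map is consulted per model, something like the sketch below (the helper name and fallback are hypothetical, not LibreChat's actual code):

```js
// Hypothetical lookup around the map above; falls back when a model
// isn't listed. Illustrative only.
function getMaxTokens(model, fallback = 4095) {
  return maxTokensMap[model] ?? fallback;
}
console.log(getMaxTokens('gpt-4'));          // 4095 with the edit above
console.log(getMaxTokens('some-new-model')); // 4095 (fallback)
```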
> I have updated my logs, uploading the normal gpt-3.5 conversation log and the interrupted gpt-4 conversation log respectively. I noticed that they seem to have loaded different numbers of conversation IDs for exactly the same context.
> Is this the cause of the problem?
No, this is normal. gpt-4 will load more messages because it has a bigger context window.
> gpt-4's max tokens seems to be 4095, because currently gpt-4 is the default 4k model and accounts with gpt-4 8k model permissions are very rare. Have you not considered this problem?
This is false. If you were using the official OpenAI API, or a reverse proxy faithful to the official one, your original screenshot would not be possible and an error would be thrown.
What is happening:
Your context is at 8,143 tokens, leaving the LLM only 48 tokens to respond with (out of the total 8,191). This is normal, and it's why generation is interrupted and returns finish_reason: length.
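Spelled out with the numbers from your logs:

```js
const maxContextTokens = 8191; // gpt-4's limit in maxTokensMap
const promptTokens = 8143;     // sum of the message token counts logged above
console.log(maxContextTokens - promptTokens); // 48 tokens left for the completion,
// so generation stops early with finish_reason: 'length'
```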
This PR https://github.com/danny-avila/LibreChat/pull/973 will help mitigate this issue, should you enable message summarization. This saves you a lot of tokens in the long run with longer conversations, retains some of the older messages, and leaves more room for the LLM to respond.
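Conceptually, summarization works something like this rough sketch; summarize and countTokens are stand-in helpers here, not the code in the linked PR:

```js
// Rough sketch only: collapse the oldest messages into a short summary
// until the prompt fits comfortably within the model's window.
async function buildContext(messages, maxContextTokens, countTokens, summarize) {
  const total = (msgs) => msgs.reduce((sum, m) => sum + countTokens(m), 0);
  let context = [...messages];
  while (total(context) > maxContextTokens / 2 && context.length > 2) {
    // Replace the two oldest messages with one summary message,
    // retaining their gist instead of dropping them outright.
    const summary = await summarize(context.slice(0, 2));
    context = [summary, ...context.slice(2)];
  }
  return context;
}
```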
Another way to mitigate this issue is to add an extra 'buffer' to the window, so that more messages are pruned as the context limit is approached.
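For example, a sketch of the buffer idea with hypothetical names (the env var below is illustrative, not necessarily what the PR will add):

```js
// Shrink the effective window so pruning kicks in before the model has
// no room left to answer. CONTEXT_BUFFER is a hypothetical variable name.
const CONTEXT_BUFFER = Number(process.env.CONTEXT_BUFFER ?? 1000);
function pruneMessages(messages, maxContextTokens, countTokens) {
  const limit = maxContextTokens - CONTEXT_BUFFER;
  const kept = [...messages];
  let total = kept.reduce((sum, m) => sum + countTokens(m), 0);
  while (total > limit && kept.length > 1) {
    total -= countTokens(kept.shift()); // drop the oldest message first
  }
  return kept;
}
```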
Thank you very much for your help
I will also add a buffer environment variable in the linked PR, should you not want messages summarized. Either solution will prevent this issue altogether. Happy to help!
I'm actually curious: this problem didn't exist in very old versions of LibreChat, and I couldn't find the maxToken parameter in my code back then. Why is the maxTokensMap parameter needed now?
I tried to have the same conversation in chatgpt-clone, and by comparison I was surprised to find that gpt-4's max token count there was about 4095.
> I'm actually curious: this problem didn't exist in very old versions of LibreChat, and I couldn't find the maxToken parameter in my code back then. Why is the maxTokensMap parameter needed now?
> I tried to have the same conversation in chatgpt-clone, and by comparison I was surprised to find that gpt-4's max token count there was about 4095.
In the old files, 4095 was the default for every model, no matter what. This is not desirable for larger-context models, as the bigger window is partly the point of using, for example, gpt-3.5-turbo-16k.
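Roughly, the change looks like this (a sketch of the behavior, not the actual diff):

```js
// Old behavior: one hard-coded limit, regardless of model.
const oldMaxTokens = 4095;

// New behavior: per-model limits, so large-context models are actually useful.
const maxTokensMap = {
  'gpt-3.5-turbo': 4095,
  'gpt-3.5-turbo-16k': 15999,
  'gpt-4': 8191,
};
console.log(maxTokensMap['gpt-3.5-turbo-16k']); // 15999 instead of a wasted 4095
```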
Thank you very much, you have now answered all my questions. Thank you for your work!
I hope I can contribute to this project, and that my questions can help others too!
Sorry, here I am again.
I'm very curious why OpenAIClient doesn't use @waylaidwanderer/chatgpt-api and instead builds the API request itself.
In fact, the old version of LibreChat used @waylaidwanderer/chatgpt-api in the ChatGPTClient file, and there were no issues with responses being stopped because the token limit was full.
I also think token summarization may now be a redundant feature that could be avoided entirely, because this problem didn't exist before.
> I'm very curious why OpenAIClient doesn't use @waylaidwanderer/chatgpt-api and instead builds the API request itself.
Because @waylaidwanderer/chatgpt-api is no longer maintained and lacks support for a lot of things that have since been implemented in LibreChat.
> In fact, the old version of LibreChat used @waylaidwanderer/chatgpt-api in the ChatGPTClient file, and there were no issues with responses being stopped because the token limit was full.
One simple reason for this, as I said above, is that @waylaidwanderer/chatgpt-api does not account for gpt-4's 8k context, giving it a 4k context instead. So even when your prompt is 4095 tokens, gpt-4 in reality still has about 4,000 more context tokens for the response, so your messages are never 'stopped'. Even this simple detail is not maintained.
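The arithmetic behind that:

```js
const realWindow = 8191;       // gpt-4's actual context size
const assumedPromptCap = 4095; // what the old 4k assumption allows the prompt
console.log(realWindow - assumedPromptCap); // 4096 tokens always free for the reply,
// so finish_reason: 'length' effectively never fires
```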
> I also think token summarization may now be a redundant feature that could be avoided entirely, because this problem didn't exist before.
Summarizing tokens is a new feature, released after you had closed your issue. I don't see how it's redundant, especially when longer conversations are capped at half the context, avoiding your issue altogether.
Thank you very much. I see the summary feature still has TODOs to be implemented, so now I just need to switch to the latest version and turn on summarization to solve the above problem, right?
> Thank you very much. I see the summary feature still has TODOs to be implemented, so now I just need to switch to the latest version and turn on summarization to solve the above problem, right?
Yes, it will solve your issue at present.
Contact Details
w258765@gmail.com
What happened?
Conversations with ChatGPT stop without any error being reported. The strange thing is that after it stops, clicking Continue resumes the conversation normally. After this problem occurs, if I switch the model to gpt-3.5 the conversation can still continue, and switching back to gpt-4 reproduces the problem.
Steps to Reproduce
What browsers are you seeing the problem on?
No response
Relevant log output