Interesting. Maybe max_tokens needs to be sent in the request; it looks like the default for that project is 16, which is incredibly low:
https://github.com/abetlen/llama-cpp-python/issues/542
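A quick way to confirm this is to call the llama-cpp-python server directly and pass max_tokens explicitly. This is a minimal sketch, assuming the server is listening on localhost:8000 as described in the question below:

```js
// Minimal sketch (Node 18+, run as an .mjs file): call the llama-cpp-python
// OpenAI-compatible server directly. Without max_tokens in the request body,
// the server falls back to its own default (16 tokens), so replies get cut off.
const res = await fetch('http://localhost:8000/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    // llama-cpp-python serves whichever model it was started with,
    // so the model name here is mostly informational.
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: 'Write a few paragraphs about llamas.' }],
    max_tokens: 2000, // the important part: raise the cap above the 16-token default
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```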
Add a line in `api\app\clients\OpenAIClient.js` after line 64: `this.modelOptions.max_tokens = 2000;`. With that change, the block looks like this:
```js
if (!this.modelOptions) {
  this.modelOptions = {
    ...modelOptions,
    model: modelOptions.model || 'gpt-3.5-turbo',
    temperature:
      typeof modelOptions.temperature === 'undefined' ? 0.8 : modelOptions.temperature,
    top_p: typeof modelOptions.top_p === 'undefined' ? 1 : modelOptions.top_p,
    presence_penalty:
      typeof modelOptions.presence_penalty === 'undefined' ? 1 : modelOptions.presence_penalty,
    stop: modelOptions.stop,
  };
}
this.modelOptions.max_tokens = 2000; // new line
```
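If you'd rather not hardcode the value, the same line could read the cap from the environment instead. This is just a sketch; `OPENAI_MAX_TOKENS` is an illustrative name, not an existing LibreChat setting:

```js
// Hypothetical variant: take the cap from an env var, falling back to 2000.
// OPENAI_MAX_TOKENS is an illustrative name, not an existing LibreChat option.
this.modelOptions.max_tokens = Number(process.env.OPENAI_MAX_TOKENS ?? 2000);
```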
What is your question?
I'm using LibreChat with the OpenAI endpoint, but instead of actual OpenAI, `OPENAI_REVERSE_PROXY` is pointed at the local system on port 8000, where llama-cpp-python is serving Llama 2 70B through an OpenAI-compatible API. Mostly, this works. The only problem is that the responses we get are very short. Our previous chat UI was able to display messages of various lengths. Is there a setting I could change to allow longer responses?
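For reference, the proxy side of the setup is configured along these lines. The values are illustrative rather than an exact copy of the file, and the URL format that `OPENAI_REVERSE_PROXY` expects may differ between LibreChat versions:

```
# Illustrative .env sketch: point LibreChat's OpenAI endpoint at the local
# llama-cpp-python server. The exact URL shape OPENAI_REVERSE_PROXY expects
# may vary by LibreChat version.
OPENAI_REVERSE_PROXY=http://localhost:8000/v1/chat/completions
# llama-cpp-python does not validate the key by default, but a value must be set.
OPENAI_API_KEY=anything
```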
More Details
Deployed using docker compose from the git repo.
Docker version 24.0.5, from the Ubuntu repositories; llama-cpp-python from Aug 17.