danny-avila / LibreChat

Enhanced ChatGPT Clone: Features Anthropic, OpenAI, Assistants API, Azure, Groq, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, langchain, DALL-E-3, ChatGPT Plugins, OpenAI Functions, Secure Multi-User System, Presets, completely open-source for self-hosting. Actively in public development.
https://librechat.ai/

[Question]: Very short responses when getting completions from llama-cpp-python #870

Closed · origintopleft closed this 1 year ago

origintopleft commented 1 year ago

What is your question?

I'm using LibreChat with the OpenAI endpoint, but instead of the actual OpenAI API, OPENAI_REVERSE_PROXY points to the local system on port 8000, where llama-cpp-python is serving Llama 2 70B through an OpenAI-compatible API. For the most part this works; the only problem is that the responses we get are very short.

Our previous chat UI was able to display messages of various lengths. Is there a setting I could change in order to allow longer responses?
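
For reference, the reverse-proxy part of that setup looks roughly like this in LibreChat's .env file (the exact URL and path are assumptions based on the description above; llama-cpp-python exposes its OpenAI-compatible routes under /v1):

    # .env: point LibreChat's OpenAI endpoint at the local llama-cpp-python server
    # (host, port, and path are assumed from the description above)
    OPENAI_REVERSE_PROXY=http://localhost:8000/v1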

More Details

- Deployed using Docker Compose from the git repo
- Docker version 24.0.5, from Ubuntu repositories
- llama-cpp-python from Aug 17

danny-avila commented 1 year ago

Interesting, maybe max_tokens needs to be sent with the request. It looks like the default for that project is 16 (which is incredibly low):

https://github.com/abetlen/llama-cpp-python/issues/542
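
You can verify this against the server directly by sending a request with max_tokens set. A minimal sketch in JavaScript, assuming the llama-cpp-python server from the question is on port 8000 (the model name and prompt are placeholders):

    // Sketch: call the OpenAI-compatible chat endpoint served by llama-cpp-python.
    // Without max_tokens, the server's default of 16 truncates the completion.
    const res = await fetch('http://localhost:8000/v1/chat/completions', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'llama-2-70b', // placeholder model name
        messages: [{ role: 'user', content: 'Write a short paragraph about llamas.' }],
        max_tokens: 2000, // explicitly raise the completion limit
      }),
    });
    const data = await res.json();
    console.log(data.choices[0].message.content);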

Add a line in api\app\clients\OpenAIClient.js after line 64: this.modelOptions.max_tokens = 2000;

    if (!this.modelOptions) {
      this.modelOptions = {
        ...modelOptions,
        model: modelOptions.model || 'gpt-3.5-turbo',
        temperature:
          typeof modelOptions.temperature === 'undefined' ? 0.8 : modelOptions.temperature,
        top_p: typeof modelOptions.top_p === 'undefined' ? 1 : modelOptions.top_p,
        presence_penalty:
          typeof modelOptions.presence_penalty === 'undefined' ? 1 : modelOptions.presence_penalty,
        stop: modelOptions.stop,
      };
    }
    this.modelOptions.max_tokens = 2000; // new line
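
If you'd rather not hardcode the value, the same line can read it from the environment instead. Note that OPENAI_MAX_TOKENS below is a hypothetical variable name used for illustration, not an existing LibreChat setting:

    // Variation: take the limit from an env var, falling back to 2000.
    // OPENAI_MAX_TOKENS is a hypothetical name, not a built-in option.
    this.modelOptions.max_tokens = Number(process.env.OPENAI_MAX_TOKENS) || 2000;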