danny-avila / LibreChat

Enhanced ChatGPT Clone: Features Anthropic, OpenAI, Assistants API, Azure, Groq, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, langchain, DALL-E-3, ChatGPT Plugins, OpenAI Functions, Secure Multi-User System, Presets, completely open-source for self-hosting. Actively in public development.
https://librechat.ai/

[Question]: Very short responses when getting completions from llama-cpp-python #870

Closed · origintopleft closed this 1 year ago

origintopleft commented 1 year ago

What is your question?

I'm using LibreChat with the OpenAI endpoint, but instead of the actual OpenAI API, OPENAI_REVERSE_PROXY points to the local system on port 8000, where llama-cpp-python is serving Llama 2 70B through an OpenAI-compatible API. For the most part this works; the only problem is that the responses we get are very short.

Our previous chat UI was able to display messages of various lengths. Is there a setting I could change in order to allow longer responses?
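
For reference, the reverse-proxy part of that setup looks roughly like this in LibreChat's .env file (the exact URL and path are assumptions based on the description above; llama-cpp-python exposes its OpenAI-compatible routes under /v1):

    # .env: point LibreChat's OpenAI endpoint at the local llama-cpp-python server
    # (host, port, and path are assumed from the description above)
    OPENAI_REVERSE_PROXY=http://localhost:8000/v1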

More Details

- Deployed using Docker Compose from the git repo
- Docker version 24.0.5, from Ubuntu repositories
- llama-cpp-python from Aug 17

danny-avila commented 1 year ago

Interesting, maybe max_tokens needs to be sent with the request. It looks like the default for that project is 16 (which is incredibly low):

https://github.com/abetlen/llama-cpp-python/issues/542
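
You can verify this against the server directly by sending a request with max_tokens set. A minimal sketch in JavaScript, assuming the llama-cpp-python server from the question is on port 8000 (the model name and prompt are placeholders):

    // Sketch: call the OpenAI-compatible chat endpoint served by llama-cpp-python.
    // Without max_tokens, the server's default of 16 truncates the completion.
    const res = await fetch('http://localhost:8000/v1/chat/completions', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'llama-2-70b', // placeholder model name
        messages: [{ role: 'user', content: 'Write a short paragraph about llamas.' }],
        max_tokens: 2000, // explicitly raise the completion limit
      }),
    });
    const data = await res.json();
    console.log(data.choices[0].message.content);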

Add a line in api\app\clients\OpenAIClient.js after line 64: this.modelOptions.max_tokens = 2000;

    if (!this.modelOptions) {
      this.modelOptions = {
        ...modelOptions,
        model: modelOptions.model || 'gpt-3.5-turbo',
        temperature:
          typeof modelOptions.temperature === 'undefined' ? 0.8 : modelOptions.temperature,
        top_p: typeof modelOptions.top_p === 'undefined' ? 1 : modelOptions.top_p,
        presence_penalty:
          typeof modelOptions.presence_penalty === 'undefined' ? 1 : modelOptions.presence_penalty,
        stop: modelOptions.stop,
      };
    }
    this.modelOptions.max_tokens = 2000; // new line
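
If you'd rather not hardcode the value, the same line can read it from the environment instead. Note that OPENAI_MAX_TOKENS below is a hypothetical variable name used for illustration, not an existing LibreChat setting:

    // Variation: take the limit from an env var, falling back to 2000.
    // OPENAI_MAX_TOKENS is a hypothetical name, not a built-in option.
    this.modelOptions.max_tokens = Number(process.env.OPENAI_MAX_TOKENS) || 2000;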