continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
https://docs.continue.dev/
Apache License 2.0

Autocomplete fails with gpt-4o (and possibly other models) behind proxy #1824

Open Heap0017 opened 3 months ago

Heap0017 commented 3 months ago


Relevant environment info

- IDE: VSCode
- Model: gpt-4o via LiteLLM
- config.json:

  {
    "provider": "openai",
    "apiBase": "http://localhost:4000",
    "model": "azure/GPT4o"
  }

Description

gpt-4o (and possibly other models) cannot be used for Autocomplete behind a proxy such as LiteLLM. The requests fail because Continue supplies more stop words than the upstream API accepts (more than 4).

See also https://github.com/continuedev/continue/issues/1371
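
For illustration, a minimal sketch of the client-side clamping that would avoid this failure: truncate the stop list to the provider's limit before building the request. This is not Continue's actual code; the names and the limit of 4 are assumptions based on the description above.

  // Minimal sketch, not Continue's implementation: clamp the stop
  // list to what an OpenAI-compatible endpoint accepts (at most 4
  // stop sequences, per the report above) before sending the request.
  const MAX_STOPS = 4; // assumed provider limit

  function clampStops(stop: string[], maxStops: number = MAX_STOPS): string[] {
    return stop.slice(0, maxStops);
  }

  // Example: a 5-entry stop list is cut down to its first 4 entries.
  const body = {
    model: "azure/GPT4o",
    prompt: "def fib(n):",
    stop: clampStops(["<fim_prefix>", "<fim_suffix>", "<fim_middle>", "\n\n", "```"]),
  };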

To reproduce

Proxy gpt-4o through LiteLLM, then try to use the model for autocompletion with the openai provider.

Log output

No response

sestinj commented 1 month ago

@Heap0017 we now have a "maxStopWords" property that you can set like this:

  {
    "provider": "openai",
    "apiBase": "http://localhost:4000",
    "model": "azure/GPT4o",
    "maxStopWords": 4
  }
alhimik45 commented 1 month ago

> @Heap0017 we now have a "maxStopWords" property that you can set like this:
>
>   {
>     "provider": "openai",
>     "apiBase": "http://localhost:4000",
>     "model": "azure/GPT4o",
>     "maxStopWords": 4
>   }

It does not work, though. I tested with a local model to inspect the requests (v0.8.51, VS Code), with maxStopWords set to 4 in config.json:

  "tabAutocompleteModel": {
    "model": "Qwen/CodeQwen1.5-7B-Chat-GGUF/codeqwen-1_5-7b-chat-q8_0.gguf",
    "title": "LM Studio",
    "apiBase": "http://localhost:1234/v1/",
    "provider": "lmstudio",
    "maxStopWords": 4
  },

The request nevertheless contains 17 stop words:

Received POST request to /v1/completions with body: {
  "model": "Qwen/CodeQwen1.5-7B-Chat-GGUF/codeqwen-1_5-7b-chat-q8_0.gguf",
  "max_tokens": 1024,
  "temperature": 0.01,
  "stop": [
    "<fim_prefix>",
    "<fim_suffix>",
    "<fim_middle>",
    "<file_sep>",
    "<|endoftext|>",
    "</fim_middle>",
    "</code>",
    "\n\n",
    "\r\n\r\n",
    "/src/",
    "#- coding: utf-8",
    "```",
    "\nfunction",
    "\nclass",
    "\nmodule",
    "\nexport",
    "\nimport"
  ],
alhimik45 commented 1 month ago

As I understand it, maxStopWords isn't applied to streamFim requests, which are what tab autocompletion uses.
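
If so, the fix would presumably be to apply the same truncation in the FIM path before the request body is built. A rough sketch under assumed names (only streamFim and maxStopWords come from this thread; everything else is illustrative):

  // Rough sketch, not Continue's actual code: honor maxStopWords
  // when building the FIM (fill-in-the-middle) request body, the
  // same way the regular completion path does.
  interface FimOptions {
    stop?: string[];
    maxStopWords?: number;
  }

  function fimRequestBody(prefix: string, suffix: string, opts: FimOptions) {
    const stop =
      opts.maxStopWords !== undefined
        ? (opts.stop ?? []).slice(0, opts.maxStopWords)
        : opts.stop;
    return { prompt: prefix, suffix, stop };
  }

  // With maxStopWords: 4, the 17-entry stop list from the log above
  // would be reduced to its first 4 entries.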