continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
https://docs.continue.dev/
Apache License 2.0

Provider llama.cpp with deepseek-33b hitting wrong endpoint (/v1/completion instead of /v1/completions) #1544

Open IonCaza opened 5 months ago

IonCaza commented 5 months ago


Relevant environment info

- OS: macOS 14.3 / deepseek hosted via llama-cpp-python.server
- Continue: 0.8.40
- IDE: VSCode 1.90.2

Description

(Screenshot: a 404 response in the server log.) The Continue VSCode extension with the following model config:

{
  "title": "deepseek",
  "provider": "llama.cpp",
  "model": "deepseek-33b",
  "apiKey": "EMPTY",
  "apiBase": "http://hostname:8000/v1"
}

is hitting the wrong endpoint (`/v1/completion` instead of `/v1/completions`). The same server works fine from the Swagger docs, as you can see from the successful request above the 404.

To reproduce

1. Host deepseek-33b with the latest llama-cpp-python.
2. Configure Continue per the config above.
3. Send a request; you'll receive a 404.
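The mismatch reported in the title can be sketched by comparing the two URL paths in play; `hostname:8000` is the placeholder `apiBase` from the config above, and the endpoint names come from the issue title:

```python
# Sketch of the endpoint mismatch; "hostname:8000" is the placeholder
# apiBase from the config above.
api_base = "http://hostname:8000/v1"

# The path Continue's "llama.cpp" provider requests (llama.cpp's native
# server endpoint) -- the server here returns 404 for it.
native_url = api_base + "/completion"

# The OpenAI-style path that llama-cpp-python's server actually exposes.
openai_url = api_base + "/completions"

print(native_url)  # requested by the "llama.cpp" provider
print(openai_url)  # served by llama-cpp-python
```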

Log output

No response

fry69 commented 5 months ago

From the short discussion in Discord it looks like this was solved by changing

"provider": "llama.cpp"

to

"provider": "openai"

I assume llama.cpp has become more OpenAI compatible over time.
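For reference, the reporter's original config with only that one field changed would look like this (hostname and port are placeholders carried over from the report):

```json
{
  "title": "deepseek",
  "provider": "openai",
  "model": "deepseek-33b",
  "apiKey": "EMPTY",
  "apiBase": "http://hostname:8000/v1"
}
```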

fykyx521 commented 5 months ago

My issue is that the sidebar chat works fine, but autocomplete in the editor returns a 404. My config:

```
"tabAutocompleteModel": {
  "title": "deepseek-1b",
  "provider": "openai",
  "model": "deepseek-coder",
  "apiBase": "https://api.deepseek.com/v1",
  "apiKey": "sk-xxxxx",
  "template": "deepseek"
}
```

(Screenshot: 404 error in the editor.)

sestinj commented 5 months ago

@IonCaza fry69's response is correct. The "llama.cpp" provider was intended for a time when their built-in server wasn't OpenAI compatible. From the response you're getting there it seems that they are now fully OpenAI-compatible, but looking at their docs that isn't yet clear: https://github.com/ggerganov/llama.cpp/tree/master/examples/server#api-endpoints

If this becomes official we'll definitely just remove the llama.cpp provider, or make it a subclass of "openai".

sestinj commented 5 months ago

@fykyx521 DeepSeek's API does not support autocomplete, even though the model does. I know this is surprising and I'm trying to get in touch with them to add support. Until then, you can either run deepseek coder locally with Ollama, or you can use Codestral, which is currently the best autocomplete model available, and is free until August

Raboo commented 4 months ago

> @fykyx521 DeepSeek's API does not support autocomplete, even though the model does. I know this is surprising and I'm trying to get in touch with them to add support. Until then, you can either run deepseek coder locally with Ollama, or you can use Codestral, which is currently the best autocomplete model available, and is free until August

They have added a configuration example that works. https://github.com/deepseek-ai/awesome-deepseek-integration/tree/main/docs/continue

{
  "tabAutocompleteModel": {
    "title": "DeepSeek-V2",
    "model": "deepseek-coder",
    "apiKey": "sk-xxx",
    "contextLength": 8192,
    "apiBase": "https://api.deepseek.com",
    "completionOptions": {
      "maxTokens": 4096,
      "temperature": 0,
      "topP": 1,
      "presencePenalty": 0,
      "frequencyPenalty": 0
    },
    "provider": "openai",
    "useLegacyCompletionsEndpoint": false
  },
  "tabAutocompleteOptions": {
    "useCache": true,
    "maxPromptTokens": 2048,
    "template": "Please teach me what I should write in the `hole` tag, but without any further explanation and code backticks, i.e., as if you are directly outputting to a code editor. It can be codes or comments or strings. Don't provide existing & repetitive codes. If the provided prefix and suffix contain incomplete code and statement, your response should be able to be directly concatenated to the provided prefix and suffix. Also note that I may tell you what I'd like to write inside comments. \n{{{prefix}}}<hole></hole>{{{suffix}}}\n\nPlease be aware of the environment the hole is placed, e.g., inside strings or comments or code blocks, and please don't wrap your response in ```. You should always provide non-empty output.\n"
  }
}

I noticed that I had to restart VS Code for this to work. Pretty much whenever I edit `tabAutocompleteModel`, I have to restart VS Code.