continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
https://docs.continue.dev/
Apache License 2.0

Llama.cpp hosted Codestral-22B not getting correct templates. #1418

Open · CambridgeComputing opened this issue 5 months ago

CambridgeComputing commented 5 months ago


Relevant environment info

- OS: Windows 11 Pro
- Continue: v0.9.150 (pre-release)
- IDE: VSCode v1.89.1

Description

I have the following model definition for Codestral-22B running locally on llama.cpp server:

    {
      "title": "Codestral",
      "model": "Codestral-22B",
      "contextLength": 16384,
      "completionOptions": {},
      "apiBase": "http://localhost:8080",
      "provider": "llama.cpp",
      "template": "llama2"
    }

Without the "template": "llama2" line, code editing works but chat does not; in order for regular chat to work, I have to specify the llama2 template. Without it, I get the following error popup when using chat:

Error: You must either implement templateMessages or _streamChat

When I do specify the llama2 template (as in the config above), chat works, but code editing (Ctrl+I on highlighted code to request changes) no longer works and gives this different error:

Error streaming diff: TypeError: templateMessages is not a function
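To make the symptoms concrete, here is a rough TypeScript sketch of how I picture the two failure modes. This is only my mental model, not the actual Continue source; the class and method names are made up for illustration.

    // Hypothetical sketch of the two failure modes I'm seeing.
    type ChatMessage = { role: "user" | "assistant" | "system"; content: string };

    class SketchLLM {
      // Only set when a chat template (e.g. "llama2") is configured or detected.
      templateMessages?: (messages: ChatMessage[]) => string;

      // Chat path: with no template, there is no way to turn messages into a prompt.
      async *streamChat(messages: ChatMessage[]): AsyncGenerator<string> {
        if (!this.templateMessages) {
          // -> "Error: You must either implement templateMessages or _streamChat"
          throw new Error("You must either implement templateMessages or _streamChat");
        }
        yield this.templateMessages(messages);
      }

      // Edit path (Ctrl+I): if the resolved edit template is not actually a
      // function, calling it throws the second error.
      async *streamEdit(messages: ChatMessage[], editTemplate: unknown): AsyncGenerator<string> {
        const templateMessages = editTemplate as (m: ChatMessage[]) => string;
        // -> "TypeError: templateMessages is not a function" when the edit
        //    template resolves to the llama2 chat template instead of a function
        yield templateMessages(messages);
      }
    }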

Looking through the code in autodetect.ts, it appears that the edit template for Codestral is supposed to be osModelsEditPrompt, but the earlier else if for the llama2 template type catches it first and assigns llama2 instead.

I can't make sense of the errors I got, but either moving

  } else if (model.includes("codestral")) {
    editTemplate = osModelsEditPrompt;
  }

...up before } else if (templateType === "llama2") { (line 282), or changing the autodetect logic, would "make" it work, though I'm not sure that fixes the root issue.
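For illustration, here is a small, self-contained sketch of the reordering I mean; the helper name and template values below are placeholders, not the real autodetect.ts code:

    // Hypothetical, condensed version of the edit-template selection, with the
    // codestral check moved ahead of the generic llama2 branch.
    const osModelsEditPrompt = "<os-models edit prompt>";
    const llama2EditTemplate = "<llama2 edit template>";

    function pickEditTemplate(model: string, templateType?: string): string | undefined {
      if (model.toLowerCase().includes("codestral")) {
        // Codestral gets the OS-models edit prompt even when its chat template is llama2.
        return osModelsEditPrompt;
      } else if (templateType === "llama2") {
        return llama2EditTemplate;
      }
      // ...remaining branches unchanged
      return undefined;
    }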

Edit - partial solution

I have found the following partial workaround that has gotten me up and running, but it is a bit of a pain since I'm swapping models often for testing and benchmarking. I added a second model definition with no template, to be used only for editing, and added modelRoles with inlineEdit pointing to it. My new config:

{
  "models": [
    {
      "title": "Codestral",
      "model": "Codestral-22B",
      "contextLength": 16384,
      "completionOptions": {},
      "apiBase": "http://localhost:8080",
      "provider": "llama.cpp",
      "template": "llama2"
    },
    {
      "title": "Codestral - Edit",
      "model": "Codestral-22B",
      "contextLength": 16384,
      "completionOptions": {},
      "apiBase": "http://localhost:8080",
      "provider": "llama.cpp"
    }
  ],
  "disableSessionTitles": true,
  "experimental": {
    "modelRoles": {
      "inlineEdit": "Codestral - Edit"
    }
  }
}

If there is a better way to do this, please let me know. Thanks!

sestinj commented 5 months ago

@CambridgeComputing thanks for sharing this. Looks like our template definitions got a bit twisted because of the different contexts in which codestral is used. It would be a quick fix, but to avoid playing a game of whack-a-mole, and since you have a temporary solution, I'm going to take an extra second and try to do this correctly (it needs a little refactor/cleanup).