
Long edits are getting cut off / finish early #2319

Open Snowman-25 opened 2 weeks ago

Snowman-25 commented 2 weeks ago


Relevant environment info

- OS:       Windows 11 23H2 (Build 22361.4169)
- Continue: v0.8.52
- IDE:      VSCode 1.92.1
- Model:    deepseek-coder-v2:latest
- config.json:

{
  "models": [
    {
      "title": "Ollama",
      "provider": "ollama",
      "model": "AUTODETECT"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    },
    {
      "name": "desc",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive comment for each block of the selected code. It should describe what it does and show possible caveats.",
      "description": "Comment the highlighted code"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder 3b",
    "provider": "ollama",
    "model": "starcoder2"
  },
  "allowAnonymousTelemetry": false,
  "embeddingsProvider": {
    "provider": "transformers.js"
  },
  "contextProviders": [
    {
      "name": "open",
      "params": {
        "onlyPinned": false
      }
    }
  ],
  "experimental": {
    "readResponseTTS": true
  },
  "ui": {
    "showChatScrollbar": true
  },
  "docs": []
}

Description

While trying to convert a big Python 2 file to Python 3, the output always stops after ~103 ± 3 lines, sometimes cutting off in the middle of a variable or function name. The source script is almost 600 lines. The Ollama log doesn't show any errors; it behaves as if the answer is complete after 2-3 minutes:

[GIN] 2024/09/19 - 15:00:08 | 200 |         2m40s |       10.8.0.21 | POST     "/api/generate"
[GIN] 2024/09/19 - 15:03:41 | 200 |         2m18s |       10.8.0.21 | POST     "/api/generate"

I'm not sure if this is a timeout issue, token exhaustion, or something else entirely. The model has a context length of 163840 and an embedding length of 2048.

When using starcoder2 (16384 context length and 3072 embedding length), I get ~150 lines.

To reproduce

  1. Find a big Python 2 script and open it in VS Code
  2. Select All, press Ctrl+I
  3. Enter "Convert this Python2 program to Python3"

Log output

No response

Patrick-Erichsen commented 2 weeks ago

Hi @Snowman-25, thanks for the detailed writeup here. Interesting that the line cutoff is so consistent.

Nothing obvious comes to mind for me here. @sestinj, any thoughts?

Haripritamreddy commented 1 week ago

Same issue: I used the Gemini Flash and Pro models, and both got cut off at a similar point.

sestinj commented 1 week ago

You can try setting completionOptions.maxTokens in your config.json. We recently increased the default value for this, and I believe that change is still in pre-release.
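
For example, something like this at the top level of config.json (4096 is just an illustrative value; size it to your model):

{
  "completionOptions": {
    "maxTokens": 4096
  }
}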

btebbutt commented 1 week ago

> You can try setting completionOptions.maxTokens in your config.json. We recently increased the default value for this, and I believe that change is still in pre-release.

Yep, this fixed it for me. I was previously adding "maxTokens": 4096 on its own, which wasn't working, but adding:

"completionOptions": { "maxTokens": 4096 },

fixes it for me.
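
If you'd rather scope it to a single model instead of setting it globally, I believe the same block can also go inside a model entry (untested sketch; check the docs for your version):

{
  "models": [
    {
      "title": "Ollama",
      "provider": "ollama",
      "model": "AUTODETECT",
      "completionOptions": {
        "maxTokens": 4096
      }
    }
  ]
}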

KrisF-Midnight commented 1 week ago

That fixed it for me too. Note the plural: it's maxTokens, not maxToken.

Snowman-25 commented 1 week ago

Any specific reason why maxTokens isn't set to the context length of the selected model by default? I may be confusing terms here, though; I'm not an AI model expert.