continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains.
https://docs.continue.dev/
Apache License 2.0
13.21k stars · 913 forks

Codebase retrieval mode doesn't properly answer using the context #952

Open logancyang opened 4 months ago

logancyang commented 4 months ago


Relevant environment info

- OS: MacOS 14.3.1
- Continue: v0.8.17
- IDE: VS Code 1.87.1

Description

This is my first time trying Continue with @Codebase. Retrieval appears to succeed, but the answer reads as if the model saw nothing from the retrieved files.

(screenshot: SCR-20240309-mnkx)

I've been trying different questions with @Codebase, but none worked. The one above used GPT-4 and the default transformers.js embeddings.

I really want this to work. (I was planning to make a YouTube video about it, because I think this is currently the best way to integrate local coding LLMs.)

I also tried the OpenAI small embedding model by setting the embeddings provider as below and refreshing the index, but got `Error getting context items from codebase: TypeError: Cannot read properties of undefined (reading '0')`. That could be a separate issue, though.

{
  "embeddingsProvider": {
    "provider": "openai",
    "model": "text-embedding-3-small"
  }
}
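
For what it's worth, a `Cannot read properties of undefined (reading '0')` usually means code indexed into a missing array, e.g. `response.data[0]` when the embeddings API returned an error payload instead of embeddings. A hypothetical sketch of that failure mode and a defensive version (illustrative names only, not Continue's actual code):

```typescript
// Hypothetical shape of an embeddings API response; not Continue's actual types.
interface EmbeddingsResponse {
  data?: { embedding: number[] }[];
}

// Unguarded access: when `data` is undefined (e.g. an auth or model-name
// error came back), `resp.data![0]` throws exactly the TypeError above.
function firstEmbeddingUnsafe(resp: EmbeddingsResponse): number[] {
  return resp.data![0].embedding;
}

// Defensive version: surface a descriptive error instead of a TypeError.
function firstEmbedding(resp: EmbeddingsResponse): number[] {
  if (!resp.data || resp.data.length === 0) {
    throw new Error("Embeddings provider returned no data; check API key and model name");
  }
  return resp.data[0].embedding;
}
```

A guard like this would at least turn the opaque TypeError into an actionable message.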

To reproduce

My config.json

{
  "models": [
    {
      "title": "GPT-4 (Free Trial)",
      "provider": "free-trial",
      "model": "gpt-4"
    },
    {
      "title": "GPT-4 Vision (Free Trial)",
      "provider": "free-trial",
      "model": "gpt-4-vision-preview"
    },
    {
      "title": "Gemini Pro (Free Trial)",
      "provider": "free-trial",
      "model": "gemini-pro"
    },
    {
      "title": "Codellama 70b (Free Trial)",
      "provider": "free-trial",
      "model": "codellama-70b"
    },
    {
      "title": "Phind Codellama",
      "provider": "ollama",
      "model": "phind-codellama",
      "num_ctx": 16384
    }
  ],
  "slashCommands": [
    {
      "name": "edit",
      "description": "Edit selected code"
    },
    {
      "name": "comment",
      "description": "Write comments for the selected code"
    },
    {
      "name": "share",
      "description": "Download and share this session"
    },
    {
      "name": "cmd",
      "description": "Generate a shell command"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "Write a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "contextProviders": [
    {
      "name": "diff",
      "params": {}
    },
    {
      "name": "open",
      "params": {}
    },
    {
      "name": "terminal",
      "params": {}
    },
    {
      "name": "problems",
      "params": {}
    },
    {
      "name": "codebase",
      "params": {
        "nRetrieve": 45,
        "nFinal": 15,
        "useReranking": true
      }
    }
  ],
  "embeddingsProvider": {
    "provider": "transformers.js"
  },
  "allowAnonymousTelemetry": false
}
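
As context for the `codebase` params in the config above: `nRetrieve`, `nFinal`, and `useReranking` describe a standard two-stage retrieval pipeline — fetch a large candidate pool by embedding similarity, optionally rerank it, then keep only a few chunks for the prompt. A minimal sketch of that idea (function and type names are illustrative, not Continue's internals):

```typescript
interface Chunk { path: string; score: number }

// Stage 1: take the top nRetrieve chunks by embedding-similarity score.
// Stage 2: optionally re-sort that pool with a reranking function,
// then keep only nFinal chunks to place in the prompt.
function retrieveForPrompt(
  chunks: Chunk[],
  nRetrieve: number,
  nFinal: number,
  rerank?: (c: Chunk) => number
): Chunk[] {
  const pool = [...chunks].sort((a, b) => b.score - a.score).slice(0, nRetrieve);
  if (rerank) pool.sort((a, b) => rerank(b) - rerank(a));
  return pool.slice(0, nFinal);
}
```

With `nRetrieve: 45` and `nFinal: 15` as above, 45 candidates are fetched but only 15 survive to the prompt.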

Log output

No response

sestinj commented 3 months ago

@logancyang it looks like you're doing nothing wrong! The "properties of undefined" error does seem like a separate issue, and I'll check it out.

But for the main error, I'm wondering if perhaps the aiState file just got pruned out of the context (this seems particularly likely since it's at the top of the list). A couple of things might help debug:

Assuming this is the case there are a few really obvious fixes I'll make on our end:

  1. Set the default `nFinal` to something that fits inside the default GPT-4 context window
  2. Fix the docs: they say the default `nFinal` is 5, but it's actually 10 right now
  3. Return the results in reverse order, because the top result should be the closest match but is currently the first to be truncated
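
The truncation behavior in points 1-3 can be sketched like this: if snippets are appended best-first and the prompt is trimmed from the top when it exceeds the context window, the best match is the first thing lost; reversing the order keeps it closest to the question and safe from truncation. Illustrative only, with a crude word-count stand-in for token counting:

```typescript
// Crude token estimate (illustrative only): one token per whitespace-separated word.
const tokens = (s: string) => s.split(/\s+/).filter(Boolean).length;

// Pack snippets into a token budget, dropping from the FRONT on overflow,
// as a model of context-window truncation. With best-first ordering the
// top retrieval result is the first snippet cut; putting the best match
// last keeps it adjacent to the question and safe from truncation.
function packSnippets(snippets: string[], budget: number): string[] {
  const kept = [...snippets];
  while (kept.length > 0 && kept.reduce((n, s) => n + tokens(s), 0) > budget) {
    kept.shift(); // the front of the list is truncated first
  }
  return kept;
}
```

This would explain why the aiState file, sitting at the top of the retrieved list, never reached the model.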
sealad886 commented 3 weeks ago

I'm seeing a very similar issue on my system, which is newer in almost every respect than the original poster's:

- OS: MacOS Sonoma 14.5
- Continue: v0.9.155
- IDE: VS Code 1.90.0

Error text: `Error getting context items from code: TypeError: Cannot read properties of undefined (reading 'title')`.

I'm using a different embeddings provider:

  "embeddingsProvider": {
    "provider": "ollama",
    "model": "mxbai-embed-large",
    "apiBase": "http://localhost:11434"
  }
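
A `reading 'title'` error similarly suggests some lookup returned undefined before `.title` was accessed — for example, matching a selected model against the config and finding nothing. A hypothetical sketch of that failure and a guard (illustrative names, not Continue's actual code):

```typescript
interface ModelConfig { title: string; provider: string }

// Hypothetical lookup: Array.prototype.find() returns undefined when no
// config matches, and accessing `.title` on that result throws
// "Cannot read properties of undefined (reading 'title')".
function modelTitle(models: ModelConfig[], wanted: string): string {
  const found = models.find((m) => m.title === wanted);
  if (!found) {
    throw new Error(`No model titled "${wanted}" in config.json`);
  }
  return found.title;
}
```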

And here's my config file:

{
  "models": [
    {
      "title": "Granite-Code-I (Ollama)",
      "model": "sealad886/granite-code",
      "apiBase": "http://localhost:11434",
      "provider": "ollama",
      "contextLength": 16384,
      "completionOptions": {
        "temperature": 0.1,
        "topP": 0.93,
        "mirostat": 2,
        "keepAlive": 8,
        "stop": [
          "Question:",
          "<END EDITING HERE>"
        ]
      },
      "systemMessage": "You are a very helpful AI assistant. However, time is also a resource, so you know to keep your answers on-point and succinct unless asked for a 'complete' or 'whole' answer. I will let you know if I need more."
    },
    {
      "title": "Llama.cpp",
      "provider": "llama.cpp",
      "model": "codellama-34b",
      "apiBase": "http://localhost:8080",
      "contextLength": 8192,
      "completionOptions": {
        "temperature": 0.1,
        "topP": 0.95,
        "mirostat": 2
      },
      "systemMessage": "You are a very helpful AI assistant. However, time is also a resource, so you know to keep your answers on-point and succinct unless asked for a 'complete' or 'whole' answer. I will let you know if I need more."
    },
    {
      "title": "Phil Codellama",
      "model": "phind-codellama:34b-python-q4_K_M",
      "apiBase": "http://localhost:11434",
      "provider": "ollama"
    },
    {
      "title": "Deepseek-33b (Ollama)",
      "model": "sealad886/deepseek-33b",
      "apiBase": "http://localhost:11434",
      "provider": "ollama",
      "completionOptions": {
        "temperature": 0.1,
        "topP": 0.93,
        "mirostat": 2
      },
      "systemMessage": "You are a very helpful AI assistant. However, time is also a resource, so you know to keep your answers on-point and succinct unless asked for a 'complete' or 'whole' answer. I will let you know if I need more."
    },
    {
      "title": "Command-R (Ollama)",
      "model": "Command-R",
      "apiBase": "http://localhost:11434",
      "provider": "ollama",
      "contextLength": 32768,
      "completionOptions": {
        "temperature": 0.1,
        "topP": 0.93,
        "mirostat": 2
      },
      "systemMessage": "You are a very helpful AI assistant. However, time is also a resource, so you know to keep your answers on-point and succinct unless asked for a 'complete' or 'whole' answer. I will let you know if I need more."
    },
    {
      "title": "Command-R-Plus (ollama)",
      "model": "command-r-plus",
      "apiBase": "http://localhost:11434",
      "provider": "ollama",
      "contextLength": 32768,
      "completionOptions": {
        "temperature": 0.1,
        "topP": 0.93,
        "mirostat": 2
      },
      "systemMessage": "You are a very helpful AI assistant. However, time is also a resource, so you know to keep your answers on-point and succinct unless asked for a 'complete' or 'whole' answer. I will let you know if I need more."
    },
    {
      "title": "Llama-3 (ollama)",
      "model": "sealad886/llama3",
      "apiBase": "http://localhost:11434",
      "provider": "ollama",
      "contextLength": 32768,
      "completionOptions": {
        "temperature": 0.1,
        "topP": 0.93,
        "mirostat": 2
      },
      "systemMessage": "You are a very helpful AI assistant. However, time is also a resource, so you know to keep your answers on-point and succinct unless asked for a 'complete' or 'whole' answer. I will let you know if I need more."
    }
  ],
  "slashCommands": [
    {
      "name": "edit",
      "description": "Edit selected code"
    },
    {
      "name": "comment",
      "description": "Write comments for the selected code"
    },
    {
      "name": "share",
      "description": "Export this session as markdown"
    },
    {
      "name": "cmd",
      "description": "Generate a shell command"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "Write a comprehensive set of unit tests for the selected code using the `pytest` library. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write Python unit tests for highlighted code"
    },
    {
      "name": "check",
      "description": "Check for mistakes in my code",
      "prompt": "{{{ input }}}\n\nPlease read the highlighted code and check for any mistakes. You should look for the following, and be extremely vigilant:\n- Syntax errors\n- Logic errors\n- Security vulnerabilities\n- Performance issues\n- Anything else that looks wrong\n\nOnce you find an error, please explain it as clearly as possible, but without using extra words. For example, instead of saying 'I think there is a syntax error on line 5', you should say 'Syntax error on line 5'. Give your answer as one bullet point per mistake found. Assume that libraries are imported elsewhere correctly. Assume that non-library function calls are correct."
    },
    {
      "name": "docstring",
      "description": "Write a docstring for selected function(s)",
      "prompt": "{{{ input }}}\n\nWrite a concise docstring for the highlighted function or functions. Do not edit any of the code itself, including existing comments. If a docstring already exists, optimize it to conform to accepted formatting standards and add missing content. Return only the docstring, excluding the function definition line, with correct indentation to be copy-and-pasted directly into the script. Surround the block of text with '```' and also triple single-quotes, both front and back each, to identify it as code-like text. An example is:\n\n```\n    '''\n    This is a docstring example.\n\n    Parameters:...\n    '''\n```"
    },
    {
      "name": "finish",
      "description": "Finish the missing code starting at the cursor position.",
      "prompt": "{{{ input }}}\n\nFinish the rest of this function and any other function that is required at this time. Do not include any import statements or any used function calls that have already been defined here or in other files or libraries. If you are not sure what to do, just finish the function and return."
    }
  ],
  "contextProviders": [
    {
      "name": "code",
      "params": {}
    },
    {
      "name": "docs",
      "params": {}
    },
    {
      "name": "diff",
      "params": {}
    },
    {
      "name": "open",
      "params": {}
    },
    {
      "name": "terminal",
      "params": {}
    },
    {
      "name": "problems",
      "params": {}
    },
    {
      "name": "folder",
      "params": {}
    },
    {
      "name": "codebase",
      "params": {
        "nRetrieve": 50,
        "nFinal": 10,
        "useReranking": true
      }
    },
    {
      "name": "url",
      "params": {}
    },
    {
      "name": "tree"
    },
    {
      "name": "locals",
      "params": {
        "stackDepth": 3
      }
    }
  ],
  "tabAutocompleteModel": {
    "title": "deepseek-coder",
    "provider": "ollama",
    "model": "deepseek-coder:6.7b-base-q4_K_M",
    "contextLength": 8192,
    "systemMessage": "Pay close attention to the author's coding style and to where similar code might have been written before. Make sure you're using the same programming language used in surrounding code context.",
    "completionOptions": {
      "temperature": 0.1
    }
  },
  "tabAutocompleteOptions": {
    "useCopyBuffer": true,
    "useCache": true,
    "multilineCompletions": "never",
    "debounceDelay": 15
  },
  "allowAnonymousTelemetry": true,
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "mxbai-embed-large",
    "apiBase": "http://localhost:11434"
  }
}

At that specific moment, I was trying to use the @Code context provider to pull in a specific function. Other context providers with the same configuration and the same workspace do not produce this error, though I will note that I have seen it at other times (e.g. using @Codebase).

@sestinj I think your initial intuition that something was being truncated due to context length limits fits with the behavior I'm seeing here as well. Happy to help out if this is still in the early stages of investigation.