continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
https://docs.continue.dev/
Apache License 2.0

Indexing fails with ollama:nomic-embed-text #1435

Open · hoblin opened this issue 1 month ago

hoblin commented 1 month ago


Relevant environment info

- OS: macOS 14.5
- Continue: 0.8.24 - 2024-04-12
- IDE: VSCode 1.90.0

Description

Indexing freezes at 25% for a while (the popup displays "Completed indexing chunks"), then fails with the error message:

Error updating the vectordb::nomic-embed-text index: Error: Invalid argument error: Values length 0 is less than the length (768) multiplied by the value size (768) for FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 768)

Config:

{
  "models": [
    {
      "title": "Codestral",
      "provider": "ollama",
      "model": "codestral"
    },
    {
      "title": "Ollama",
      "provider": "ollama",
      "model": "AUTODETECT"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "ollama",
    "model": "codestral"
  },
  "allowAnonymousTelemetry": true,
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}

To reproduce

  1. Install Codestral
  2. Pull the ollama models
  3. Shut down VSCode
  4. Remove all files except the config from the ~/.continue directory for a fresh start
  5. Open VSCode with code . from the project's directory

Log output

Error updating the vectordb::nomic-embed-text index: Error: Invalid argument error: Values length 0 is less than the length (768) multiplied by the value size (768) for FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 768)
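For context, the error comes from the vector store's shape check: a FixedSizeList column of width 768 expects the flat values buffer to hold exactly rows × 768 floats, so an empty embeddings response (0 values) trips the invariant. A minimal sketch of that check (hypothetical function name, not Continue's or LanceDB's actual code):

```python
def check_embedding_batch(flat_values, n_rows, dim=768):
    """Mimic the invariant behind the 'Values length ... is less than ...' error.

    A FixedSizeList column of width `dim` needs n_rows * dim flat float values;
    an empty embedding response supplies 0, which fails the check.
    """
    expected = n_rows * dim
    if len(flat_values) < expected:
        raise ValueError(
            f"Values length {len(flat_values)} is less than expected "
            f"{expected} for FixedSizeList of width {dim}"
        )

# Even a single chunk with an empty embedding fails the check:
#   check_embedding_batch([], 1)          -> ValueError
# while a proper 768-float vector passes:
#   check_embedding_batch([0.0] * 768, 1)
```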
hoblin commented 1 month ago

I've got a similar error with mxbai-embed-large:

Error updating the vectordb::mxbai-embed-large index: Error: Invalid argument error: Values length 0 is less than the length (1024) multiplied by the value size (1024) for FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 1024)
sestinj commented 1 month ago

@hoblin thanks for sharing this. I've seen it from others too, it looks like Ollama embeddings in general are failing. I'll have an update soon

Cheizr commented 1 month ago

Experiencing this today as well; it was working fine an hour ago.


rohhro commented 3 weeks ago

It's not just Ollama with nomic-embed-text that fails; OpenAI embeddings fail as well.

boshk0 commented 3 weeks ago

I can confirm it fails with nomic-embed-text too.

johnbwang commented 3 weeks ago

I think the issue is that binary blobs are getting chunked, with some chunks ending up zero-length. Sending such a chunk to Ollama then comes back with an empty response.

I put up a PR that should fix the issue: #1493
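The fix described above, dropping zero-length chunks before they reach the embeddings endpoint, could look roughly like this (a sketch with a hypothetical embed_fn, not the actual PR code):

```python
def embed_chunks(chunks, embed_fn, dim=768):
    """Embed text chunks, skipping zero-length ones so the vector column
    never receives an empty embedding.

    `embed_fn` stands in for a call to an embeddings endpoint (e.g. Ollama's)
    and is assumed to return a list of `dim` floats for non-empty input.
    """
    results = []
    for chunk in chunks:
        if not chunk.strip():   # zero-length/whitespace chunk (e.g. from a binary blob)
            continue            # skipping it avoids an empty embedding response
        vector = embed_fn(chunk)
        if len(vector) != dim:  # defensive: drop malformed responses too
            continue
        results.append((chunk, vector))
    return results
```

With a stub embedder, empty and whitespace-only chunks are filtered out while real text is embedded normally.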

sestinj commented 3 weeks ago

The PR looks to have solved it, but I'll wait for confirmation. This is going to be released in the next pre-release version (0.9.160)