continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
https://docs.continue.dev/
Apache License 2.0

Indexing fails with ollama:nomic-embed-text #1435

Open · hoblin opened this issue 1 month ago

hoblin commented 1 month ago


Relevant environment info

- OS: macOS 14.5
- Continue: 0.8.24 - 2024-04-12
- IDE: VSCode 1.90.0

Description

Indexing freezes at 25% for a while (the popup displays "Completed indexing chunks"), then fails with the error message:

Error updating the vectordb::nomic-embed-text index: Error: Invalid argument error: Values length 0 is less than the length (768) multiplied by the value size (768) for FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 768)

Config:

{
  "models": [
    {
      "title": "Codestral",
      "provider": "ollama",
      "model": "codestral"
    },
    {
      "title": "Ollama",
      "provider": "ollama",
      "model": "AUTODETECT"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "ollama",
    "model": "codestral"
  },
  "allowAnonymousTelemetry": true,
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}

To reproduce

  1. Install Codestral
  2. Pull the ollama models
  3. Shut down VSCode
  4. Remove all files except the config from the ~/.continue directory for a fresh start
  5. Open VSCode with code . from the project's directory

Log output

Error updating the vectordb::nomic-embed-text index: Error: Invalid argument error: Values length 0 is less than the length (768) multiplied by the value size (768) for FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 768)
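For context, the error comes from the vector store's shape check: a FixedSizeList column of width 768 expects the flat values buffer to hold exactly rows × 768 floats, so an empty embeddings response (0 values) trips the invariant. A minimal sketch of that check (hypothetical function name, not Continue's or LanceDB's actual code):

```python
def check_embedding_batch(flat_values, n_rows, dim=768):
    """Mimic the invariant behind the 'Values length ... is less than ...' error.

    A FixedSizeList column of width `dim` needs n_rows * dim flat float values;
    an empty embedding response supplies 0, which fails the check.
    """
    expected = n_rows * dim
    if len(flat_values) < expected:
        raise ValueError(
            f"Values length {len(flat_values)} is less than expected "
            f"{expected} for FixedSizeList of width {dim}"
        )

# Even a single chunk with an empty embedding fails the check:
#   check_embedding_batch([], 1)          -> ValueError
# while a proper 768-float vector passes:
#   check_embedding_batch([0.0] * 768, 1)
```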
hoblin commented 1 month ago

I've got a similar error with mxbai-embed-large:

Error updating the vectordb::mxbai-embed-large index: Error: Invalid argument error: Values length 0 is less than the length (1024) multiplied by the value size (1024) for FixedSizeList(Field { name: "item", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 1024)
sestinj commented 1 month ago

@hoblin thanks for sharing this. I've seen it from others too, it looks like Ollama embeddings in general are failing. I'll have an update soon

Cheizr commented 1 month ago

Experiencing this today as well; it was working fine an hour ago.


rohhro commented 3 weeks ago

It's not just Ollama with nomic-embed-text that fails; OpenAI embeddings fail as well.

boshk0 commented 3 weeks ago

I can confirm it fails with nomic-embed-text too.

johnbwang commented 3 weeks ago

I think the issue is that binary blobs are getting chunked, with some chunks ending up zero-length. Sending such a chunk to Ollama then comes back with an empty response.

I put up a PR that should fix the issue: #1493
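The fix described above, dropping zero-length chunks before they reach the embeddings endpoint, could look roughly like this (a sketch with a hypothetical embed_fn, not the actual PR code):

```python
def embed_chunks(chunks, embed_fn, dim=768):
    """Embed text chunks, skipping zero-length ones so the vector column
    never receives an empty embedding.

    `embed_fn` stands in for a call to an embeddings endpoint (e.g. Ollama's)
    and is assumed to return a list of `dim` floats for non-empty input.
    """
    results = []
    for chunk in chunks:
        if not chunk.strip():   # zero-length/whitespace chunk (e.g. from a binary blob)
            continue            # skipping it avoids an empty embedding response
        vector = embed_fn(chunk)
        if len(vector) != dim:  # defensive: drop malformed responses too
            continue
        results.append((chunk, vector))
    return results
```

With a stub embedder, empty and whitespace-only chunks are filtered out while real text is embedded normally.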

sestinj commented 3 weeks ago

The PR looks to have solved it, but I'll wait for confirmation. This is going to be released in the next pre-release version (0.9.160)