continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
https://docs.continue.dev/
Apache License 2.0

Doc Indexing: embeddingsProvider.embed is not a function #2527

Open codekoriko opened 4 days ago

codekoriko commented 4 days ago

Relevant environment info

- OS: WSL Debian
- Continue version: v0.9.215 (pre-release)
- IDE version: VS Code 1.94.2
- Model: 
- config.json:

{
  "embeddingsProvider": [
    {
      "provider": "openai",
      "model": "text-embedding-3-large",
      "apiKey": "[API_KEY]"
    }
  ]
}

### Description

When adding a doc:

{
  "docs": [
    {
      "title": "Pandas",
      "startUrl": "https://pandas.pydata.org/docs",
      "rootUrl": "https://pandas.pydata.org/docs",
      "faviconUrl": "https://pandas.pydata.org/docs/_static/favicon.ico"
    }
  ]
}

In the Developer Tools console I get `Error chunking article:  TypeError: embeddingsProvider.embed is not a function`

Is it the `openai` `embeddingsProvider` that is not compatible? Will it work with `Voyage AI` or `Ollama`?
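
One thing worth checking, offered as an assumption rather than a confirmed diagnosis: the config above gives `embeddingsProvider` as an array, whereas the Continue documentation shows it as a single object. The sketch below is not Continue's code; `EmbeddingsProviderEntry` and `lookupProviderName` are hypothetical names used only to show why reading a provider name from an array would come back empty no matter which provider is listed.

```typescript
// Hypothetical shape of one embeddings provider entry (mirrors the config above).
interface EmbeddingsProviderEntry {
  provider: string;
  model?: string;
  apiKey?: string;
}

// Hypothetical helper: reads the provider name the way a loader that
// expects a single object might. This is an illustration, not Continue's code.
function lookupProviderName(value: unknown): string | undefined {
  return (value as { provider?: string } | null | undefined)?.provider;
}

const asObject: EmbeddingsProviderEntry = {
  provider: "openai",
  model: "text-embedding-3-large",
};
const asArray: EmbeddingsProviderEntry[] = [asObject];

// The object form exposes the name directly.
const nameFromObject = lookupProviderName(asObject);
// Arrays have no `provider` property, so the name is undefined
// whether the entry says "openai" or anything else.
const nameFromArray = lookupProviderName(asArray);
```

If that reading is right, switching providers would not help, while changing the config value from an array to a single object might.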

### To reproduce

_No response_

### Log output

```Shell
[Extension Host] Crawl completed
console.ts:137 [Extension Host] Creating embeddings for 329 articles
329
console.ts:137 [Extension Host] Error chunking article:  TypeError: embeddingsProvider.embed is not a function
    at _DocsService.indexAndAdd (c:\Users\xxxxxx\.vscode\extensions\continue.continue-0.9.215-win32-x64\out\extension.js:448881:64)
    at indexAndAdd.next (<anonymous>)
    at _DocsService.syncConfigAndSqlite (c:\Users\xxxxxx\.vscode\extensions\continue.continue-0.9.215-win32-x64\out\extension.js:449009:36)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async c:\Users\xxxxxx\.vscode\extensions\continue.continue-0.9.215-win32-x64\out\extension.js:448981:13
log.ts:439   ERR [Extension Host] No embeddings were created for site: https://pandas.pydata.org/docs
 Num chunks: 4009
console.ts:137 [Extension Host] No embeddings were created for site: https://pandas.pydata.org/docs
 Num chunks: 4009
```

codekoriko commented 4 days ago

Apparently it also fails when indexing local files:

[Extension Host] error when indexing:  Error: Failed to generate embeddings for 18 chunks with provider: undefined: TypeError: this.embeddingsProvider.embed is not a function
    at LanceDbIndex.getEmbeddings (c:\Users\xxxxxx\.vscode\extensions\continue.continue-0.9.215-win32-x64\out\extension.js:450142:17)
    at LanceDbIndex.computeRows (c:\Users\xxxxxx\.vscode\extensions\continue.continue-0.9.215-win32-x64\out\extension.js:450090:39)
    at LanceDbIndex.update (c:\Users\xxxxxx\.vscode\extensions\continue.continue-0.9.215-win32-x64\out\extension.js:450246:24)
    at CodebaseIndexer.indexFiles (c:\Users\xxxxxx\.vscode\extensions\continue.continue-0.9.215-win32-x64\out\extension.js:518245:39)
    at CodebaseIndexer.refresh (c:\Users\xxxxxx\.vscode\extensions\continue.continue-0.9.215-win32-x64\out\extension.js:518117:30)
    at Core.refreshCodebaseIndex (c:\Users\xxxxx\.vscode\extensions\continue.continue-0.9.215-win32-x64\out\extension.js:518973:26)

Some files also don't chunk properly:

[Extension Host] LanceDBIndex, skipping \home\xxxxxxxxx\dev\js\xxxxxxxxx\README.md: Error: did not chunk properly

checorone commented 2 days ago

Hi

I tracked this issue down to this line: https://github.com/continuedev/continue/blob/main/core/config/load.ts#L409

The map lookup returns undefined for every provider. My guess is that it expects a provider name, but somehow the whole structure is passed instead. If I hardcode the provider name there (like 'openai'), it starts working fine. I'm not sure how to fix it properly, as I have no experience with TS at all.

So maybe someone with more knowledge could take a look at this?
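
For anyone picking this up, here is a minimal sketch of the kind of guard that could turn the silent `undefined` lookup into a clear error. It is not the code in `load.ts`. The names `Embedder`, `ProviderOptions`, `FakeOpenAIEmbedder`, `registry`, and `resolveEmbedder` are all hypothetical, and unwrapping a one-element array is an assumed design choice, not confirmed Continue behavior.

```typescript
// Minimal interface: anything with an `embed` method (name taken from the error message).
interface Embedder {
  embed(chunks: string[]): Promise<number[][]>;
}

// Hypothetical options shape, mirroring the fields in the reported config.
interface ProviderOptions {
  provider: string;
  model?: string;
  apiKey?: string;
}

// Hypothetical stand-in provider: returns zero vectors instead of calling any API.
class FakeOpenAIEmbedder implements Embedder {
  constructor(readonly options: ProviderOptions) {}
  async embed(chunks: string[]): Promise<number[][]> {
    return chunks.map(() => [0, 0, 0]);
  }
}

// Hypothetical registry keyed by provider name.
const registry: Record<string, new (o: ProviderOptions) => Embedder> = {
  openai: FakeOpenAIEmbedder,
};

function resolveEmbedder(
  raw: ProviderOptions | ProviderOptions[] | undefined,
): Embedder {
  // Assumed choice: accept an array by unwrapping its first entry.
  const options = Array.isArray(raw) ? raw[0] : raw;
  if (!options || typeof options.provider !== "string") {
    throw new Error(
      "embeddingsProvider must be an object with a string `provider` field",
    );
  }
  // Look the class up by name and fail loudly if it is missing.
  const Ctor = Object.prototype.hasOwnProperty.call(registry, options.provider)
    ? registry[options.provider]
    : undefined;
  if (!Ctor) {
    throw new Error(`Unknown embeddings provider: ${options.provider}`);
  }
  return new Ctor(options);
}
```

With a guard like this, both the object and the array form of the config would yield something that has an `embed` method, and an unrecognized provider name would fail at load time instead of later during indexing.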