continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
https://docs.continue.dev/
Apache License 2.0

Feature: Different embedding models for VectorDB Creation and queries. #2198

Open adarshdotexe opened 3 weeks ago

adarshdotexe commented 3 weeks ago


Problem

Currently, the NeMo Retriever architecture proposes generating different embeddings depending on whether the input is a passage (during vectorDB creation) or a query (during RAG against a preexisting vectorDB). This is not currently supported by the BaseEmbeddingProvider.

Reference: https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html

This is standard practice for many embedding models; LangChain, for example, implements two functions, one for documents (passages) and one for queries.

Reference: https://python.langchain.com/v0.1/docs/modules/data_connection/text_embedding/
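
As a rough illustration of the pattern (the interface and method names below are hypothetical, not Continue's or LangChain's actual API), an asymmetric embedding model is typically exposed through two entry points:

// Sketch only: shows the document/query split that asymmetric embedding
// models expect. Names are illustrative.
interface AsymmetricEmbedder {
  // used while building the vector DB (input_type: 'passage')
  embedDocuments(texts: string[]): Promise<number[][]>;
  // used at retrieval time for the user's question (input_type: 'query')
  embedQuery(text: string): Promise<number[]>;
}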

Workaround (WAR) to use NVIDIA NIM for embedding generation

The following can be added to ~/.continue/config.ts

export function modifyConfig(config: Config): Config {
  config.embeddingsProvider = {
    id: 'nvidia-embeddings-provider',
    providerName: 'openai',
    maxChunkSize: 2048,
    embed: async (chunks: string[]) => {
      if (chunks.length === 0) {
        console.log('No chunks to embed');
        return []; // or throw an error, depending on your requirements
      }

      const apiKey = '<YOUR API KEY>';
      const url = 'https://integrate.api.nvidia.com/v1/embeddings/';

      // NOTE: input_type is hard-coded to 'query' here; the NIM embeddings API
      // also accepts 'passage', which is the appropriate value when indexing documents.
      const data = JSON.stringify({
        input: chunks,
        input_type: 'query',
        model: 'nvidia/nv-embedqa-mistral-7b-v2',
      });
      const options = {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${apiKey}`,
          'Content-Type': 'application/json',
          'api-key': apiKey,
        },
        body: data,
      };

      try {
        const response = await fetch(url, options);
        const responseData = await response.json();
        const embeddings = responseData.data.map((item) => item.embedding);
        console.log(embeddings);
        return embeddings;
      } catch (error) {
        console.error('Error:', error);
        throw error;
      }
    },
  };

  return config;
}
    input_type: 'query',

The line above forces every embedding, including those generated while indexing the codebase, to be of type query, which leads to suboptimal retrieval performance.
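
For reference, the request body could be parameterized instead of hard-coded. A minimal sketch, reusing the fields from the workaround above (the buildEmbeddingRequest helper itself is hypothetical):

// Sketch: builds the NIM request body for either indexing or querying.
// Field names mirror the workaround above; the helper is illustrative only.
function buildEmbeddingRequest(chunks: string[], inputType: 'passage' | 'query'): string {
  return JSON.stringify({
    input: chunks,
    input_type: inputType, // 'passage' while indexing, 'query' at retrieval time
    model: 'nvidia/nv-embedqa-mistral-7b-v2',
  });
}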

Solution

Whenever the embeddings provider's embed function is called, Continue needs to distinguish between passage calls (made while indexing) and query calls (made at retrieval time) so that the correct input_type can be passed to the model.
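
A minimal sketch of one possible interface change, assuming Continue passes the call context into embed (the inputType parameter and EmbedInputType type below are hypothetical, not part of the current API):

// Hypothetical extension of the embed signature; the current provider
// interface only receives the chunks to embed.
type EmbedInputType = 'passage' | 'query';

interface EmbeddingsProviderProposal {
  maxChunkSize: number;
  embed(chunks: string[], inputType: EmbedInputType): Promise<number[][]>;
}

// The config.ts workaround above could then forward inputType into the
// request body instead of hard-coding 'query'.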

mattf commented 3 weeks ago

@adarshdotexe another workaround is to use the baai/bge-m3 model, which is symmetric and does not require the input_type=passage/query distinction:

  "embeddingsProvider": {
    "provider": "openai",
    "model": "baai/bge-m3",
    "apiBase": "https://integrate.api.nvidia.com/v1",
    "apiKey": "nvapi-..."
  }