Cinnamon / kotaemon

An open-source RAG-based tool for chatting with your documents.
https://cinnamon.github.io/kotaemon/
Apache License 2.0
17.49k stars, 1.35k forks

[WIP] feat: add Voyage embeddings #408

Status: Open. fzliu opened this pull request 1 month ago.

fzliu commented 1 month ago

Description

Adds embeddings from Voyage AI.

taprosoft commented 1 month ago

@fzliu thanks for your contribution. Please take a look at the CI reports and fix them. The rest looks okay to me.

ngduyanhece commented 4 weeks ago

After analyzing the pull request, here are my findings:

Overall Feedback:

The addition of Voyage AI embeddings is a significant enhancement: it adds Voyage AI as a new embedding provider. The code is generally well structured, but error handling and code clarity could be improved.

Score: 85/100

Labels: Enhancement, Tests

Code Suggestions:

  1. File: libs/kotaemon/kotaemon/embeddings/voyageai.py

    • Suggestion Content: Ensure that the api_key is validated before using it to create the client.
    • Relevant Line: + self._client = _import_voyageai().Client(api_key=self.api_key)
    • Existing Code:
      self._client = _import_voyageai().Client(api_key=self.api_key)
    • Improved Code:
      if not self.api_key:
          raise ValueError("API key must be provided for VoyageAIEmbeddings.")
      self._client = _import_voyageai().Client(api_key=self.api_key)
  2. File: libs/kotaemon/kotaemon/embeddings/voyageai.py

    • Suggestion Content: Handle potential exceptions when calling the embed method to prevent crashes due to API issues.
    • Relevant Line: + embeddings = self._client.embed(texts, model=self.model_name).embeddings
    • Existing Code:
      embeddings = self._client.embed(texts, model=self.model_name).embeddings
    • Improved Code:
      try:
          embeddings = self._client.embed(texts, model=self.model_name).embeddings
      except Exception as e:
          # Chain with `from e` so the original API error stays in the traceback.
          raise RuntimeError(f"Failed to retrieve embeddings: {e}") from e
  3. File: libs/kotaemon/tests/test_embedding_models.py

    • Suggestion Content: Ensure that the test for VoyageAIEmbeddings checks for the correct output structure.
    • Relevant Line: + assert_embedding_result(output)
    • Existing Code:
      assert_embedding_result(output)
    • Improved Code:
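      # Check non-emptiness separately: all(...) over an empty list is vacuously True.
      assert isinstance(output, list) and len(output) > 0
      assert all(isinstance(doc, DocumentWithEmbedding) for doc in output)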
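The first two suggestions can be sketched together as follows. This is a minimal, self-contained illustration, not the code from the PR: `FakeVoyageClient`, `SketchVoyageEmbeddings`, and the `client_factory` parameter are hypothetical names introduced so the example runs without the `voyageai` package or a real key. The fake client only mimics the `.embed(texts, model=...).embeddings` shape that the reviewed lines rely on, and the default model name `voyage-3` is an assumption for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class _EmbedResult:
    # Mimics the shape the reviewed code relies on: a result with an `.embeddings` list.
    embeddings: list[list[float]]


class FakeVoyageClient:
    """Hypothetical stand-in for the Voyage AI client; returns small dummy vectors."""

    def __init__(self, api_key: str, fail: bool = False):
        self.api_key = api_key
        self.fail = fail

    def embed(self, texts: list[str], model: str) -> _EmbedResult:
        if self.fail:
            # Simulates a network/API failure on the remote side.
            raise ConnectionError("simulated API outage")
        # Dummy 3-d vector per text: [len(text), 0.0, 1.0].
        return _EmbedResult(embeddings=[[float(len(t)), 0.0, 1.0] for t in texts])


class SketchVoyageEmbeddings:
    """Hypothetical sketch applying review suggestions 1 and 2."""

    def __init__(
        self,
        api_key: str | None,
        model_name: str = "voyage-3",  # assumed default, for illustration only
        client_factory=FakeVoyageClient,  # hypothetical hook so tests need no network
    ):
        # Suggestion 1: fail fast on a missing key instead of on the first API call.
        if not api_key:
            raise ValueError("API key must be provided for VoyageAIEmbeddings.")
        self.api_key = api_key
        self.model_name = model_name
        self._client = client_factory(api_key=api_key)

    def embed(self, texts: list[str]) -> list[list[float]]:
        # Suggestion 2: wrap client errors; `from e` preserves the original cause.
        try:
            return self._client.embed(texts, model=self.model_name).embeddings
        except Exception as e:
            raise RuntimeError(f"Failed to retrieve embeddings: {e}") from e
```

With this sketch, `SketchVoyageEmbeddings(api_key="test-key").embed(["hello", "kotaemon"])` returns `[[5.0, 0.0, 1.0], [8.0, 0.0, 1.0]]`, an empty key raises `ValueError` at construction time, and a failing client surfaces as a `RuntimeError` whose `__cause__` is the original `ConnectionError`. Catching the broad `Exception` mirrors the review's suggestion; narrowing it to the client library's own error types would be a reasonable refinement.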

The addition of Voyage AI embeddings is a great enhancement! Here are some suggestions:

  1. Validate the api_key before using it to create the client in VoyageAIEmbeddings.
  2. Handle exceptions when calling the embed method to prevent crashes due to API issues.
  3. Ensure that the test for VoyageAIEmbeddings checks for the correct output structure.
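For the third point, a structural check might look like the sketch below. It is hypothetical: `DocumentWithEmbedding` is redefined here as a minimal dataclass stand-in for kotaemon's class, `fake_invoke` stands in for calling `VoyageAIEmbeddings` without network access, and the count and dimension checks go beyond what the review asked for. Checking the length also matters because `all(...)` over an empty list is `True`, so a structure-only check would pass vacuously on empty output.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DocumentWithEmbedding:
    # Hypothetical minimal stand-in for kotaemon's DocumentWithEmbedding.
    content: str
    embedding: list[float] = field(default_factory=list)


def fake_invoke(texts: list[str]) -> list[DocumentWithEmbedding]:
    # Hypothetical stand-in for VoyageAIEmbeddings(...)(texts): one document per input.
    return [DocumentWithEmbedding(content=t, embedding=[0.1, 0.2, 0.3]) for t in texts]


def check_embedding_output(output, expected_len: int, dim: int | None = None) -> None:
    """Structural checks from point 3, plus count/dimension checks (assumptions)."""
    assert isinstance(output, list), "output should be a list"
    assert len(output) == expected_len, "expected one document per input text"
    assert all(isinstance(doc, DocumentWithEmbedding) for doc in output)
    for doc in output:
        assert doc.embedding, "each document should carry a non-empty embedding"
        if dim is not None:
            assert len(doc.embedding) == dim, "unexpected embedding dimension"
```

In a real test, `fake_invoke` would be replaced by the embeddings class with its client mocked, and `check_embedding_output(output, expected_len=2, dim=...)` called on the result.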