chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0
13.36k stars 1.14k forks

Add LlamaCppEmbeddingFunction class for document embedding #2410

Open AveryUALibrary opened 4 days ago

AveryUALibrary commented 4 days ago

Description of changes

Implementation of Issue 2409

Summarize the changes made by this PR.

Test plan

How are these changes tested?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?

github-actions[bot] commented 4 days ago

Reviewer Checklist

Please use this checklist to ensure your code review is thorough before approving.

Testing, Bugs, Errors, Logs, Documentation

tazarov commented 4 days ago

@AveryUALibrary, thanks for this. Here are a few more things we need to get done (happy to add any/all of those to your fork):

AveryUALibrary commented 4 days ago

@AveryUALibrary, thanks for this. Here are a few more things we need to get done (happy to add any/all of those to your fork):

  • Docs
  • Tests
  • Utility to fetch models - llama.cpp uses the GGUF format, if I'm not mistaken; for the most part, these files are also hosted on HF, so we could add instructions or, better yet, fetch the models with direct HTTP calls or with HF tooling (if applicable).
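
To illustrate the "direct HTTP calls" option from the list above, here is a minimal sketch that uses only the Python standard library. It assumes the model file is public and served under the usual Hugging Face `resolve` URL layout; `gguf_url` and `fetch_gguf` are hypothetical helper names, not part of this PR, Chroma, or llama.cpp.

```python
import shutil
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlopen

HF_BASE = "https://huggingface.co"


def gguf_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file in a Hugging Face model repo.

    Assumption: public repos expose files at /<repo_id>/resolve/<revision>/<filename>.
    """
    return f"{HF_BASE}/{quote(repo_id)}/resolve/{quote(revision, safe='')}/{quote(filename)}"


def fetch_gguf(repo_id: str, filename: str, dest_dir: str = "models",
               revision: str = "main") -> Path:
    """Download a GGUF file once and reuse the local copy on later calls."""
    dest = Path(dest_dir) / Path(filename).name
    if dest.exists():
        # Already downloaded: skip the network entirely.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # is never mistaken for a complete model file.
    tmp = dest.with_name(dest.name + ".part")
    with urlopen(gguf_url(repo_id, filename, revision)) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)
    return dest
```

The returned path could then be passed to llama.cpp as the model path. Gated or private repos would need an auth header, which this sketch does not handle; HF tooling covers that case.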

The current implementation now includes updated docs and automatically fetches Hugging Face models using Llama.from_pretrained. I have no clue how to implement tests for Chroma, though.
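
One possible approach to the testing question is to make the model loader injectable, so a test can substitute a small stand-in for the real llama.cpp model and run without downloading anything. The sketch below is not the code in this PR. It assumes the llama-cpp-python `Llama.from_pretrained` and `create_embedding` calls, assumes a model that returns one pooled vector per input, and omits the chromadb `EmbeddingFunction` base class that a real implementation would presumably subclass. The `loader` parameter, `FakeLlama`, and the repo and file names are hypothetical.

```python
from typing import Any, Callable, List, Optional


class LlamaCppEmbeddingFunction:
    """Sketch of an embedding function backed by a llama.cpp model."""

    def __init__(self, repo_id: str, filename: str,
                 loader: Optional[Callable[[], Any]] = None):
        self._repo_id = repo_id
        self._filename = filename
        # `loader` is a hypothetical hook: tests pass a fake, production uses the Hub.
        self._loader = loader or self._load_from_hub
        self._model = None

    def _load_from_hub(self) -> Any:
        # Imported lazily so this module can be imported without llama-cpp-python.
        from llama_cpp import Llama
        return Llama.from_pretrained(
            repo_id=self._repo_id,
            filename=self._filename,
            embedding=True,  # load the model in embedding mode
            verbose=False,
        )

    def __call__(self, input: List[str]) -> List[List[float]]:
        if self._model is None:
            # Load on first use so constructing the object stays cheap.
            self._model = self._loader()
        response = self._model.create_embedding(list(input))
        # Keep the vectors in the same order as the input documents.
        items = sorted(response["data"], key=lambda item: item["index"])
        return [item["embedding"] for item in items]


class FakeLlama:
    """Test stand-in that mimics the response shape of create_embedding."""

    def create_embedding(self, input: List[str]) -> dict:
        return {
            "data": [
                {"index": i, "embedding": [float(len(text)), 1.0]}
                for i, text in enumerate(input)
            ]
        }


def test_returns_one_vector_per_document() -> None:
    ef = LlamaCppEmbeddingFunction("hypothetical-org/model-GGUF", "model.gguf",
                                   loader=FakeLlama)
    vectors = ef(["hello", "hi"])
    assert len(vectors) == 2
    assert vectors[0] == [5.0, 1.0]
```

A test like this only checks the wrapper logic (lazy loading, ordering, response parsing). An integration test against a small real GGUF model would still be needed to confirm the embeddings themselves.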