Open hanna-paasivirta opened 2 days ago
Thank you @hanna-paasivirta ! This is exciting. I'll take a close look and see if I have any suggestions, but I'm keen for this to be a rough first cut which we'll iterate on over the coming weeks.
We'll need to break up #111 into some smaller issues to represent next steps (e.g. port the docs importer into this embedding service, investigate and set up PG vector, etc.)
Short Description
Create an embedding service to access or create an embedding database and search it with an input text.
Fixes #111
Implementation Details
This module is primarily built for the Vocabulary Mapping project #109, but it can be used in any future projects that require embeddings. It leverages the LangChain library to allow for easy access to different types of embeddings and embedding storage services.
The main functions are:
create_vectorstore() -> Create a new embedding database collection with texts
get_existing_vectorstore() -> Access an existing embedding database collection
get_similar_texts() -> Use an input text to query the embedding collection for similar texts

Currently the module only allows Zilliz as a vector store and OpenAI Embeddings as a model (both require credentials). New options can be easily enabled.
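To make the flow of the three functions concrete for reviewers, here is a minimal, credential-free sketch. It is not the code in this PR: the in-memory dictionary stands in for Zilliz, the word-count `embed()` helper stands in for OpenAI Embeddings, and the parameter names (`collection_name`, `texts`, `store`, `query`, `k`) are assumptions, since the description above does not give the signatures.

```python
import math
from collections import Counter

# Hypothetical in-memory stand-in for a hosted vector store such as Zilliz.
_COLLECTIONS = {}


def embed(text):
    """Toy embedding (word counts); a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def create_vectorstore(collection_name, texts):
    """Create a new collection by embedding and storing each text."""
    _COLLECTIONS[collection_name] = [(t, embed(t)) for t in texts]
    return _COLLECTIONS[collection_name]


def get_existing_vectorstore(collection_name):
    """Return a previously created collection (raises KeyError if missing)."""
    return _COLLECTIONS[collection_name]


def get_similar_texts(store, query, k=2):
    """Return the k stored texts most similar to the query text."""
    query_vector = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Example: build a small collection, reopen it, and search it.
create_vectorstore(
    "vocab",
    ["blood pressure reading", "body temperature", "patient date of birth"],
)
store = get_existing_vectorstore("vocab")
matches = get_similar_texts(store, "systolic blood pressure", k=1)
print(matches)
```

The split between creating and reopening a collection mirrors the function list above: embedding a corpus is the expensive step, so a caller would normally do it once and then only query the existing collection.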
AI Usage
Please disclose how you've used AI in this work (it's cool, we just want to know!):
You can read more details in our Responsible AI Policy