User Story
As a LeapfrogAI user
I want to be able to switch the embedding model used for my data at runtime
So that I can experiment with different embedding models and find the one that works best for my specific needs.
Acceptance Criteria
Given that there are multiple embedding models available
When a user requests to switch the embedding model
Then the system should update the embedding model used for all subsequent operations.
Given that the embedding models have different vector sizes
When a user switches embedding models
Then the system should automatically adjust the vector storage format and indexing method to accommodate the new vector size.
Given that a user has previously used an embedding model to generate embeddings
When the user switches to a different embedding model
Then the system should re-embed existing data using the new model.
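The switch-and-re-embed behavior above could be sketched as follows. This is a minimal in-memory illustration, not LeapfrogAI's actual API: the class, method names, and the stand-in `embed` function are all hypothetical, and a real implementation would call the selected model and persist vectors to the database.

```python
from dataclasses import dataclass, field

@dataclass
class EmbeddingStore:
    """Hypothetical store that tracks which embedding model is active."""
    model_name: str
    dimensions: dict                      # model name -> vector size
    texts: list = field(default_factory=list)
    vectors: list = field(default_factory=list)

    def embed(self, text: str) -> list:
        # Stand-in embedder: returns a constant vector of the active
        # model's dimensionality. A real model returns learned values.
        dim = self.dimensions[self.model_name]
        return [float(len(text) % 7)] * dim

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(self.embed(text))

    def switch_model(self, new_model: str) -> None:
        # Switching the active model re-embeds all existing texts so
        # stored vectors always match the current model's vector size.
        self.model_name = new_model
        self.vectors = [self.embed(t) for t in self.texts]
```

For example, after `store = EmbeddingStore("model-a", {"model-a": 384, "model-b": 768})` and `store.add("hello")`, calling `store.switch_model("model-b")` leaves every stored vector with 768 dimensions.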
Describe alternatives you've considered
Hardcoding the embedding model: This would limit flexibility and require changes to the codebase every time a new model is added.
Using a separate table for each embedding model: This would be inefficient in terms of storage and retrieval.
Using a single table with a fixed-size vector column: This would require padding smaller vectors or truncating larger vectors, resulting in loss of information.
Additional context
We need to find a way to store vectors of different sizes efficiently in the database. This may involve using a combination of quantization, variable-length vectors, and multiple tables. We also need to consider the impact of switching embedding models on existing data and how to handle re-embedding.
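One possible direction for the storage question is a table per model, each with a fixed-size pgvector column sized to that model's output, so no padding or truncation is ever needed. The sketch below only generates the DDL strings; the table and column names are illustrative assumptions, not LeapfrogAI's actual schema (though `vector(n)` is pgvector's real column type).

```python
def table_name(model: str) -> str:
    # Derive a SQL-safe table name from a model identifier,
    # e.g. "all-MiniLM-L6-v2" -> "embeddings_all_minilm_l6_v2".
    return "embeddings_" + model.lower().replace("-", "_").replace("/", "_")

def create_table_sql(model: str, dim: int) -> str:
    # One table per (model, dimension): each gets a pgvector column
    # of exactly the model's output size.
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name(model)} ("
        f"id BIGSERIAL PRIMARY KEY, "
        f"content TEXT, "
        f"embedding vector({dim}))"
    )
```

This trades some storage overhead (which the "separate table per model" alternative above flags) for exact vector sizing and per-model indexes; whether that trade-off is acceptable is part of what this issue needs to decide.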