Notes

This task does not have a hard dependency on task 224 listed below. To save time, we could use any acquired data at hand and then switch the data source for embedding generation once the solution from task 224 is ready.

Also check Preet's previous task:

Domain

app backend

Description

ChromaDB replaces AstraDB in our stack as the VectorStore where we save embeddings generated from the acquired data. Configure ChromaDB, and configure the chunking of our data documents with appropriate parameters (chunk size, distance, etc.).

Some info about chunking in ChromaDB: https://datapipes.chromadb.dev/processors/chunking/#

Chunking can probably be handled by an embedding-generation function from OpenAI or Gemini. LangChain might offer something similar, so research which approach is the easiest for our use case.
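
As a baseline to compare library options against, fixed-size chunking with overlap can be sketched in plain Python. This is only an illustration, not the chosen approach: the function name `chunk_text` and the default values for `chunk_size` and `overlap` are assumptions and still need to be tuned for our documents.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that sentences cut at a
    chunk boundary still appear with some context in the next chunk.
    The defaults are placeholders, not researched values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window moves each iteration
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # otherwise a trailing chunk made only of overlap would be emitted.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Character-based splitting is the simplest variant. A token-based or sentence-aware splitter from one of the libraries mentioned above may suit the embedding model better, which is part of the research for this task.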

User Story

"As a developer on this team, I want to replace AstraDB with ChromaDB for embedding storage and define chunk sizes and embedding functions, so that I can offer the end user I am serving access to custom context via the RAG technique, eventually improving the reliability and traceability of the answers the user receives."

Acceptance Criteria

[ ] ChromaDB environment configuration has been completed and is proven to be functional.
[ ] Chunking parameters have been researched and implemented to suit the needs of our project.
[ ] Embedding function(s) have been configured.
[ ] Embeddings have been generated (either from mock data or, if task #224 has been completed, from acquired data).
[ ] Embeddings are stored with appropriate metadata in ChromaDB.
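
A minimal sketch of how chunks could be stored with metadata, assuming the `chromadb` Python package with a local persistent client. The helper names (`build_records`, `store_chunks`, `open_collection`), the storage path `./chroma_data`, the collection name `documents`, and the metadata keys `source` and `chunk_index` are all hypothetical. No embeddings are passed explicitly here, so ChromaDB would fall back to the collection's default embedding function; the OpenAI or Gemini function chosen in this task would replace that.

```python
from typing import Any, Dict, List


def build_records(chunks: List[str], source: str) -> Dict[str, List[Any]]:
    """Build ids, documents and metadata for one source document.

    The id scheme and metadata keys are illustrative assumptions.
    """
    ids = [f"{source}-{i}" for i in range(len(chunks))]
    metadatas = [{"source": source, "chunk_index": i} for i in range(len(chunks))]
    return {"ids": ids, "documents": list(chunks), "metadatas": metadatas}


def store_chunks(collection: Any, chunks: List[str], source: str) -> int:
    """Add chunks to a ChromaDB collection and return how many were stored."""
    records = build_records(chunks, source)
    if records["ids"]:  # skip the call entirely for an empty document
        collection.add(
            ids=records["ids"],
            documents=records["documents"],
            metadatas=records["metadatas"],
        )
    return len(records["ids"])


def open_collection(path: str = "./chroma_data", name: str = "documents") -> Any:
    """Open (or create) a collection in a local persistent ChromaDB store."""
    import chromadb  # imported lazily so the helpers above work without it

    client = chromadb.PersistentClient(path=path)
    return client.get_or_create_collection(name=name)
```

With mock data, usage could look like `store_chunks(open_collection(), ["first chunk", "second chunk"], source="mock-doc")`. Keeping the source and chunk index in the metadata is one way to support the traceability goal in the user story.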

Definition of Done

[ ] The feature has been fully implemented.
[ ] The feature has been manually tested and works as expected without critical bugs.
[ ] The feature code is documented with clear explanations of its functionality and usage.
[ ] The feature code has been reviewed and approved by at least one team member.
[ ] The feature branches have been merged into the main branch and closed.
[ ] The feature's utility, function, and usage have been documented in the respective project wiki on GitHub.

amosproj / amos2024ss06-health-ai-framework

Configure new VectorStore for embeddings #226