ASahu16 opened 7 months ago
The steps that can be taken to solve this are:
Step 1) Parse the PDF/DOCX files using PyMuPDF (for text), OCR (for images), or similar Python libraries.
Step 2) Choose an embedding model to convert the extracted text into embeddings.
Step 3) Connect to ChromaDB or FAISS using their APIs and documentation, and store the embeddings.
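The three steps above can be sketched end to end in plain Python. This is a minimal, dependency-free sketch under stated assumptions. Text extraction (step 1) is assumed to have already produced a string. `embed` is a hashed bag-of-words stand-in for a real embedding model: it captures word overlap, not meaning. `InMemoryVectorStore` is a hypothetical brute-force stand-in for a ChromaDB collection or FAISS index. The names `chunk_text`, `embed`, `InMemoryVectorStore` and `ingest` are illustrative and do not belong to any library.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import List, Tuple


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into overlapping word windows (assumes max_words > overlap)."""
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # last window reached the end
            break
    return chunks


def embed(text: str, dim: int = 256) -> List[float]:
    """STAND-IN embedding (assumption): hashed bag of words, L2-normalised.

    md5 is used instead of hash() so vectors are stable across runs.
    Replace with the embedding model chosen in step 2.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for ChromaDB/FAISS: brute-force cosine search."""
    ids: List[str] = field(default_factory=list)
    texts: List[str] = field(default_factory=list)
    vectors: List[List[float]] = field(default_factory=list)

    def add(self, doc_id: str, text: str, vector: List[float]) -> None:
        self.ids.append(doc_id)
        self.texts.append(text)
        self.vectors.append(vector)

    def query(self, vector: List[float], k: int = 3) -> List[Tuple[str, float]]:
        # Vectors are L2-normalised, so the dot product equals cosine similarity.
        scores = [(i, sum(a * b for a, b in zip(vector, v)))
                  for i, v in zip(self.ids, self.vectors)]
        return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


def ingest(store: InMemoryVectorStore, source: str, text: str) -> int:
    """Chunk, embed and store one document's text; returns the chunk count."""
    chunks = chunk_text(text)
    for n, chunk in enumerate(chunks):
        store.add(f"{source}#{n}", chunk, embed(chunk))
    return len(chunks)


if __name__ == "__main__":
    store = InMemoryVectorStore()
    ingest(store, "report.pdf", "vector databases store embeddings for similarity search")
    ingest(store, "notes.docx", "the quarterly budget was approved by the finance team")
    print(store.query(embed("similarity search over embeddings"), k=1))
```

In a real pipeline, `embed` would be replaced by the chosen embedding model, and `InMemoryVectorStore` by a ChromaDB collection (`collection.add(ids=..., documents=..., embeddings=...)`) or a FAISS index such as `faiss.IndexFlatIP`, which also ranks by cosine similarity when the vectors are L2-normalised. Chunking before embedding keeps each piece within the model's input limit and lets search results point at a passage rather than a whole file.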
Assign this to me
Description: Implement functionality to load external data into the vector database. This involves developing scripts or tools that import data from various sources, such as DOCX or PDF files, and store the extracted content in the vector database.
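For the import side described here, one hedged way to structure the loader is a table that maps file extensions to extractor functions, so that new source types can be added without touching the directory walk. The sketch below assumes that layout. `extract_docx_text`, `extract_pdf_text`, `EXTRACTORS` and `load_documents` are hypothetical names. DOCX text is read with the standard library only, because a .docx file is a ZIP archive whose body is stored in `word/document.xml`. PDF text uses PyMuPDF (`fitz`), imported lazily so the sketch still runs where PyMuPDF is not installed. Scanned PDFs without a text layer would still need OCR.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path
from typing import Callable, Dict, List

# WordprocessingML namespace used for <w:p> (paragraph) and <w:t> (text) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(path: Path) -> str:
    """Return the body text of a .docx file, one paragraph per line.

    Headers, footers and images live in other parts of the archive and are ignored.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{W_NS}p"):
        text = "".join(t.text or "" for t in p.iter(f"{W_NS}t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


def extract_pdf_text(path: Path) -> str:
    """Return the text layer of a PDF using PyMuPDF (no OCR)."""
    import fitz  # PyMuPDF; lazy import so DOCX-only use needs no extra dependency

    doc = fitz.open(str(path))
    try:
        return "\n".join(page.get_text() for page in doc)
    finally:
        doc.close()


# Hypothetical registry: extension -> extractor. Add new formats here.
EXTRACTORS: Dict[str, Callable[[Path], str]] = {
    ".docx": extract_docx_text,
    ".pdf": extract_pdf_text,
}


def load_documents(folder: Path) -> List[dict]:
    """Walk a folder recursively and extract text from every supported file."""
    docs = []
    for path in sorted(folder.rglob("*")):
        extractor = EXTRACTORS.get(path.suffix.lower())
        if extractor is None or not path.is_file():
            continue  # unsupported type or a directory
        docs.append({"source": str(path), "text": extractor(path)})
    return docs
```

Each returned record (`source`, `text`) can then go through chunking, embedding and storage; keeping `source` alongside the text makes it possible to trace a search hit back to its original file.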
Tasks: