Load External Data to Vector Database

LucknowAI / Rag-LLM

Retrieval-Augmented Generation (RAG) using Large Language Models (LLMs)


Load External Data to Vector Database #2

Status: Open (opened by ASahu16 7 months ago)

ASahu16 commented 7 months ago

Description: Implement functionality to load external data into the vector database. This involves developing scripts or tools that import data from sources such as DOCX and PDF files and store it in the vector database.

Tasks:

Develop a script/tool to parse data from DOCX/PDF files.
Design a mechanism to transform the parsed data into vector representations.
Implement logic to store the vectorized data in the database.
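A minimal sketch of the first task (parsing) in Python, assuming the third-party packages PyMuPDF (imported as `fitz`) and python-docx (imported as `docx`) are installed for PDF and DOCX input. They are imported lazily, so plain `.txt` files work with the standard library alone. The function names `parse_document` and `chunk_text`, and the `chunk_size`/`overlap` defaults, are hypothetical: splitting text into overlapping chunks before embedding is a common choice, not something this issue specifies.

```python
from pathlib import Path


def parse_document(path: str) -> str:
    """Hypothetical helper: extract plain text from a PDF, DOCX, or TXT file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        import fitz  # PyMuPDF (assumed installed); imported only when needed
        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)
    if suffix == ".docx":
        import docx  # python-docx (assumed installed); imported only when needed
        document = docx.Document(path)
        return "\n".join(para.text for para in document.paragraphs)
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8")
    raise ValueError(f"unsupported file type: {suffix}")


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Hypothetical helper: split text into overlapping character windows.

    The defaults are illustrative assumptions, not values from the issue.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that are only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):  # last window reached the end
            break
    return chunks
```

Each chunk would then go to the embedding step and be stored with an ID, for example the source filename plus the chunk index, so search results can be traced back to their document.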

aarushiksk commented 7 months ago

The steps that can be taken to solve this are:

Step 1) Parse the PDF/DOCX files using PyMuPDF (for text), OCR (for images), or similar Python libraries (e.g. python-docx for DOCX).
Step 2) Choose an embedding model to convert the extracted text into embeddings.
Step 3) Store the embeddings in ChromaDB or FAISS, following their APIs/documentation.

Assign this to me
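Steps 2 and 3 above can be sketched end to end with standard-library stand-ins. `embed` is a hypothetical toy (a hashed bag of words), not a real embedding model, and `InMemoryVectorStore` is a hypothetical brute-force cosine-similarity store standing in for ChromaDB or FAISS. With the real libraries, the store would be a ChromaDB collection (`collection.add(ids=..., documents=..., embeddings=...)` and `collection.query(query_embeddings=..., n_results=...)`) or a FAISS index such as `faiss.IndexFlatL2(dim)` with `index.add(...)` and `index.search(...)`.

```python
import hashlib
import math
import re


def embed(text: str, dim: int = 256) -> list[float]:
    """Hypothetical toy embedding: hashed bag of words, L2-normalized.

    Replace with a real embedding model in practice.
    """
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # Stable hash so the same token always lands in the same bucket.
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for ChromaDB/FAISS: brute-force cosine search."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, doc_id: str, text: str) -> None:
        # Vectorize the chunk and keep it alongside its ID and original text.
        self.ids.append(doc_id)
        self.texts.append(text)
        self.vectors.append(embed(text))

    def query(self, text: str, k: int = 3) -> list[tuple[str, float]]:
        q = embed(text)
        # Non-empty vectors have unit length, so the dot product is the cosine similarity.
        scored = [
            (doc_id, sum(a * b for a, b in zip(q, vec)))
            for doc_id, vec in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add("pdf-1", "vector databases store embeddings")
    store.add("docx-1", "the quarterly sales report")
    print(store.query("how do vector databases store embeddings", k=1))
```

A brute-force scan is fine for a handful of documents; ChromaDB and FAISS exist precisely to make this search persistent and fast at larger scale.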