
Drag and Drop RAG

Overview


This project is a Retrieval-Augmented Generation (RAG) pipeline that lets users upload data (CSV, JSON, PDF, or DOCX files), store it in a Chroma vector store, and interact with it through a chatbot powered by Gemini (gemini-1.5-pro). The chatbot retrieves relevant data from the uploaded files, augments user queries with it, and generates responses using the LLM.
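Before embedding, uploaded documents are typically split into smaller chunks. The exact chunking strategy in app.py is not documented here; a minimal sketch of one common approach, fixed-size character chunks with overlap, looks like this:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character chunks for embedding.

    Overlap keeps sentences that straddle a chunk boundary retrievable
    from at least one chunk. Sizes here are illustrative, not the
    values used by the app.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "word " * 100          # 500-character toy document
pieces = chunk_text(doc)
print(len(pieces), len(pieces[0]))  # → 4 200
```

Each chunk is then embedded and stored as one entry in the vector store.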

Features

  1. Upload CSV, JSON, PDF, or DOCX files – Supports multiple file types and allows users to select columns for vector search.
  2. Store and retrieve vector embeddings using Chroma.
  3. Interactive chatbot using the Gemini API to generate responses based on user queries and stored data.
  4. Customizable retrieval – Choose which columns the LLM should draw on when answering queries.

Running the Application

  1. Clone the repository to your local machine:

    git clone https://github.com/bangoc123/drop-rag.git
    cd drop-rag
  2. Install the required Python packages:

    pip install -r requirements.txt
  3. Run the Streamlit app:

    streamlit run app.py
  4. Access the application at http://localhost:8501 in your browser.

Steps to Use:

  1. Upload Data: Upload a CSV, JSON, PDF, or DOCX file. Select the column to be indexed for vector search.
  2. Save Data: The file is saved in the Chroma vector store with vector embeddings generated by the all-MiniLM-L6-v2 or keepitreal/vietnamese-sbert model.
  3. Set up the LLM: Enter your Gemini API key to configure the chatbot for generating responses. You can get a key from Google AI Studio.
  4. Chat: Start interacting with the chatbot, which retrieves and augments responses using the data from the uploaded file.
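The "augment" step in the chat flow amounts to assembling a prompt that puts the retrieved chunks in front of the user's question before the LLM is called. A sketch of that assembly (`build_prompt` and the prompt wording are illustrative, not the app's actual template):

```python
def build_prompt(question, retrieved_chunks):
    """Assemble an augmented prompt: retrieved context first, question last."""
    context = "\n".join(f"- {c}" for c in retrieved_chunks)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

chunks = ["Orders ship within 3 days.", "Returns accepted for 30 days."]
prompt = build_prompt("How fast is shipping?", chunks)
print(prompt.splitlines()[1])  # → Context:

# The assembled prompt is then sent to Gemini, along these lines:
#   import google.generativeai as genai
#   genai.configure(api_key=GEMINI_API_KEY)
#   model = genai.GenerativeModel("gemini-1.5-pro")
#   answer = model.generate_content(prompt).text
```

Grounding the model in retrieved context this way is what keeps answers tied to the uploaded file rather than the model's general knowledge.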
