Tech-Society-SEC / Chatbot_ML

Chatbot - ML

Welcome to the Chatbot - ML Repository!

This repository contains a machine learning project designed to process PDF documents, extract text, split it into smaller chunks, generate embeddings using Google’s Generative AI, and store them in a FAISS vector store for fast retrieval. The system enables question answering based on the contents of the document.

Features

Progress So Far

Core Functionality Implemented:

Recent Updates & Enhancements:

Code Functionality Explained

The project’s functionality is organized into five components:

1. Extracting Text from PDFs
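
The README does not show the extraction code itself, so the following is only a minimal sketch of how this step might look with PyMuPDF (installed as `pymupdf` in the Installation section). The function names `extract_text` and `join_pages` are hypothetical and not taken from the repository.

```python
def join_pages(pages):
    """Join per-page strings into one document, dropping empty pages."""
    return "\n".join(p.strip() for p in pages if p.strip())


def extract_text(pdf_path):
    """Read every page of a PDF and return its plain text (hypothetical helper)."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    with fitz.open(pdf_path) as doc:
        pages = [page.get_text() for page in doc]
    return join_pages(pages)
```

Calling `extract_text("paper.pdf")` would return the text of every non-empty page, one page after another, ready to be split into chunks.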

2. Splitting Text into Chunks
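
The repository's splitter is not shown here, so the sketch below uses a plain fixed-size sliding window with overlap as a simplified stand-in for a library text splitter. The name `split_text` and the default sizes are assumptions chosen for illustration.

```python
def split_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    chunk boundary still appears whole in one of the two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Smaller chunks make retrieval more precise but give the model less context per hit; the overlap trades some storage for fewer answers lost at chunk boundaries.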

3. Generating Text Embeddings
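
As a rough sketch only: embeddings could be created through the `GoogleGenerativeAIEmbeddings` class from `langchain-google-genai`, which is listed in the Installation section. The helper name `build_embedder` and the model name `models/embedding-001` are assumptions; the repository may use a different model.

```python
import os


def build_embedder(api_key=None, model="models/embedding-001"):
    """Return an embeddings client, failing early if no API key is available."""
    key = api_key or os.getenv("GOOGLE_API_KEY")
    if not key:
        raise ValueError("GOOGLE_API_KEY is not set")

    # Imported lazily so the key check above works without the package installed.
    from langchain_google_genai import GoogleGenerativeAIEmbeddings

    return GoogleGenerativeAIEmbeddings(model=model, google_api_key=key)
```

With a valid key, `build_embedder().embed_documents(chunks)` would return one vector per chunk, and `embed_query(question)` would return the vector for a single question.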

4. Storing and Retrieving Embeddings
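
To illustrate what retrieval from the vector store amounts to, here is a brute-force nearest-neighbour search in pure Python. This is not FAISS and not the repository's code; it is a stand-in that performs the same exact Euclidean-distance lookup a flat FAISS index computes, without any of FAISS's optimizations. The name `nearest_chunks` is hypothetical.

```python
import heapq
import math


def nearest_chunks(query_vec, chunk_vecs, k=3):
    """Return the indices of the k vectors closest to query_vec."""
    distances = [
        (math.dist(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)
    ]
    # nsmallest sorts by distance first, then by index on ties.
    return [idx for _, idx in heapq.nsmallest(k, distances)]


vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(nearest_chunks([0.9, 1.2], vectors, k=2))
# [1, 0]
```

The returned indices point back into the list of text chunks, so the chunks at those positions are the ones handed to the question-answering step.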

5. Question Answering
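
One common way to answer questions over retrieved chunks is to place them in a prompt ahead of the question and send that prompt to a chat model. The sketch below covers only the prompt assembly; the function name `build_prompt` and the wording of the instructions are assumptions, and the model call itself is left out because the README does not say which model the project uses.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and a question into a single prompt string."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )
```

Numbering the chunks makes it possible to ask the model to cite which passage supports its answer.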

Installation

To set up the project, follow these steps:

  1. Clone the repository:

    git clone https://github.com/Tech-Society-SEC/Chatbot_ML.git
  2. Navigate to the project directory:

    cd Chatbot_ML
  3. Install the necessary libraries:

    pip install scikit-learn-intelex pymupdf langchain-google-genai langchain-community python-dotenv faiss-cpu
  4. Mount Google Drive (only needed when running in Google Colab):

    from google.colab import drive
    drive.mount('/content/drive')
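  5. Configure the API key:
    Create a .env file containing a line of the form GOOGLE_API_KEY=<your key>, then load it:

    import os
    from dotenv import load_dotenv

    load_dotenv()  # read variables from .env into the process environment
    api_key = os.getenv('GOOGLE_API_KEY')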
  6. Optionally accelerate scikit-learn with the Intel extension (call this before importing any scikit-learn modules):

    from sklearnex import patch_sklearn
    patch_sklearn()

Beginner-Friendly Issues

We welcome contributions! Here are some beginner-friendly tasks:

Repository URL

For more details and to access the code, visit the GitHub repository: https://github.com/Tech-Society-SEC/Chatbot_ML