PStarH / LLM-boost-recognition

OCR and Voice Recognition Module: convert documents and audio into usable text with multiple OCR engines and voice recognition, LLM-based error correction, and optional GPU acceleration. It is aimed at difficult material such as math formulas.
GNU General Public License v3.0

OCR and Voice Recognition Module

Description

The OCR and Voice Recognition Module extracts and processes text from PDF documents, images, and audio files. It combines multiple OCR engines with voice recognition and aims for high accuracy through error correction with large language models (LLMs), math formula processing, and document structure identification. It is highly configurable and supports GPU acceleration, which makes it suitable for applications ranging from document digitization to voice-controlled systems.

Installation

Prerequisites

Steps

  1. Clone the Repository

    git clone https://github.com/PStarH/ocr-voice-recognition-module.git
    cd ocr-voice-recognition-module
  2. Create a Virtual Environment

    python3 -m venv venv
    source venv/bin/activate
  3. Install Dependencies

    pip install -r requirements.txt
  4. Install Tesseract OCR

    • Ubuntu
      sudo apt-get update
      sudo apt-get install tesseract-ocr
    • macOS
      brew install tesseract
    • Windows
      • Download the installer from Tesseract OCR and follow the installation instructions.
  5. Download Additional Models

    Ensure that the required models for EAST, CRAFT, and the LLMs are downloaded and placed in the directories specified in the configuration.

  6. Configure Environment Variables

    Create a .env file in the root directory with the following structure:

    USE_LOCAL_LLM=True
    API_PROVIDER=OLLAMA
    OLLAMA_API_URL=http://localhost:11434
    OLLAMA_MODEL_NAME=ggml-gpt4all-j-v1.3-groovy
    CLAUDE_MODEL_STRING=claude-3-haiku-20240307
    MATH_OCR_API_KEY=your_math_ocr_api_key
    MATH_OCR_ENDPOINT=your_math_ocr_endpoint
    LLM_ERROR_CORRECTION_MODEL=Llama-3.1-8B-Lexi-Uncensored_Q5_fixedrope.gguf
    LLM_LAYOUT_MODEL=Llama-3.1-8B-Lexi-Uncensored_Q5_fixedrope.gguf
    PREPROCESSING_ENABLED=True
    PROGRESS_TRACKING_ENABLED=True
    OCR_ENGINE=pytesseract
    PADDLEOCR_ENABLED=True
    PADDLEOCR_LANGUAGE=en
    PADDLEOCR_USE_GPU=False
    TEXT_DETECTION_MODEL=EAST
    TEXT_DETECTION_THRESHOLD=0.5

    Note: Replace placeholder values with your actual configuration details.
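To illustrate how settings like the ones above can be read, here is a minimal sketch that parses a .env file using only the standard library. The helper names `parse_env_text`, `as_bool`, and `load_env_file` are hypothetical and are not taken from this repository; the project itself may load its configuration differently (for example with a dotenv package).

```python
from pathlib import Path
from typing import Dict

# Strings treated as "true" when a setting is a flag (an assumption of this sketch).
TRUE_VALUES = {"1", "true", "yes", "on"}


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def as_bool(value: str) -> bool:
    """Interpret flag values such as True/False case-insensitively."""
    return value.strip().lower() in TRUE_VALUES


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Read and parse the file at `path`; return an empty dict if it is missing."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    return parse_env_text(env_path.read_text(encoding="utf-8"))
```

With the file shown above, `as_bool(settings["PADDLEOCR_USE_GPU"])` would evaluate to `False` and `float(settings["TEXT_DETECTION_THRESHOLD"])` to `0.5`. All values are kept as strings until the caller converts them, which keeps the parser itself free of per-key rules.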

Usage

Run the OCR and Voice Recognition Workflow

python OCR.py

Parameters

Example

input_pdf_file_path = 'path/to/your/document.pdf'
max_test_pages = 0 # Set to 0 to process all pages
skip_first_n_pages = 0 # Set to skip initial pages if needed
reformat_as_markdown = True
suppress_headers_and_page_numbers = True
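The interaction between `max_test_pages` and `skip_first_n_pages` can be sketched as a small page-selection function. This is an illustration only: `select_pages` is a hypothetical helper, and it assumes that the page limit is counted after the skipped pages, which the source does not state.

```python
from typing import List


def select_pages(total_pages: int,
                 max_test_pages: int = 0,
                 skip_first_n_pages: int = 0) -> List[int]:
    """Return the zero-based page indices that would be processed.

    max_test_pages == 0 means "no limit", matching the comment in the example.
    """
    if total_pages < 0 or max_test_pages < 0 or skip_first_n_pages < 0:
        raise ValueError("page counts must be non-negative")
    # Never start beyond the end of the document.
    start = min(skip_first_n_pages, total_pages)
    if max_test_pages == 0:
        end = total_pages
    else:
        end = min(start + max_test_pages, total_pages)
    return list(range(start, end))
```

Under these assumptions, `select_pages(10, max_test_pages=3, skip_first_n_pages=2)` selects pages 2, 3, and 4 (zero-based), while leaving both values at 0 selects the whole document.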

Voice Recognition Usage

python Voice-Recognition.py

Configure the input audio file path and other settings in the main function as needed.

Features

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the Repository
  2. Create a Feature Branch
    git checkout -b feature/YourFeature
  3. Commit Your Changes
    git commit -m "Add your feature"
  4. Push to the Branch
    git push origin feature/YourFeature
  5. Open a Pull Request

Please ensure that your code follows the project's coding standards and includes appropriate documentation.

Code of Conduct

Please read and follow our Code of Conduct to ensure a welcoming and respectful environment for all contributors.

License

This project is licensed under the GPL-3.0 License.

Acknowledgements

FAQs

1. How do I switch between different OCR engines?

Update the OCR_ENGINE variable in your .env file to pytesseract, easyocr, or paddleocr based on your preference.
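A minimal sketch of how a setting like `OCR_ENGINE` can drive engine selection follows. The names `resolve_engine_name` and `run_ocr` are hypothetical, the default of `pytesseract` is an assumption based on the sample .env, and the sketch assumes the .env values have already been loaded into the process environment. The PaddleOCR branch is deliberately left out because its call and result format differ between releases.

```python
import os
from typing import Mapping, Optional

# Engine names listed in the FAQ answer above.
SUPPORTED_ENGINES = ("pytesseract", "easyocr", "paddleocr")


def resolve_engine_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Read OCR_ENGINE, normalise it, and reject unknown values."""
    env = os.environ if env is None else env
    name = env.get("OCR_ENGINE", "pytesseract").strip().lower()
    if name not in SUPPORTED_ENGINES:
        raise ValueError(
            f"Unsupported OCR_ENGINE {name!r}; expected one of {SUPPORTED_ENGINES}"
        )
    return name


def run_ocr(image_path: str, engine: str) -> str:
    """Run one engine on an image. Imports are lazy so that only the
    selected engine needs to be installed."""
    if engine == "pytesseract":
        import pytesseract
        from PIL import Image
        with Image.open(image_path) as image:
            return pytesseract.image_to_string(image)
    if engine == "easyocr":
        import easyocr
        reader = easyocr.Reader(["en"], gpu=False)
        # detail=0 returns only the recognised strings.
        return "\n".join(reader.readtext(image_path, detail=0))
    # paddleocr is accepted as a name but not implemented in this sketch.
    raise NotImplementedError(f"No backend implemented here for {engine!r}")
```

Validating the name up front means a typo in the .env file fails immediately with a clear message instead of surfacing later as an import error.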

2. Can I use this module without GPU?

Yes, the module is fully functional on CPU. However, GPU acceleration is available and recommended for faster processing if your system supports it.

3. How do I add support for additional languages?

Ensure that the required language packs are installed for your chosen OCR engines and update the SUPPORTED_LANGUAGES configuration in the .env file.

4. What should I do if I encounter an error during installation?

Check the error logs for specific issues, ensure all prerequisites are met, and verify that all dependencies are correctly installed. Feel free to open an issue on the repository for further assistance.
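As a starting point for that kind of check, the sketch below looks for the most common setup problems from the installation steps: a missing Tesseract executable, a missing .env file, and Python packages that cannot be imported. `find_installation_problems` is a hypothetical helper, not part of this repository, and the default module list is only an example; pass the top-level package names from your own requirements.txt.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Iterable, List


def find_installation_problems(modules: Iterable[str] = ("pytesseract",),
                               env_file: str = ".env") -> List[str]:
    """Return human-readable descriptions of missing prerequisites."""
    problems: List[str] = []
    # The Tesseract installers normally put a `tesseract` executable on PATH.
    if shutil.which("tesseract") is None:
        problems.append("tesseract executable not found on PATH")
    if not Path(env_file).is_file():
        problems.append(f"{env_file} not found")
    for name in modules:
        # Use top-level package names only (no dots).
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package '{name}' is not importable")
    return problems


if __name__ == "__main__":
    for problem in find_installation_problems():
        print("Problem:", problem)
```

An empty result does not guarantee a working setup (model files and API keys are not checked), but a non-empty one is useful detail to include when opening an issue.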

5. Is there a way to contribute feedback on OCR accuracy?

Yes, the module includes a feedback mechanism. Refer to the collect_user_feedback function in the code for details on how to provide feedback.

Contact

For support or inquiries, please reach out via GitHub Issues.

Roadmap

Changelog

v1.0.0

v1.1.0

v1.2.0