AHR-OCR2024 / Arabic-Handwriting-Recognition

This repository serves as a quick reference for what we have done so far on our project; it groups together all the code and data we used.
https://digital-dahd.vercel.app/

Digital-Ḍād

Arabic Handwriting Recognition 🖋️

Welcome to the Arabic Handwriting Recognition project! This repository consolidates all the code and data used in our endeavor to create an effective OCR (Optical Character Recognition) system for Arabic script.

This project is proudly made by:

Names
Ahmed Taha          Ayman Saber
Ahmed Nagah         Abanoub Aied
Kerollos Samir      Mohamed Abdelfattah
Mohamed Fathi       Nada Asran
Nada Mahmoud        Rawan Gamal
Reem Fouad

Introduction

Arabic Handwriting Recognition is a project aimed at developing a robust system to accurately recognize and digitize Arabic handwritten text. This repository groups together all the scripts, models, and datasets used throughout the development process.

Features

Installation

To get started, clone the repository and install the required dependencies:

git clone https://github.com/AHR-OCR2024/Arabic-Handwriting-Recognition.git
cd Arabic-Handwriting-Recognition/Application
pip install -r requirements.txt
npm i

Download the pretrained models from here and put them in Application/Backend/Models.

Usage

To use the OCR system, follow these steps:

  1. Prepare the data: Ensure your data is in the correct format.
  2. Train the model: Use the provided training scripts to train the OCR model.
  3. Evaluate: Test the model on a validation set to check its accuracy.
  4. Run predictions: Use the trained model to recognize text from new handwritten samples.

To run the application, run each of these commands in a separate terminal:

python ./Backend/Backend.py
npm run dev

Data

The dataset used in this project is included in the repository. It contains a variety of Arabic handwritten samples to train and evaluate the model. To use the dataset:

  1. Download the Data.rar file.
  2. Extract the contents to the appropriate directory.

Preprocessing

Preprocessing is a critical stage in the development of our Arabic handwriting recognition system. The goal is to enhance the quality of the input data to ensure accurate recognition. Here are the main steps involved in our preprocessing pipeline:

  1. Image Acquisition: We collect images of handwritten Arabic text from various sources, including scanned documents and photos taken by digital cameras.

    Acquired Image

  2. Geometric Correction: We correct distortions and warping in the images. Techniques like Hough Line Transform and DocTr (Document Image Transformer) are used to straighten the text lines.

  3. Noise Removal: We apply filters to remove noise and enhance the clarity of the text. This includes techniques like Gaussian blur and median filtering.

    Noise Removal
    Unwarped and filtered image

  4. Segmentation: We segment the images into paragraphs, lines, and individual characters. This involves methods like histogram projection and CRAFT (Character Region Awareness for Text Detection).

    Segmented text using CRAFT

  5. Normalization: We normalize the images to a fixed size (64x64) and rescale the pixel values to the range [0, 1] by dividing by 255.0.

    Final Results
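The noise removal, segmentation, and normalization steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: images are assumed to be grayscale lists of rows, the function names `median_filter`, `split_lines`, and `normalize` are hypothetical, and nearest-neighbour resizing is an assumed choice. A real pipeline would more likely rely on an image processing library.

```python
from statistics import median


def median_filter(img, k=3):
    """Median filter on a grayscale image (list of rows).

    Borders are handled by clamping coordinates to the image edge.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            row.append(median(window))
        out.append(row)
    return out


def split_lines(binary):
    """Horizontal histogram projection on a binary image (1 = ink).

    Returns (start, end) row ranges that contain ink; end is exclusive.
    """
    profile = [sum(row) for row in binary]
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                      # a text line begins
        elif ink == 0 and start is not None:
            lines.append((start, y))       # a text line ends
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines


def normalize(img, size=64):
    """Nearest-neighbour resize to size x size, then rescale 0..255 to [0, 1]."""
    h, w = len(img), len(img[0])
    return [
        [img[y * h // size][x * w // size] / 255.0 for x in range(size)]
        for y in range(size)
    ]
```

For example, `split_lines([[0, 0], [1, 0], [1, 1], [0, 0], [0, 1]])` returns `[(1, 3), (4, 5)]`: two ink-bearing row ranges separated by one blank row.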

AI Training

The AI training phase involves developing and training deep learning models to recognize and digitize handwritten Arabic text. Here are the key components of our AI training process:

  1. Prototype Model Experimentation: We first experimented with three architectures on a small portion of the data (15,477 samples) to identify the most effective model for our Arabic handwriting recognition system. The results are compared in the table below:
Architecture       CER      Accuracy
EfficientNet-B1    7.3%     92.7%
VGG19              5.4%     94.6%
ResNet152          2.96%    97.04%

ResNet152
ResNet152 Performance Throughout the Epochs

  1. Dataset Preparation: We use a combination of the Arabic Alphabet Character dataset and the KHATT dataset. The combined final dataset includes 108,619 samples.

  2. Data Augmentation: To improve the robustness of our model, we apply various data augmentation techniques, such as rotation, translation, and scaling.

  3. Model Architecture: We use the ResNet50V2 model, first pre-trained on the Arabic Alphabet Character dataset and then further trained on the KHATT dataset with the techniques listed under Training Techniques.

ResNet50V2Alphabet
ResNet50V2 Performance on Alphabet Dataset
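The rotation, translation, and scaling augmentations mentioned under Data Augmentation can be illustrated with a single affine transform. The sketch below is an assumption about how such augmentation might look, not the project's code: the function names `affine_augment` and `random_augment`, the nearest-neighbour sampling, and the parameter ranges (±10 degrees, ±2 pixels, 0.9 to 1.1 scale) are all hypothetical.

```python
import math
import random


def affine_augment(img, angle_deg=0.0, tx=0.0, ty=0.0, scale=1.0, fill=0):
    """Rotate, scale, and translate a grayscale image (list of rows) about its centre.

    Uses inverse mapping with nearest-neighbour sampling: for each output
    pixel we compute the source pixel it came from. Pixels that map outside
    the source image are set to `fill`.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Undo the translation, then undo the rotation and scaling.
            dx, dy = x - cx - tx, y - cy - ty
            sx = (cos_a * dx + sin_a * dy) / scale + cx
            sy = (-sin_a * dx + cos_a * dy) / scale + cy
            ix, iy = round(sx), round(sy)
            inside = 0 <= ix < w and 0 <= iy < h
            row.append(img[iy][ix] if inside else fill)
        out.append(row)
    return out


def random_augment(img, rng=random):
    """Apply a small random rotation, shift, and zoom (ranges are assumed)."""
    return affine_augment(
        img,
        angle_deg=rng.uniform(-10, 10),
        tx=rng.uniform(-2, 2),
        ty=rng.uniform(-2, 2),
        scale=rng.uniform(0.9, 1.1),
    )
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which helps when debugging a training run.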

  1. Training Techniques

    • Optimizer: We use the Adam optimizer with specific parameters for efficient training.
    • Learning Rate Scheduler: A cosine learning rate scheduler is employed to adjust the learning rate dynamically during training.
    • Training Duration: The model is trained across 70 epochs to ensure convergence and optimal performance.
  2. Evaluation Metrics: We use Character Error Rate (CER) and accuracy as our primary evaluation metrics. Our final model achieved a CER of 3% and an accuracy of 97% on the test set.
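A cosine learning rate schedule of the kind listed under Training Techniques can be sketched as below. Only the 70-epoch duration comes from this README; the base learning rate of `1e-3`, the floor of `0.0`, the per-epoch update, and the function name `cosine_lr` are assumptions made for illustration.

```python
import math


def cosine_lr(epoch, total_epochs=70, base_lr=1e-3, min_lr=0.0):
    """Cosine annealing: base_lr at epoch 0, decaying smoothly to min_lr
    at the final epoch (epoch index total_epochs - 1)."""
    progress = epoch / max(total_epochs - 1, 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The schedule starts at `base_lr`, decreases slowly at first, fastest around the midpoint, and flattens out again as it approaches `min_lr`. The value returned for each epoch would be handed to the optimizer (Adam in our case) before that epoch starts.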

ResNet50V2
ResNet50V2 (Pre-Trained on Alphabet) Performance Throughout the Epochs
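For readers unfamiliar with the metric, CER is commonly computed as the character-level edit distance between the recognized text and the reference text, divided by the reference length. The sketch below assumes that standard definition; the exact evaluation script used in this project may differ, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # delete a reference character
                cur[j - 1] + 1,          # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("كتاب", "كتب")` is `0.25`: one missing character out of four. In the prototype comparison table, each reported accuracy equals 100% minus the corresponding CER.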

Full System Diagram

Diagram
The Flow of Our System

Digital-Ḍād ض-الرقمية

Our application provides a variety of services and models:

MainPage
Main Page of Our Application

  1. Handwriting OCR: Our model takes an image of a handwritten paragraph, preprocesses it, and then performs OCR on the resulting segmented words or sub-words.

OCRModel
Handwriting OCR Model

  1. Exam Grading: We pass the questions and their reference answers to the model. Using our OCR methodology alongside an LLM accessed through an API, it scans the answers written by a student and assigns a grade.

ExamGrader
Exam Grader Model

  1. Document Scanner: The application scans an image of a handwritten document, performs OCR on it, and then uses an LLM to assemble the resulting text into a fully digitized version of the document.
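The exam grading flow described above (OCR the student's answer, then ask an LLM to grade it against the reference answer) can be sketched roughly as follows. Everything here is hypothetical: `grade_exam`, the `ocr` and `llm` callables, the prompt wording, and the 0 to 10 scale are placeholders for illustration and do not reflect the application's actual backend or any specific LLM API.

```python
def grade_exam(answer_key, answer_images, ocr, llm):
    """Grade handwritten answers against reference answers.

    answer_key:    dict mapping question text -> reference answer
    answer_images: dict mapping question text -> image of the student's answer
    ocr:           hypothetical callable, image -> recognized text
    llm:           hypothetical callable, prompt -> reply text
    """
    grades = {}
    for question, reference in answer_key.items():
        image = answer_images.get(question)
        if image is None:
            grades[question] = 0.0  # unanswered question
            continue
        student_text = ocr(image)
        prompt = (
            "Grade the student's answer from 0 to 10. "
            "Reply with the number only.\n"
            f"Question: {question}\n"
            f"Reference answer: {reference}\n"
            f"Student answer: {student_text}"
        )
        reply = llm(prompt)
        try:
            score = float(reply.strip())
        except ValueError:
            score = 0.0  # assumed fallback when the reply is not a number
        grades[question] = min(max(score, 0.0), 10.0)
    return grades
```

Passing the OCR model and the LLM client in as callables keeps the grading logic testable with simple stubs and independent of whichever LLM provider is used.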

Contributing

Contributions are welcome! Please follow these steps to contribute:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/your-feature).
  3. Commit your changes (git commit -m 'Add some feature').
  4. Push to the branch (git push origin feature/your-feature).
  5. Open a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

We would like to thank all contributors and the community for their support and feedback. Special thanks to the authors of the datasets and tools used in this project.

Feel free to reach out with any questions or feedback. Let's make Arabic handwriting recognition more accessible and accurate together!

🌟 Happy Coding! 🌟

🚧 This repo is still incomplete and is under construction 🚧