OpenPecha / ocr_handwriting_aligner

A tool for preparing handwriting data for OCR.
MIT License

Description

A tool for preparing handwriting data for OCR.

Project owner(s)

Installation

pip install git+https://github.com/OpenPecha/ocr_handwriting_aligner.git

Usage

from pathlib import Path

from ocr_handwriting_aligner.pipeline import pipeline

# Input PDF and its transcript CSV
pdf_file_path = Path("P000015_v001_00001 - 00250.pdf")
transcript_file_path = Path("P000015_v001_transcript.csv")
image_orientation = "Portrait"

acceptable_images = pipeline(pdf_file_path, transcript_file_path, image_orientation)
print(f"Number of acceptable line images: {len(acceptable_images)}")
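
As an optional pre-flight step, it can help to confirm that both input files exist before calling pipeline(). The sketch below uses only the standard library; missing_inputs is a hypothetical helper written for this illustration and is not part of ocr_handwriting_aligner.

```python
from pathlib import Path


def missing_inputs(*paths: Path) -> list[Path]:
    """Return the given paths that do not point to an existing file.

    Hypothetical helper for illustration; not part of the package.
    """
    return [p for p in paths if not p.is_file()]


if __name__ == "__main__":
    # Same file names as in the usage example; adjust to your own data.
    pdf_file_path = Path("P000015_v001_00001 - 00250.pdf")
    transcript_file_path = Path("P000015_v001_transcript.csv")

    missing = missing_inputs(pdf_file_path, transcript_file_path)
    if missing:
        print("Missing input files:", ", ".join(str(p) for p in missing))
    else:
        print("All input files found; ready to call pipeline().")
```

Checking up front keeps a mistyped file name from surfacing as a less obvious error partway through processing.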

Important Notes:

Outputs after running the above code:

Standardize the CSV

from pathlib import Path

from ocr_handwriting_aligner.parse_transcript import standardize_line_texts_to_images_csv_mapping

csv_file_path = Path("line_image_mapping.csv")
batch_id = "P000015"
volume_id = "v001"
standardize_line_texts_to_images_csv_mapping(csv_file_path, batch_id, volume_id)
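
To see what the standardization step changes, one option is to look at the first few rows of the CSV before and after the call. The sketch below is a minimal example using the standard csv module. preview_csv is a hypothetical helper, not part of the package, and it assumes the file is UTF-8 encoded. It makes no assumption about column names, and whether the function rewrites the file in place is not stated here.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csv(csv_path: Path, max_rows: int = 5) -> list[list[str]]:
    """Return up to max_rows raw rows from a CSV file (header row included, if any).

    Hypothetical helper for illustration; assumes UTF-8 encoding.
    """
    with csv_path.open(newline="", encoding="utf-8") as f:
        return list(islice(csv.reader(f), max_rows))


if __name__ == "__main__":
    # Same file name as in the example above; adjust to your own data.
    csv_file_path = Path("line_image_mapping.csv")
    if csv_file_path.is_file():
        for row in preview_csv(csv_file_path):
            print(row)
    else:
        print(f"{csv_file_path} not found")
```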

Output after running the above code: