UB-Mannheim / ocr-gt-tools

Ergonomic line-by-line transcription of scanned text.
GNU Affero General Public License v3.0

ocr-gt-tools

A web interface for creating ground truth for evaluating and training OCR.

Summary

ocr-gt-tools lets you edit hOCR files, such as those produced by the Tesseract or ocropy OCR frameworks.
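hOCR is HTML in which each recognized text line is an element with the class `ocr_line`, and its `title` attribute carries the line's bounding box (`bbox x0 y0 x1 y1`). To illustrate the line-level structure that line-by-line transcription works on, here is a minimal Python sketch, using only the standard library, that lists each line's bounding box and text. It is not how ocr-gt-tools itself reads hOCR (its server side is written in Perl). The class `HocrLineParser` and the sample document are hypothetical. The sketch assumes well-formed markup and only looks at `ocr_line` elements, not other line-like classes.

```python
import re
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrLineParser(HTMLParser):
    """Hypothetical helper: collect (bbox, text) for every element with class "ocr_line"."""

    def __init__(self):
        super().__init__()
        self.lines = []    # list of ((x0, y0, x1, y1) or None, text)
        self._depth = 0    # > 0 while inside an ocr_line element
        self._bbox = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # nested element, e.g. an ocrx_word span
            return
        attrs = dict(attrs)
        if "ocr_line" in (attrs.get("class") or "").split():
            # hOCR keeps geometry in the title attribute, e.g. "bbox 10 20 300 45; baseline 0 -5"
            m = BBOX_RE.search(attrs.get("title") or "")
            self._bbox = tuple(map(int, m.groups())) if m else None
            self._chunks = []
            self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse whitespace so each line becomes one string of words.
            text = " ".join(" ".join(self._chunks).split())
            self.lines.append((self._bbox, text))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


# Hypothetical sample page with two lines.
SAMPLE = """
<div class="ocr_page" title="bbox 0 0 1000 1400">
  <span class="ocr_line" title="bbox 100 200 900 240; baseline 0 -5">
    <span class="ocrx_word" title="bbox 100 200 300 240">Ground</span>
    <span class="ocrx_word" title="bbox 320 200 500 240">truth</span>
  </span>
  <span class="ocr_line" title="bbox 100 260 700 300">
    <span class="ocrx_word" title="bbox 100 260 200 300">for</span>
    <span class="ocrx_word" title="bbox 220 260 340 300">OCR</span>
  </span>
</div>
"""

parser = HocrLineParser()
parser.feed(SAMPLE)
parser.close()
for bbox, text in parser.lines:
    print(bbox, text)
# prints:
# (100, 200, 900, 240) Ground truth
# (100, 260, 700, 300) for OCR
```

Tracking nesting depth, rather than matching only the first closing tag, keeps word-level `ocrx_word` spans inside their line.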

Features

Installation

See INSTALL.md.

About the code

The server-side code is written in Perl.

The frontend is written in HTML and JavaScript.

Usage

Contributing

Expand the wiki

We are using the wiki to collect transcription hints for unusual glyphs and frequent errors.

Pull Requests

Bug fixes, new functions, suggestions for new features and other user feedback are appreciated.

The source code is available from https://github.com/UB-Mannheim/ocr-gt-tools. Please submit your code contributions as pull requests on GitHub.

Bug reports

Please feel free to open an issue for any bug you encounter or any feature you would like to have.

Acknowledgments

This is free software. You may use it under the terms of the GNU Affero General Public License (AGPL) version 3 or newer. See LICENSE for details.

This project bundles other free software: