
GNU General Public License v3.0

OCR

Repository for development files and documentation for Coptic OCR in development by Coptic Scriptorium.

Much of this work is based on prior work from other researchers that was shared publicly. Please credit the prior researchers along with Coptic Scriptorium.

OCR4All files

Models and training files for OCR4All were developed from original work by Eliese-Sophia Lincke et al. The OCR4All team converted the training files, produced a model for the newer version of OCR4All, and provided both to Coptic Scriptorium.

Citations:

Eliese-Sophia Lincke, Kirill Bulert & Marco Büchler, "Optical Character Recognition for Coptic fonts: A multi-source approach for scholarly editions," in: DATeCH2019: Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage, 87-91. Open access; DOI: 10.1145/3322905.3322931

Christian Reul, Dennis Christ, Alexander Hartelt, Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, and Frank Puppe, "OCR4all—An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings," Appl. Sci. 2019, 9(22), 4853; https://doi.org/10.3390/app9224853

File Structures

Data in the "Processed OCR" directory have been OCR'd and ground truth has been produced.

The other directories, named after editions and editors, contain OCR input, OCR output, and results that have been corrected to ground truth but have received no further post-processing. These documents have been or will be uploaded to GitDox. Check the GitHub repository information in GitDox for the location of each document.