UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
https://digi.bib.uni-mannheim.de/ocr-fileformat/
MIT License

page2tsv #99

Open kba opened 4 years ago

kba commented 4 years ago

https://github.com/qurator-spk/page2tsv is a very specific format, but obviously a scalable way to do heavy data processing.