UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
https://digi.bib.uni-mannheim.de/ocr-fileformat/
MIT License
176 stars, 23 forks

Add hocr__page transformation #113

Closed zuphilip closed 4 years ago

zuphilip commented 4 years ago

The alto__page transformation also supports hOCR input, so we can simply reuse it for the hocr__page transformation as well. This should only require some symlinking and updating the documentation. However, we may want to wait until PR #106 has been integrated before doing this.

BTW, one can already try this out by running the alto__page transformation on an hOCR file, e.g. https://digi.bib.uni-mannheim.de/~stweil/ocr-praxis/0001-tesseract.hocr
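
The symlinking idea from this comment could be sketched roughly as follows. This is only an illustration, not the project's actual change: the directory layout, the file names used for the transformer entries, and the helper `add_alias` are all assumptions made for the example. It creates a relative symlink so that a hypothetical hocr__page entry resolves to the existing alto__page entry.

```python
import os
import tempfile
from pathlib import Path


def add_alias(transform_dir: Path, existing: str, alias: str) -> Path:
    """Create a relative symlink `alias` -> `existing` inside `transform_dir`.

    Hypothetical helper: the real repository layout may differ.
    """
    target = transform_dir / existing
    if not target.exists():
        raise FileNotFoundError(f"no such transformer: {target}")
    link = transform_dir / alias
    # Replace a stale alias so the helper can be re-run safely.
    if link.is_symlink() or link.exists():
        link.unlink()
    # A relative target keeps the link valid if the directory is moved.
    link.symlink_to(existing)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        transform_dir = Path(tmp)
        # Stand-in for the existing transformer (assumed file name).
        (transform_dir / "alto__page").write_text("# placeholder transformer\n")
        link = add_alias(transform_dir, "alto__page", "hocr__page")
        print(link.name, "->", os.readlink(link))
```

Using a symlink rather than a copy means both names always run the same transformer, so a later fix to alto__page applies to hocr__page automatically.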

stweil commented 4 years ago

Commit cedace708004a6971f6fbfb98ac039e82681a016 (now added to PR #106) should do that.