UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
https://digi.bib.uni-mannheim.de/ocr-fileformat/
MIT License

update textract2page, hOCR-to-ALTO and alto-schema #166

Closed: kba closed this pull request 12 months ago

stweil commented 12 months ago

Why is this PR in draft mode?

kba commented 12 months ago

> Why is this PR in draft mode?

Because I intended to include the table support in textract2page but it is not merged yet, cf. https://github.com/UB-Mannheim/ocr-fileformat/pull/166#issuecomment-1708684176

But you're right, there's no reason not to merge anyway.

stweil commented 12 months ago

@kba, the updated hOCR-to-ALTO breaks the conversion from hOCR to ALTO because hocr__alto.xsl is now required but was missing. That is fixed in PR #167.