UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
https://digi.bib.uni-mannheim.de/ocr-fileformat/
MIT License

Page to alto python #134

Closed by kba 3 years ago

kba commented 3 years ago

Replaces PRImA PageConverter with https://github.com/kba/page-to-alto for PAGE->ALTO conversion.

stweil commented 3 years ago

Would it help to support both converters for some time to allow comparing the differences if there is a problem?

kba commented 3 years ago

> Would it help to support both converters for some time to allow comparing the differences if there is a problem?

OK, I've re-added the old transformation as `ocr-transform page alto_legacy`. Since we can now create Kitodo-ingestable ALTO from OCR-D with the Python implementation, I'd appreciate it if we could merge this soon.
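
For anyone who wants to do the comparison suggested above, here is a minimal sketch of running both transformations on the same PAGE file and diffing the results. It assumes `ocr-transform` is on the `PATH`, that the argument order is `ocr-transform <from> <to> <input>`, and that the result is written to stdout when no output file is given. The helper names (`build_command`, `diff_outputs`, `compare_converters`) are hypothetical and not part of ocr-fileformat.

```python
import difflib
import subprocess


def build_command(target_format, page_file):
    """Build the argument list for one PAGE -> ALTO transformation.

    Assumption: `ocr-transform <from> <to> <input>` writes to stdout.
    """
    return ["ocr-transform", "page", target_format, page_file]


def diff_outputs(new_alto, legacy_alto):
    """Return a unified diff (list of lines) from legacy output to new output."""
    return list(
        difflib.unified_diff(
            legacy_alto.splitlines(),
            new_alto.splitlines(),
            fromfile="alto_legacy",
            tofile="alto",
            lineterm="",
        )
    )


def compare_converters(page_file):
    """Run both converters on page_file and return the diff of their outputs."""
    outputs = {}
    for target_format in ("alto", "alto_legacy"):
        result = subprocess.run(
            build_command(target_format, page_file),
            capture_output=True,
            text=True,
            check=True,  # raise if the converter exits with an error
        )
        outputs[target_format] = result.stdout
    return diff_outputs(outputs["alto"], outputs["alto_legacy"])
```

Calling `compare_converters("page.xml")` (with a PAGE file of your own) returns an empty list when both converters produce identical ALTO. A raw textual diff will likely also flag harmless differences such as attribute order or whitespace, so treat it as a first pass only.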