UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
https://digi.bib.uni-mannheim.de/ocr-fileformat/
MIT License

page__alto transformation mixes XML with logging in the output #143

Closed: bertsky closed this issue 2 years ago

bertsky commented 2 years ago

…resulting in invalid XML of course. Here is the culprit:

https://github.com/UB-Mannheim/ocr-fileformat/blob/d0c78538e35d4cfff23fda0a82a75440f15af0e6/script/transform/page__alto#L18

Since page-to-alto does its own logging (also to stdout), its warnings and errors end up mixed into the output.
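A minimal Python sketch of this failure mode. It assumes a converter that, like page-to-alto as described above, writes its XML result to stdout and is invoked by a wrapper that captures stdout as the output file. The child program, `run_converter`, and `is_well_formed` are hypothetical illustrations, not page-to-alto's code or the fix applied in this repository.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a converter such as page-to-alto: it logs one
# warning through the logging module, then prints an XML document to stdout.
# The {stream!r} placeholder selects where the log messages go.
CHILD = """
import logging, sys
logging.basicConfig(stream=getattr(sys, {stream!r}), format="%(levelname)s %(message)s")
logging.warning("region without text, skipping")
print("<alto><Layout/></alto>")
"""


def run_converter(log_stream: str) -> str:
    """Run the child with logging sent to `log_stream`; return only its stdout,
    which is what a wrapper redirecting stdout to a file would save."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD.format(stream=log_stream)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Logging to stdout: the warning line precedes the XML, so parsing fails.
    print(is_well_formed(run_converter("stdout")))
    # Logging to stderr: stdout carries only the XML document.
    print(is_well_formed(run_converter("stderr")))
```

Keeping diagnostics on stderr (or discarding them) is one way to keep the captured stdout a valid XML document; whether the transform script or page-to-alto itself should change is a separate design decision.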

stweil commented 2 years ago

@bertsky, can we close this issue?

bertsky commented 2 years ago

Yes we can.