UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
https://digi.bib.uni-mannheim.de/ocr-fileformat/
MIT License

Add abbyy2hocr transformation by @OCR-D #92

Closed: zuphilip closed this 4 years ago

zuphilip commented 5 years ago

I have not tested it yet, but it looks straightforward.

kba commented 4 years ago

Maybe including it in ocr-fileformat will increase the visibility of the transformation so that those shortcomings can be remedied.

Can you open issues at https://github.com/OCR-D/format-converters/issues lest we forget, @jmechnich? Thank you.

zuphilip commented 4 years ago

@kba Yes, thank you for the suggestion. That is what we discussed last Friday, and I said that I would do that. We also discussed that at least the required parameters should be made optional for the integration here, because otherwise it might not be possible to use the transformation in the GUI.

zuphilip commented 4 years ago

Here is a PR for making the parameters optional: https://github.com/OCR-D/format-converters/pull/8

zuphilip commented 4 years ago

The upstream PR is merged now. @jmechnich Can you test the new version, where the parameters are no longer mandatory? Does it now give results in the GUI?

jmechnich commented 4 years ago

Upstream is already broken by another commit. :) Actually, maybe we should consider using git submodules or pinning a specific commit for the vendor packages in vendor/Makefile, as this is not the first time something like this has happened.
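
To illustrate the pinning idea, here is a minimal sketch of what fetching a vendor package at a fixed commit could look like. It is not how vendor/Makefile currently works; the `PINNED_VENDORS` table, the function names, and the all-zero commit hash are hypothetical placeholders chosen for illustration.

```python
import subprocess
from pathlib import Path

# Hypothetical pin table. The commit hash is a placeholder, not a real commit.
PINNED_VENDORS = {
    "format-converters": {
        "url": "https://github.com/OCR-D/format-converters",
        "commit": "0000000000000000000000000000000000000000",
    },
}


def pin_commands(name, dest_root="vendor"):
    """Return the git commands that check out vendor `name` at its pinned commit."""
    entry = PINNED_VENDORS[name]
    dest = str(Path(dest_root) / name)
    return [
        # Clone the upstream repository into the vendor directory.
        ["git", "clone", entry["url"], dest],
        # Detach HEAD at the pinned commit so later upstream commits are ignored.
        ["git", "-C", dest, "checkout", "--detach", entry["commit"]],
    ]


def fetch_pinned(name, dest_root="vendor"):
    """Run the commands from pin_commands; raises if any git call fails."""
    for cmd in pin_commands(name, dest_root):
        subprocess.run(cmd, check=True)
```

With a pin like this, an upstream regression only reaches the build when someone deliberately bumps the commit hash. Git submodules would give the same guarantee with the pin stored in the repository itself.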

zuphilip commented 4 years ago

Here is a PR for fixing the newly introduced regressions: https://github.com/OCR-D/format-converters/pull/11

BTW, I would also love to see some tests integrated with CI in the upstream repo.

zuphilip commented 4 years ago

Okay, upstream is merged now again. @jmechnich Can you give it a try again? :pray: