UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
https://digi.bib.uni-mannheim.de/ocr-fileformat/
MIT License
176 stars, 23 forks

:arrow_up: Upgrade to new version of hOCR-to-ALTO #116

Closed: zuphilip closed this 4 years ago

zuphilip commented 4 years ago

This solves #95 and #81. Also, no special features of ALTO 3.0 or ALTO 4.0 are considered in the transformations, but that would be something for upstream anyway.

stweil commented 4 years ago

Should we still provide the conversion from alto2.0 / alto2.1 to hocr (even if it is no longer needed) to remain backward compatible? Removing it before release 1.0.0 would violate semantic versioning.

zuphilip commented 4 years ago

So you mean adding further symlinks for these transformation names? Well, it would be easy to do, but I am not sure whether it is needed. My feeling is also that v1.0.0 could be released soon (merging the PRs plus testing should suffice now, IMO). Maybe we can merge the other PRs first and then see how to proceed here. Okay? Or do you think we should release a v0.3.0 first?

stweil commented 4 years ago

I do indeed suggest keeping (modified) symlinks for those two transformations until 1.0.0 is released.

stweil commented 4 years ago

Releasing a 0.3.0 as a kind of pre-release for 1.0.0 would allow a longer test phase.

zuphilip commented 4 years ago

Please have a look at the new version.

stweil commented 4 years ago

https://digi.bib.uni-mannheim.de/ocr-fileformat/ now uses the new code.