zuphilip opened 8 years ago
Yes, I've seen it, but I very much prefer a declarative transformation in XSLT, which has no possible side effects and is easier to test. Maybe we can convert it to XSLT?
Yes, it would be preferable to use an XSLT stylesheet for the transformation.
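Python's standard library has no XSLT processor (lxml's `etree.XSLT` would be the usual route), so the following is only a rough sketch of the property being asked for: a pure function that reads an input tree and builds a new output tree, with no side effects, which makes it easy to unit test. It maps one ABBYY `<line>` to an ALTO `<TextLine>` with one `<String>` per word. Assumptions: the FineReader 10 namespace URI, the `line`/`charParams` elements with `l`/`t`/`r`/`b` pixel coordinates, and the ALTO v4 namespace. `line_to_alto` is a hypothetical helper, not code from abbyy-to-alto, and it ignores `SP` elements, styles, hyphenation and confidence values.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces (check against your actual files): ABBYY FineReader 10 XML and ALTO v4.
ABBYY_NS = "http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml"
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def _box(el):
    """Read ABBYY's l/t/r/b pixel coordinates from an element as ints."""
    return tuple(int(el.get(k)) for k in ("l", "t", "r", "b"))


def _union(boxes):
    """Smallest l/t/r/b box enclosing all given boxes."""
    ls, ts, rs, bs = zip(*boxes)
    return min(ls), min(ts), max(rs), max(bs)


def _alto_geometry(l, t, r, b):
    """Convert an l/t/r/b box into ALTO's HPOS/VPOS/WIDTH/HEIGHT attributes."""
    return {"HPOS": str(l), "VPOS": str(t), "WIDTH": str(r - l), "HEIGHT": str(b - t)}


def line_to_alto(abbyy_line):
    """Build a new ALTO <TextLine> from an ABBYY <line>; the input tree is only read."""
    text_line = ET.Element(f"{{{ALTO_NS}}}TextLine", _alto_geometry(*_box(abbyy_line)))
    word = []  # charParams elements of the word being collected

    def emit_word():
        # Turn the collected characters into one ALTO <String>, then reset.
        if word:
            attrs = {"CONTENT": "".join(c.text for c in word)}
            attrs.update(_alto_geometry(*_union([_box(c) for c in word])))
            ET.SubElement(text_line, f"{{{ALTO_NS}}}String", attrs)
            word.clear()

    for char in abbyy_line.iter(f"{{{ABBYY_NS}}}charParams"):
        if (char.text or "").strip():
            word.append(char)
        else:
            emit_word()  # a space (or empty charParams) ends the current word
    emit_word()
    return text_line


# Usage with a tiny hand-written ABBYY-like fragment (illustrative, not a real file).
sample = f"""<line xmlns="{ABBYY_NS}" baseline="30" l="10" t="10" r="60" b="30">
  <formatting lang="English">
    <charParams l="10" t="10" r="20" b="30">H</charParams>
    <charParams l="20" t="10" r="30" b="30">i</charParams>
    <charParams l="30" t="10" r="40" b="30"> </charParams>
    <charParams l="40" t="10" r="60" b="30">X</charParams>
  </formatting>
</line>"""

alto_line = line_to_alto(ET.fromstring(sample))
for string in alto_line:
    print(string.get("CONTENT"), string.get("HPOS"), string.get("WIDTH"))
# Hi 10 20
# X 40 20
```

Because `line_to_alto` only reads its argument and returns a fresh element, a test can feed it a fragment and compare the result, which is the same benefit the XSLT approach offers.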
There is also a newer implementation with Java (+Maven): https://github.com/Mewel/abbyy-to-alto
How does that compare with https://github.com/PRImA-Research-Lab/prima-page-converter, @maxnth?
> There is also a newer implementation with Java (+Maven): https://github.com/Mewel/abbyy-to-alto

That source code includes at least one copyrighted ~xsl~ file.
It does? I only saw that they include the copyrighted schema for Abbyy 10. We could ask ABBYY for a license to redistribute it, or omit that file and use the `make vendor` mechanism.
I had problems with prima-page-converter (going to open a bug report), while Mewel/abbyy-to-alto worked right away.
> they include the copyrighted schema for Abbyy 10

Yes, sorry, that was the one I meant.
> I had problems with prima-page-converter (going to open a bug report),

https://github.com/PRImA-Research-Lab/prima-page-viewer/issues/24: I opened the issue against prima-page-viewer, as it is affected, too.

> while Mewel/abbyy-to-alto worked right away.

Sort of: it does not produce `Processing` tags (or the ALTO v2 equivalent), so it is lacking too.
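For reference, a rough sketch of what emitting such processing metadata could look like with Python's `xml.etree.ElementTree`. Assumptions: the ALTO v4 namespace and the `Description/Processing` layout (`processingDateTime`, then `processingSoftware` with `softwareName` and `softwareVersion`) as I recall them from the ALTO v4 schema, so check element names and order against the XSD. `add_processing` is a hypothetical helper, and the step ID, timestamp and software values are placeholders, not data from any real ABBYY file. Only the v4 form is shown.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v4 namespace; emit it as the default namespace (no prefix).
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"
ET.register_namespace("", ALTO_NS)


def q(tag):
    """Qualify a tag name with the ALTO namespace."""
    return f"{{{ALTO_NS}}}{tag}"


def add_processing(description, step_id, date_time, software_name, software_version):
    """Hypothetical helper: append one <Processing> step to an ALTO <Description>."""
    step = ET.SubElement(description, q("Processing"), {"ID": step_id})
    ET.SubElement(step, q("processingDateTime")).text = date_time
    software = ET.SubElement(step, q("processingSoftware"))
    ET.SubElement(software, q("softwareName")).text = software_name
    ET.SubElement(software, q("softwareVersion")).text = software_version
    return step


# Usage with placeholder values.
alto = ET.Element(q("alto"))
description = ET.SubElement(alto, q("Description"))
ET.SubElement(description, q("MeasurementUnit")).text = "pixel"
add_processing(description, "OCR_0", "2021-01-30T12:00:00", "example-ocr-engine", "1.0")
print(ET.tostring(alto, encoding="unicode"))
```

In a real converter, the software name and version would presumably come from the ABBYY input or the converter itself rather than being hard-coded.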
I'd also like to point out that prima-page-converter has a similar problem: the PrimaText library is not open source (https://github.com/PRImA-Research-Lab/prima-page-converter/issues/17#issuecomment-769817720).
Somewhat related: I just found a converter from ABBYY to hOCR made by the Internet Archive. I haven't tested it myself yet.
> Sort of: it does not produce `Processing` tags (or the ALTO v2 equivalent), so it is lacking too.

I've added that in https://github.com/Mewel/abbyy-to-alto/pull/16.
https://github.com/ironymark/AbbyyToAlto: transformation with PHP, GPL v3.