Closed bertsky closed 4 years ago
I'll have a look when I have time. It's in the core libs (PrimaDla), in org.primaresearch.dla.page.io.xml.sax.SaxPageHandler_Alto_2_1 (that handler covers ALTO 2.1 upwards).
Thanks!
SaxPageHandler_Alto_2_1 looks very promising, and I'd like to try extending it, but I have trouble getting all the PRImA projects to build in the first place. I even got as far as manually importing the various libraries and repos into Eclipse (as existing projects, sometimes removing fixed paths like the one for GWT, or as new Java projects where no .project was present). But alas, they give me tons of error messages when I try to build. Without instructions or documentation, this is just too much effort for me.
Sorry about that, I thought building would be easier. I think I'll remove the GWT stuff soon anyway. Hope that will improve things.
I made an update, have a look whether it works for you (I don't have proper examples for ALTO with glyphs).
It works – perfectly! Thanks!
(I don't have proper examples for ALTO with glyphs)
The above-mentioned PR will add that functionality to Tesseract. (It's currently tesseract -l eng -c document_title=input.tif input.tif input.alto alto to arrive at an input.alto.xml file.)
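In case it helps with testing: a minimal sketch for checking whether an ALTO file actually contains Glyph elements before loading it into the viewer. It is not part of Tesseract or the PRImA libraries; the names AltoElementCounter and count_alto_elements are made up for illustration, and the embedded sample is a hand-written, heavily simplified fragment (not real Tesseract output and not schema-valid ALTO). Elements are matched by local name only, so the ALTO namespace version does not matter.

```python
import xml.sax


class AltoElementCounter(xml.sax.ContentHandler):
    """Counts String and Glyph elements (hypothetical helper, for illustration)."""

    def __init__(self):
        super().__init__()
        self.counts = {"String": 0, "Glyph": 0}

    def startElement(self, name, attrs):
        # Namespace processing is off by default, so 'name' is the qualified
        # name; strip an optional prefix such as 'alto:'.
        local = name.split(":")[-1]
        if local in self.counts:
            self.counts[local] += 1


def count_alto_elements(xml_bytes):
    """Parse ALTO XML given as bytes and return the element counts."""
    handler = AltoElementCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


# Hand-written, simplified sample (assumption: real files carry many more
# attributes such as ID, HPOS, VPOS, WIDTH, HEIGHT).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Hi">
      <Glyph CONTENT="H"/>
      <Glyph CONTENT="i"/>
    </String>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(count_alto_elements(SAMPLE))
```

To run it on a real file, pass the file's bytes instead of SAMPLE, for example the contents of input.alto.xml read in binary mode. A Glyph count of zero would mean the file has no glyph-level content to display.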
@chris1010010 why didn't you merge the update into master? I see it in release.
It's great that PageViewer already supports ALTO v4. But it seems that Glyph elements are not displayed yet (as they are for PAGE). Is it planned to add that anytime soon? (I would like to help, but I cannot even find where ALTO gets imported. Is this actually in prima-core-libs or prima-page-converter?)