FoxKyong closed this issue 8 years ago
Which hOCR did you use for that test? Could you please add it here to allow reproducing the problem?
I have attached the file, but every hOCR file I tried to convert causes the same problem. The hOCR was created by Tesseract v3.04.01. Attachment: 060.hocr.zip
Thanks for trying @FoxKyong and for asking for ALTO support in tesseract.
Problem is in https://github.com/kba/hOCR-to-ALTO/, I'll look into it.
The problem was in the language mapping. It should be fixed in https://github.com/kba/hOCR-to-ALTO/issues/1. Can you try
(cd vendor/hOCR-to-ALTO; git pull)
and try the transformation/validation again?
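The thread does not show the fix itself. As a rough illustration of what "mapping language" can involve, here is a minimal Python sketch that normalizes a Tesseract-style three-letter code (as it might appear in an hOCR `lang` attribute) to a two-letter code. The function name, the lookup table, and the choice of target format are all assumptions made for illustration, not the converter's actual logic.

```python
# Hypothetical lookup table (illustration only): Tesseract-style
# three-letter codes mapped to two-letter ISO 639-1 codes.
LANG_MAP = {
    "eng": "en",
    "deu": "de",
    "fra": "fr",
    "ces": "cs",
}


def map_language(hocr_lang, default=None):
    """Return a two-letter code for an hOCR language value.

    Two-letter alphabetic input is passed through unchanged; unknown
    codes yield `default`, so the caller can decide whether to omit
    the attribute or report an error.
    """
    code = hocr_lang.strip().lower()
    if len(code) == 2 and code.isalpha():
        return code
    return LANG_MAP.get(code, default)
```

Returning a default for unknown codes, rather than passing the raw value through, keeps an unmapped language from silently ending up in the output and failing validation later.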
I have tried it and it works. Thanks.
I converted an hOCR file to ALTO-2.0, and when I then ran ocr-validate on the result I got:
I also tried converting it to other ALTO versions, and those all failed as well. That was only for testing, though, because I need version 2.0.