jbarth-ubhd closed this issue 4 months ago
Oops, perhaps an old version:
-rwxr-xr-x 1 root root 3007 Jan 18 2022 /usr/local/bin/ocr-transform
make all & make install are complaining about a missing JPageConverter 1.5.06; this helped:
root@pers16:/home/jb/ocr-fileformat/vendor# cp -a JPageConverter\ 1.5 "JPageConverter 1.5.06"
Did git pull && make && make install (circumventing the missing JPageConverter 1.5.06, see above), same problem:
jb@pers109:/home/jb/ocr-fileformat# ocr-transform --version
ocr-transform v0.6.0-11-gee488dd
@stweil perhaps something is going wrong with JPageConverter (see above)
commit 63de5ae7ae0f91365d16e77e1f3bd468eb819054
Use fixed JPageConverter 1.5.06 from UB-Mannheim
I cannot reproduce the issue:
ocr-transform page hocr vendor/page-to-alto/tests/data/OCR-D-OCR-TESS_00001.xml | fgrep -h cr_carea | sed 's/title=.*//' | sort | uniq -c
31 <div class="ocr_carea"
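For anyone without the fgrep/sed/sort/uniq pipeline at hand, the same count can be sketched in Python. This is only an illustration using the standard library; the names DivClassCounter and count_div_classes are made up here and are not part of ocr-fileformat.

```python
from collections import Counter
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Counts the class names found on <div> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # hOCR stores the element type (ocr_carea, cr_carea, ...) in the class attribute.
        if tag != "div":
            return
        for name, value in attrs:
            if name == "class" and value:
                self.counts.update(value.split())


def count_div_classes(hocr_text):
    """Return a Counter mapping each <div> class name to its number of occurrences."""
    parser = DivClassCounter()
    parser.feed(hocr_text)
    parser.close()
    return parser.counts
```

Feeding it the text produced by ocr-transform page hocr should give one count per class, so a mix of ocr_carea and cr_carea shows up directly.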
Try git status. Are all submodules up-to-date?
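git status alone may not tell the whole story about submodules; git submodule status prefixes each entry with - (not initialized), + (checked-out commit differs from the one recorded in the superproject) or U (merge conflicts). A rough sketch for flagging such entries, assuming git is on the PATH; the function names are invented for this example:

```python
import subprocess


def flagged_submodules(status_output):
    """Parse `git submodule status` output; return (marker, path) for suspicious entries."""
    flagged = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        marker = line[0]  # ' ' means the submodule matches the recorded commit
        if marker not in "-+U":
            continue
        # Remainder is "<sha> <path>" optionally followed by " (<describe>)".
        _sha, _, tail = line[1:].partition(" ")
        path = tail
        if tail.endswith(")") and " (" in tail:
            path = tail[: tail.rfind(" (")]
        flagged.append((marker, path))
    return flagged


def check_repo(repo_dir):
    """Run git in repo_dir and return the flagged submodules (assumes git is installed)."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return flagged_submodules(result.stdout)
```

An empty list from check_repo would suggest the submodules are initialized and at the recorded commits; it says nothing about whether the superproject itself is current.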
root@pers109:/home/jb/ocr-fileformat# git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
Example: https://digi.ub.uni-heidelberg.de/diglitData/v/duerer1527_-_aa.PAGE.xml
root@pers109:/dwork/ocr/duerer1527/run-6# /usr/local/bin/ocr-transform page hocr duerer1527_-_aa.PAGE.xml |grep '"cr_'
<div class="cr_carea" title="bbox 144 141 554 189">
Did rm -rf ... ; git clone ... ; make all ; make install, but the problem is still there.
Additionally did a git checkout v0.6.0, but then make all complains: ... AttributeError: 'NoneType' object has no attribute 'get'
It's a feature. Image regions and graphic regions get cr_carea while text regions get ocr_carea (see code).
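To make the rule concrete, here is a minimal sketch of that mapping in Python. It is not the project's actual conversion code, only an illustration of the behaviour described above; it assumes the PAGE element names TextRegion, ImageRegion and GraphicRegion, and the names REGION_TO_HOCR_CLASS and hocr_class_for_region are hypothetical.

```python
# Illustrative mapping only; the real conversion lives in the project's own code.
REGION_TO_HOCR_CLASS = {
    "TextRegion": "ocr_carea",     # text regions
    "ImageRegion": "cr_carea",     # image regions
    "GraphicRegion": "cr_carea",   # graphic regions
}


def hocr_class_for_region(region_tag):
    """Return the hOCR <div> class for a PAGE region tag, or None if unmapped.

    Accepts plain names ("TextRegion") as well as namespace-qualified
    ElementTree tags ("{namespace-uri}TextRegion").
    """
    local_name = region_tag.rsplit("}", 1)[-1]
    return REGION_TO_HOCR_CLASS.get(local_name)
```

So a page containing both text and illustrations is expected to produce both classes, which is why cr_carea is not a typo for ocr_carea.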
Thanks! Looked like a typo.
When converting OCR-D *.PAGE.xml to .hocr, I get two different types of <div> classes: