-
Hi @dinosauria123!
This is the issue I posted on ocr-fileformat: https://github.com/UB-Mannheim/ocr-fileformat/issues/121
As you requested, I'm opening the issue here and copying the text:
I have …
-
[This image](https://digi.bib.uni-mannheim.de/reichsanzeiger.fcgi?FIF=/reichsanzeiger/film/026-7945/0019.jp2&CVT=jpeg) contains a full page of vertical text lines. The [hOCR output](https://digi.bib.un…
-
On some page images full of text Tesseract does not detect any text when using the default settings. Typically it prints `Empty page!!` twice for such pages. See issue #3021 for details and examples.
…
-
When I execute this:
```
$ tesseract img.png img hocr
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
$ hocr-pdf . > new.pdf
/usr/local/bin/hocr-pdf:134: DeprecationWarning: decodestring()…
-
Hi there!
I was using UB-Mannheim's `ocr-fileformat` to convert hOCR to PAGE XML. Internally it uses PRImA.PageConverter to do the job, which itself depends on this Java library. It successfully conver…
-
https://github.com/HKWhyIP/hocr-tools/commits/master
-
How about including hocr-tools in the ocropus Docker image (https://hub.docker.com/r/kbai/ocropy/)? I would guess they should be packaged together...
-
The README [currently](https://github.com/tmbdev/hocr-tools/blob/76fe7679329776d325a1d22cf3b8aa2105141589/README.md) states:
> Each command line program is self contained; if you have Python 2.7 with…
-
It seems easy to add some `id`s to the hOCR output, with a change like this:
```diff
diff --git a/ocropus-hocr b/ocropus-hocr
index 5ac022e..d3f46ac 100755
--- a/ocropus-hocr
+++ b/ocropus-hocr
@@ …
-
Hi! I am using LayoutLM for document classification. I generated the hOCR for the RVL dataset with Tesseract. However, with exactly the same settings, I only got an accuracy of 87.9%. I guess there may…