-
For example, mark more than one node and then delete them, or change their `x_fsize` (or the `x_fsize` of all contained `ocrx_word` nodes), all at once.
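Here is a minimal sketch of the requested bulk edit. It assumes the hOCR is well-formed XHTML (as Tesseract writes it), so Python's `xml.etree.ElementTree` can parse it. The helpers `set_title_property`, `set_fsize_bulk`, and `delete_nodes` are hypothetical names for illustration and are not part of any existing hOCR tool. A viewer could call something like them with whatever nodes the user has marked.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def set_title_property(title: str, key: str, value: str) -> str:
    """Set or add one property in an hOCR title attribute,
    e.g. 'bbox 0 0 40 20; x_wconf 95; x_fsize 10'."""
    parts = [p.strip() for p in title.split(";") if p.strip()]
    for i, part in enumerate(parts):
        if part.split(None, 1)[0] == key:
            parts[i] = f"{key} {value}"
            break
    else:
        # Property not present yet: append it.
        parts.append(f"{key} {value}")
    return "; ".join(parts)


def set_fsize_bulk(selected: Iterable[ET.Element], fsize: int) -> None:
    """Set x_fsize on every ocrx_word inside (or equal to) each selected node."""
    for node in selected:
        for el in node.iter():  # iter() includes the node itself
            if "ocrx_word" in (el.get("class") or "").split():
                el.set("title", set_title_property(el.get("title", ""), "x_fsize", str(fsize)))


def delete_nodes(root: ET.Element, selected: Iterable[ET.Element]) -> None:
    """Remove every selected node from the tree rooted at root."""
    # ElementTree elements do not know their parent, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for node in selected:
        parent = parent_of.get(node)
        if parent is not None:
            parent.remove(node)
```

Deletion goes through a child-to-parent map because `ElementTree` elements keep no reference to their parent.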
-
### Environment
* tesseract 4.0.0
* leptonica-1.76.0
* libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
* Found AVX
* Found SSE
* Command used: `--psm 6 --oem 1 -l…`
-
Is there a way to enhance it so that I can select multiple words from the hover UI,
and then use their combined content and combined bounding box to annotate things?
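For a multi-word selection, the combined content could be the words' texts joined in reading order. The combined bounding box could be the union of their `bbox` values, meaning the smallest x0/y0 and the largest x1/y1. Here is a minimal sketch. It assumes each selected word is already available as a `(text, (x0, y0, x1, y1))` pair in hOCR pixel coordinates. `combine_words` is a hypothetical helper.

```python
from typing import List, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in hOCR pixel coordinates


def combine_words(words: List[Tuple[str, BBox]]) -> Tuple[str, BBox]:
    """Join the selected words' text and return the union of their bounding boxes."""
    if not words:
        raise ValueError("no words selected")
    # Assumes the words are given in left-to-right reading order on one line.
    text = " ".join(t for t, _ in words)
    x0 = min(b[0] for _, b in words)
    y0 = min(b[1] for _, b in words)
    x1 = max(b[2] for _, b in words)
    y1 = max(b[3] for _, b in words)
    return text, (x0, y0, x1, y1)
```

For example, `combine_words([("Hello", (10, 20, 60, 40)), ("world", (70, 18, 130, 42))])` returns `("Hello world", (10, 18, 130, 42))`. A selection that spans several lines might join the text with newlines instead.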
-
It would simplify people's lives A LOT if you could write a version of `hocr-pdf` that does everything on its own:
create the hOCR for all of a PDF's pages, merge them, then merge the resulting file wi…
-
There should be a way to produce OCR results without hyphenated words, both for hOCR output and at the paragraph (RIL_PARA) and block (RIL_BLOCK) page-iterator levels.
Test image:
![paragraph_sk…
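Until such an option exists, hyphenation can be undone as a post-processing step on the line texts of a paragraph or block. Here is a minimal sketch. It assumes the lines are already in reading order and that a line-final hyphen-minus or U+2010 after a letter always marks a word break. This naive rule will also join genuine hyphenated compounds that happen to break at their hyphen: "well-" followed by "known" becomes "wellknown". `dehyphenate` is a hypothetical helper, not a Tesseract API.

```python
from typing import List

HYPHENS = ("-", "\u2010")  # hyphen-minus and Unicode HYPHEN


def dehyphenate(lines: List[str]) -> str:
    """Join the lines of one paragraph, merging words split across lines."""
    result = ""
    last = len(lines) - 1
    for i, raw in enumerate(lines):
        line = raw.strip()
        if i < last and line.endswith(HYPHENS) and len(line) > 1 and line[-2].isalpha():
            # Drop the hyphen and glue the fragment to the next line's first word.
            result += line[:-1]
        elif i < last:
            result += line + " "
        else:
            result += line
    return result
```

For example, `dehyphenate(["The recog-", "nition works", "well."])` gives `"The recognition works well."`. A hyphen after a digit, as in `"Range 1-"`, is left alone.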
-
> Also, what will happen if we go ahead and change the encoding from 'latin-1' to 'utf-8'? Would that help if we are dealing with, let's say, Arabic typescript?
Possibly; I have never used `hocr-pdf` wi…
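Latin-1 can only represent the code points U+0000 to U+00FF, so Arabic letters cannot be encoded in it at all. UTF-8, by contrast, can encode every Unicode code point. The following small check covers only that encoding step; the helper name `can_encode` is just for illustration. It says nothing about whether `hocr-pdf` then places the text correctly in the PDF. That may also depend on the font and text handling it uses.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


arabic = "مرحبا"  # sample Arabic word
print(can_encode(arabic, "latin-1"))  # False
print(can_encode(arabic, "utf-8"))    # True
```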
-
`ami-phylo` will run from the command line and should, as far as possible, manage its options from there. Please list here the options that should be part of the command line.
-
The PDF file generated using `hocr-pdf` has Hebrew text printed in the opposite direction.
Steps I followed:
1. I used Google Cloud Vision to get the OCR results.
2. I used gcv2hocr to generate hOCR.
3. U…
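Before looking at the renderer, it may help to confirm two things about the hOCR produced by gcv2hocr: that it stores the Hebrew words in logical order, and that they are recognized as right-to-left text. Here is a minimal diagnostic sketch that uses only the standard `unicodedata` module. It reports whether the first strongly directional character of a string is right-to-left (bidi class `R` or `AL`), loosely following the first-strong-character rule of the Unicode Bidirectional Algorithm. `is_rtl` is a hypothetical helper. It detects direction but does not fix the PDF output.

```python
import unicodedata


def is_rtl(text: str) -> bool:
    """True if the first strongly directional character is right-to-left."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew letters are R, Arabic letters are AL
            return True
        if bidi == "L":
            return False
    # No strong character at all (e.g. only digits or punctuation).
    return False
```

Suppose the words are RTL and in logical order in the hOCR but still appear reversed in the PDF. One possible, unverified explanation is that the text layer is written left to right without bidi reordering.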
-
For PDFs that contain text, I am using pdf2json, which gives me all the text nodes and their PDF coordinates. For PDFs that do not contain text, I am using node-tesseract; however, this extracts just the text, i…
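For the scanned PDFs, Tesseract's hOCR output does carry word positions. Each `ocrx_word` has a `bbox x0 y0 x1 y1` in image pixels, with the origin at the top-left. To line these boxes up with coordinates taken from the text PDFs, they have to be converted. Here is a minimal sketch. It assumes the page was rasterized at a known `dpi` and that the target is standard PDF user space (1/72 inch, origin at the bottom-left). pdf2json's own output units may differ and should be checked. `px_bbox_to_pdf` is a hypothetical helper.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]


def px_bbox_to_pdf(bbox: BBox, dpi: float, page_height_px: float) -> BBox:
    """Convert an hOCR pixel bbox (top-left origin) to PDF points (bottom-left origin)."""
    x0, y0, x1, y1 = bbox

    def to_pt(v: float) -> float:
        return v * 72.0 / dpi  # 72 PDF points per inch

    # Flip the y axis: hOCR y grows downward, PDF y grows upward.
    return (to_pt(x0), to_pt(page_height_px - y1), to_pt(x1), to_pt(page_height_px - y0))


# A word box on a US Letter page scanned at 300 dpi (3300 px tall):
print(px_bbox_to_pdf((300, 600, 900, 750), dpi=300, page_height_px=3300))
# (72.0, 612.0, 216.0, 648.0)
```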
-
### Environment
* **Tesseract Version**: 5.2.0
* **Platform**: Windows 10, x64
### Current Behavior:
The PDF renderer puts two different lines on the same line, intermixing their characters.
I cann…