-
I have an issue when adding the Thai language to Scribe OCR, as follows:
1. I just added tha.traineddata.gz to \tess\lang, but the console log shows "Error: Tesseract (legacy) engine requested, but comp…
-
**Describe the bug**
Multi-line OCR is output as single-line text.
**Where is the bug**
- Fullscreen Grab "multi-line OCR"
- OCR Output
**Expected behavior**
The multi-line OCR should output…
-
It would be especially helpful when you have a lot of screenshots, diagrams, photos of slides, etc., embedded in documents or as standalone image files. Text in images may contain a large amount of in…
-
- OpenCV => 3.2
- Operating System / Platform => Windows 64 Bit
- Compiler => Visual Studio 2013
- Tesseract => 3.04
opencv_contrib/ (sample) end_to_end_recognition/ocr_tesseract.cpp, line 203…
-
When I choose to recognize with Dutch on this image:
https://user-images.githubusercontent.com/3341558/175789293-f39ddfdb-6f3e-4598-8d16-80a1f4a88b36.jpg
the final sentence is correct, but a sente…
rmast updated 1 month ago
-
Hello,
I wish to know how to use the traineddata available from tesseract-ocr without inducing
actual_tessdata_num_entries_
-
Hello,
In Nextcloud it is not possible to index PDF content from scanned documents. The reason for this is the PDF file format itself. When you scan a document and save it as a PDF, there is no "real …
-
**Describe the bug**
User gets a `TesseractError` when processing a particular document.
**To Reproduce**
Code was an API call with a certain image-based document.
**Expected behavior**
Docum…
qued updated 4 months ago
-
The hope here is to get TikaOnDotNet fully configured to access Tesseract OCR for text extraction from images. With Tika .93, support for Tesseract was added, and we are now in the midst of validating…
-
Maybe related to [this](https://github.com/Unstructured-IO/unstructured/issues/3076). When used in the context of a binary file, an error is thrown.
Example:
with open ("./that.pdf", 'rb') …