-
Hey everyone, I have a problem with the locally hosted llmsherpa API. I've followed every step on https://github.com/nlmatics/nlm-ingestor but still can't get my documents chunked once I'm connected t…
-
I have been trying to analyze different types of documents using layout parser. I get the expected results on true PDFs but not on scanned PDFs; it is detecting the scanned pdf…
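
One common way to tell a true PDF from a scanned one before choosing a parsing path is to check whether its pages carry an extractable text layer. Below is a minimal sketch of that heuristic, assuming the per-page text has already been extracted with whatever PDF library is in use (that step is not shown). The function name `looks_scanned` and both thresholds are hypothetical and are not part of layout parser.

```python
from typing import Sequence


def looks_scanned(
    page_texts: Sequence[str],
    min_chars_per_page: int = 20,   # assumed threshold; tune for your documents
    scanned_ratio: float = 0.8,     # assumed threshold; tune for your documents
) -> bool:
    """Return True when most pages have almost no extractable text.

    page_texts holds one string per page, as returned by a PDF text
    extractor of your choice (hypothetical input, not produced here).
    """
    if not page_texts:
        # No pages means nothing to classify; treat as not scanned.
        return False
    # Count pages whose stripped text is shorter than the threshold.
    near_empty = sum(
        1 for text in page_texts if len(text.strip()) < min_chars_per_page
    )
    return near_empty / len(page_texts) >= scanned_ratio
```

A document flagged this way could then be routed through an OCR step first, while true PDFs go straight to layout analysis.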
-
I noticed scanned PDFs are not imported when loaded with the SDK or the GUI. To cope with that, someone implemented an OCR layer (#1610). You can simulate this behavior with any scanned PDF, such as …
-
ocrmypdf works great on PDFs with scanned images. However, with handwritten letters, the tesseract-ocr engine often struggles.
How do I use the Azure OCR API as the OCR engine, keeping everyt…
-
Hi Quan,
Hope you're doing well. I have developed a Tesseract OCR application in Spring Boot. This application must process 600,000 scanned PDF images. Currently, I am using Tess4J version 4.4.0. It …
-
### Search before continuing
- [X] I have searched the Data-Juicer issues and found no similar feature requests.
### Description
There is a l…
-
When indexing PDFs that contain embedded images or PDFs that have been created from (scanned) images, they are also listed as "Images". As I consider PDFs to be text documents, I do not want this catego…
-
```
* Xreader version 3.6.3
* Distribution - Mint 21.1
```
**Issue**
Printing some PDF files (not all) results in loss of data on the printed pages. I've experienced this issue with a few PD…
-
Thanks for your excellent work.
I want to get the best confidence score from the result returned by the model. I can't find any docs or a detailed example for this.
```
result = inference_detector(model, img)
…
-
The script works great and PDFs go into Apple Notes - THANKS
But running into this issue with the scanned notes: https://forums.macrumors.com/threads/apple-notes-search-in-pdfs-from-scanner-not-working.2…