-
I noticed scanned PDFs are not imported when loaded with the SDK or the GUI. To cope with that, someone implemented an OCR layer (#1610). You can simulate this behavior with any scanned PDF, such as …
-
### App Information
A locally hosted one-stop shop for all your PDF needs · Powerful PDF tools. **Stirling PDF** provides you with powerful, easy to use tools to manage your PDF files.
- Official We…
-
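### Description of the bug
My machine is weak, with only 40 GB of memory, and I was worried about running out of memory partway through parsing. So when parsing PDFs of more than 5,000 pages, I first split the PDF into small files of 80 pages each and then parse them with MAGIC-PDF. Among the many files, I occasionally see errors in the output like the log below, saying that an image cannot be found. Once such an error occurs, no layout or markdown file is output for that PDF.
I'm not sure whether…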
-
### What were you trying to do?
I have used ocrmypdf to perform OCR on a PDF document, but I'm encountering a specific issue with RTL (right-to-left) languages like Persian. Despite successful OCR …
-
`ImportError: cannot import name 'cached_property' from 'nougat.utils' (/lfs/skampere1/0/emilyhyf/miniconda/lib/python3.12/site-packages/nougat/utils/__init__.py)
OCRing with base model failed on /lf…
-
### Issues
- [X] I have browsed through the Issues and confirmed that this is not a duplicate.
### Umi-OCR version
2.1.3
### Windows version
linux docker
### OCR plugins used
_No response_
### R…
-
All simplified Chinese characters in the MD file generated from PDF are garbled.
I use docling version 2
-
Can we add some sort of toggle / support for enabling full page OCR reading via Tesseract, when pymupdf is installed? I hacked around the vendored library in my local virtualenv and made a change in `…
-
### Issues
- [X] I have browsed through the Issues and confirmed that this is not a duplicate.
### Umi-OCR version
2.1.3
### Windows version
windows10
### OCR plugins used
PaddleOCR
### Reproduc…
-
**Is your feature request related to a problem? Please describe.**
When I OCR a PDF, I would like to be able to open the PDF and see the OCRed text as a hidden layer.
**Describe the solution you'd…