-
### Description of the bug
I have a PDF document from which I want to extract text.
PDF - https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-223.pdf
For extracting the text on Pag…
-
As the title says: with stage-3 training, training runs normally, but inference fails with this error:
Traceback (most recent call last):
File "/root/GOT-OCR2.0/GOT-OCR-2.0-master/GOT/demo/run_ocr_2.0.py", line 245, in <module>
eval_model(args)
File "/root/GOT-OCR2.0/…
-
Because species-ocr is not ingesting images correctly and species-web is down, we need some workarounds so Charlotte can continue digitizing. She will begin imaging specimens again tod…
-
Currently, settings are deleted and some "undefined" values are shown:
![Image](https://github.com/user-attachments/assets/b4e14de8-25b1-4c89-be6d-6a4326604105)
-
### Issues
- [X] I have browsed through the Issues and confirmed this is not a duplicate.
### Umi-OCR version
2.1.4
### Windows version
10
### OCR plugins used
PaddleOCR
### Reproduction st…
-
The provided datasets have four variants, each serving a specific purpose, and each contains a `text_description` as described below, e.g. for gov:
1. **syntheticDocQA_government_reports_test** – **No text_des…
-
## Requested behavior
Integration with an OCR system could be useful,
perhaps with https://github.com/tesseract-ocr/tesseract
for **recording purchases via f…
-
### Bug
For tables in which most columns are empty and one column is completely filled, the table that docling extracts truncates the values in the filled column.
### Steps to reproduce
I ha…
-
# Steps to Improve the Accuracy of the workflow
1. **Understanding the Structure**: From the images, recognize the repeated format for each minister: three columns labeled "Subjects and Functions," …
-
It's stuck at the embedding stage. I tried a solution from #7, but it did not help.
The output file, e.g. `original_filename (ocr).pdf`, is corrupted and can't be opened. Please see the debug output below.…