-
### What were you trying to do?
I have used ocrmypdf to perform OCR on a PDF document, but I'm encountering a specific issue with RTL (right-to-left) languages like Persian. Despite successful OCR …
-
## Recipe Name
Structural navigation using Ranges: 2 Ways
## Use case
You can enhance your manifest by adding Ranges that reference specific parts of canvases. By using the label field to des…
-
To recognize text in an image in Python, the best approach is to use an OCR (optical character recognition) library. Here are the main steps for getting the best results:
1. **Choosing a librar…
-
I'm fired up about a Rust-implemented document parsing / embedding engine for my code and documents. Sadly, I don't see good PDF ingestion in the code.
Ideally, I'd like to import PDFs from acad…
-
### Is Your Feature Request Related to a Problem?
**Problem**: Xournal++ is a niche product, known by a comparatively small number of Linux users. Even if someone finds out and loves Xournal++ for its…
-
## Description
When using Marker to extract Chinese characters from some PDF documents, some characters are not extracted at all, while others are extracted as garbled text. Below are three example f…
-
OCR desktop provides a GUI with a number of options. It generates a list of the rectangles of each word, which could potentially be used to facilitate word highlighting for the screen r…
-
While using `v3.0.79`, it seems that some PDFs are not currently parsed well when using `from pypdf import PdfReader` in `backend/danswer/file_processing/extract_file_text.py`.
The result is tha…
-
### Question Validation
- [X] I have searched both the documentation and discord for an answer.
### Question
How can I solve the problem of garbled characters caused by pictures in the document being read?
-
I'm not much into Python. I get a weird error; it looks like I have more than one version installed, so I'm not sure how to run the correct one. Here is what I ran and the result:
```
python -m pip install --upg…