-
**Which SIST2 component is your Feature Request related to?**
Scan
**What would you like to see happen?**
Ability to specify the PDF page from which the thumbnail gets generated
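To make the request concrete, here is a minimal Python sketch of how such an option might be interpreted. The function name `resolve_thumbnail_page` and its semantics are hypothetical and not part of SIST2: 1-based page numbers, negative values counting from the end, and a fallback to the first page when the value is out of range.

```python
def resolve_thumbnail_page(requested: int, page_count: int) -> int:
    """Map a hypothetical user-supplied page option to a 0-based page index.

    Positive values are 1-based page numbers, negative values count from
    the end (-1 = last page), and 0 or out-of-range values fall back to
    the first page (the usual default for thumbnails).
    """
    if page_count <= 0:
        raise ValueError("document has no pages")
    if requested > 0:
        index = requested - 1
    elif requested < 0:
        index = page_count + requested
    else:
        index = 0  # 0 means "use the default"
    if not 0 <= index < page_count:
        return 0  # out of range: fall back to the first page
    return index


print(resolve_thumbnail_page(3, 10))   # 2
print(resolve_thumbnail_page(-1, 10))  # 9
```

Falling back to the first page instead of raising keeps a scan from failing on short documents when a single page number is configured for a whole collection.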
**Additional c…
-
### Description
A PDF mixes graphics and text. When doing RAG, the pictures in a PDF often contain important information, so we generally need to return the parsed pictures …
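A minimal sketch, assuming the parser already produces text chunks tagged with their page number plus a per-page list of extracted image files: each retrieved chunk carries the images from its own page, so a RAG response can return them alongside the text. `Chunk`, `attach_page_images`, and the same-page pairing rule are illustrative assumptions, not part of any particular library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A retrievable text chunk (hypothetical structure) with page provenance."""
    text: str
    page: int
    image_paths: list[str] = field(default_factory=list)


def attach_page_images(chunks: list[Chunk],
                       images_by_page: dict[int, list[str]]) -> list[Chunk]:
    """Attach every image extracted from a page to each chunk on that page."""
    for chunk in chunks:
        # Copy the list so chunks on the same page do not share one object.
        chunk.image_paths = list(images_by_page.get(chunk.page, []))
    return chunks


chunks = attach_page_images(
    [Chunk("intro", 1), Chunk("results", 2)],
    {1: ["p1_img0.png"]},
)
print([c.image_paths for c in chunks])  # [['p1_img0.png'], []]
```

Pairing by page is the simplest rule; a finer-grained parser could instead pair images with the nearest chunk by layout position.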
ic-xu updated 2 months ago
-
### What would you like to see?
Our other Azure OpenAI solution, built with Azure Document Intelligence, works well for indexing PDFs that contain charts and tables, enabling accurate data extraction from th…
-
Thanks for your great work! However, there are still some problems. I have a PDF that is not scanned (the text in the file is selectable). When I use your method, 'benefit' is recognized as 'benets'. …
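The missing letters in 'benets' match the "fi" ligature, which PDFs often store as a single glyph (U+FB01). Assuming that is the cause here, one common mitigation is Unicode NFKC normalization, which expands such compatibility characters into plain letters. This sketch uses only the standard library. It only helps if the extractor keeps the ligature character; if the glyph is dropped entirely during extraction, as 'benets' may indicate, the fix has to happen in the extraction step itself.

```python
import unicodedata


def normalize_ligatures(text: str) -> str:
    """Expand compatibility characters such as the 'fi' ligature (U+FB01)."""
    return unicodedata.normalize("NFKC", text)


raw = "bene\ufb01t"  # 'benefit' stored with a single 'fi' ligature character
print(normalize_ligatures(raw))  # benefit
```

NFKC also folds other compatibility forms (for example full-width letters), so it is usually applied once to extracted text before chunking and indexing.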
-
This issue is a master issue/epic and may lead to sub-issues, which will be referenced from here.
## Proposal
The extractor package will have the capability to extract vectorized text and objects (wi…
-
### Software environment
```Markdown
- paddlepaddle:
- paddlepaddle-gpu: 2.5.2.post120
- paddlenlp: 2.8.0
- paddleocr: 2.6.1.3
```
### Duplicate issues
- [X] I have searched the existing issues
### Error description
```
…
```
-
### Feature Request - use documents without localdoc processing
One use case is extracting data from docx files to JSON, for example to clean data for fine-tuning models or for localdocs. This feature wo…
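As an illustration of the docx-to-JSON use case, the following standard-library sketch reads `word/document.xml` from a .docx archive (a ZIP file) and emits its non-empty paragraphs as JSON. The function names and the JSON layout (`source`, `paragraphs`) are hypothetical. The sketch ignores table structure, headers, footers, and footnotes, and is not necessarily how the requested feature would be implemented.

```python
import json
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used for elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path: str) -> list:
    """Return the plain text of each non-empty paragraph in a .docx file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Paragraph text is split across runs; join all <w:t> text nodes.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def docx_to_json(path: str) -> str:
    """Serialize the paragraphs of a .docx file as a JSON document."""
    payload = {"source": path, "paragraphs": docx_paragraphs(path)}
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

Usage: `print(docx_to_json("report.docx"))`, where `report.docx` is any Word file on disk.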
-
Hello,
After extracting once, adding highlights to the same PDF and then updating the extraction does not work.
Obsidian version 0.11.0
Thanks