-
### Describe the bug
In my case, creating a PDF/A increased the file size by a factor of 500!
- IMO I identified the culprit: gs cannot handle mixed portrait and landscape pages well.
After separ…
-
Is table OCR also supported for landscape PDFs?
-
I followed the Readme.md to install on Ubuntu 20.04, and all steps went fine. When I ran the sample program, it showed:
python pdf_extract.py --pdf assets/examples/example.pdf
Traceback (most recent call last):…
-
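### MaxKB version
v1.2.0
### Please describe your requirement or suggested improvement
First of all, thank you to the developers for open-sourcing such a great project!
Many PDF documents are scans, and MaxKB cannot recognize them properly.
### Please describe your proposed implementation
I hope PDF OCR can be added, so that a PDF is run through OCR after import: typically, each page of the PDF is converted to an image and then recognized.
You could refer to this open-source project: https://github.com/hir…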
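
The flow suggested in this request (convert each page to an image, then recognize it) could look roughly like the sketch below. It is not MaxKB code: `ocr_pdf_pages`, `render_page`, and `ocr_image` are hypothetical names, and the two callables stand in for whatever rasterizer and OCR engine would actually be used.

```python
from typing import Callable, List, TypeVar

PageImage = TypeVar("PageImage")


def ocr_pdf_pages(
    page_count: int,
    render_page: Callable[[int], PageImage],  # hypothetical: page index -> image
    ocr_image: Callable[[PageImage], str],    # hypothetical: image -> recognized text
) -> List[str]:
    """Return the recognized text of each page, in page order."""
    texts = []
    for index in range(page_count):
        image = render_page(index)               # step 1: turn one page into an image
        texts.append(ocr_image(image).strip())   # step 2: recognize the text on it
    return texts


# Stand-in callables so the sketch runs without any PDF or OCR library.
fake_pages = ["  page one text ", "page two text\n"]
result = ocr_pdf_pages(len(fake_pages), lambda i: fake_pages[i], lambda img: img)
```

Passing the renderer and recognizer in as callables keeps the page loop independent of any particular library, so either one can be swapped out later.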
-
I'm fired up about a Rust-implemented document parsing / embedding engine for my code and documents. Sadly, I don't see good PDF ingestion in the code.
Ideally, I'd like to import PDFs from acad…
-
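### Please confirm the following first
- [X] I have read the [README](https://github.com/tisfeng/Easydict#readme) carefully
- [X] I have searched the [issues](https://github.com/tisfeng/Easydict/issues) page (including closed issues) and found no similar feature request
- [X] Easydict has been upgraded to the [latest version](ht…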
-
To compare different pipelines (LLMs, pdf2img, pdf2txt), we need a benchmark.
## 1. Choose a subset of datasheets from each manufacturer
* consider special PDFs that need OCR
* scrambled text
#…
-
Hey, I just stumbled upon rnote a couple of weeks ago and it's an amazing project. Thank you for the work!
**Is your feature request related to a problem? Please describe.**
I went through grea…
-
### Describe the Bug
File "/data/mlops/Open-Assistant/inference/server/oasst_inference_server/plugins/vectors_db/loaders/data_loader.py", line 383, in path_to_doc1
res = file_to_doc(file, …
-
### Simple sanity checks
- [X] This is an issue with an app that uses OCRmyPDF for OCR
- [ ] I am using a recent version of the third party app
- [ ] I will include a file that reproduces the issue
…