-
### Bug
Running spacy-layout on an Apple M3 Pro with 36 GB of memory.
Python version 3.11.7.
The following code is invoked in a Jupyter notebook:
```
from docling.document_converter import…
-
Many organizations in my country accept documents submitted as PDF files.
However, only PDF/A-3 is accepted, because the data must be processed further.
Now in the official document, we can choose to make …
-
### Bug
I got the error `RuntimeError: [json.exception.type_error.302] type must be string, but is object`
![image](https://github.com/user-attachments/assets/0a22c3ed-46c2-4578-9170-4bf707cd86b…
-
### Description of the bug
Running the official example raises `ModuleNotFoundError: No module named 'mupdf'`. Trying to install mupdf reports that it is already installed. Trying to install mupdf again fails with an error.
### How to reproduce the bug
```
import os
from magic_pdf.data.da…
-
### Description of the bug
When I use `page.get_text('blocks')`, I get very similar text returned several times with different bounding boxes.
The output for page 5 (counting from 1) is as follows:
![image](https://github.com/use…
-
**Implement a Retrieval-Augmented Generation (RAG) chatbot that takes a PDF document as context to answer user queries using LangChain.** The solution should:
Load a PDF document and extract its te…
-
This bug will need to be fixed and tested in both Pre-Integration and Integration.
The Pre-Integration fix will need to be placed behind a PROD flag until the bug passes UAT.
Details:
Currently for…
-
I want to compress the images in a PDF, but my approach does not work.
Here is my algorithm:
```shell
- FPDF_GetPageCount
- Loop through every page
- FPDF_LoadPage
- FPDFPage_CountObjects
-…
-
const dataUrl = 'data:application/octet-binary;base64,' + url;
const blob = await fetch(dataUrl)
.then((res) => res.blob());
const ext = new pdf.ExternalDocument(blob.arrayBuffe…
-
**Describe the bug**
Since react-pdf does not accept SVG images, I have created a function that first converts the SVG into a PNG and then into a base64 string. This solution works fine when renderin…