-
### Description of the bug
My PyMuPDF version is 1.24.14.
I tried to invoke `add_redact_annot(rect)` and `apply_redactions(images=0)` to remove some words from the original PDF file. The locations are sh…
-
To ask about a library, reply to this issue with a question in the following form:
> Can I use the library library_name to do explanation_of_what_I_want_to_do
¡Rec…
-
I noticed that the `maybe_is_text()` check discards quite a few perfectly valid and well-parsed publications. The issue is that it checks the entropy of the first text chunk of a document. Document pars…
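
For illustration only, here is a minimal sketch of the kind of check described above: compute the Shannon entropy of the first chunk and compare it to a threshold. The names `shannon_entropy` and `first_chunk_passes` and the threshold of 2.5 bits per character are assumptions for this sketch, not the project's actual implementation of `maybe_is_text()`.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def first_chunk_passes(chunks: list[str], threshold: float = 2.5) -> bool:
    """Hypothetical check: judge the whole document by its first chunk only.

    The threshold value is an assumption chosen for this sketch.
    """
    if not chunks:
        return False
    return shannon_entropy(chunks[0]) > threshold


# A document whose first chunk is low-entropy filler is rejected,
# even though the second chunk is ordinary prose.
doc = ["......", "The quick brown fox jumps over the lazy dog"]
rejected = not first_chunk_passes(doc)
```

With a check shaped like this, a single unrepresentative first chunk (a row of dots, a repeated header) is enough to discard an otherwise well-parsed document, which matches the behaviour reported here.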
-
Thank you for this excellent MuPDF wrapper!
One feature that MuPDF does not implement natively is layout-preserving plain text extraction.
- XPdf / poppler's pdftotext offer a `layout` mode as sta…
-
### Describe the bug
Running `pip install "pyautogen[long-context]" PyMuPDF` should make it possible to do from autogen.agentchat.contrib.capabilities.text_compressors import LLMLingua…
-
PyMuPDF is licensed under the AGPL. Please consider using a different PDF parser to make this project suitable for corporate use.
-
### Description of the bug
In some cases PyMuPDF adds newline characters in the middle of words which do not exist if you simply copy/paste the text from the PDF or extract the text using other l…
-
Can we add some sort of toggle / support for enabling full page OCR reading via Tesseract, when pymupdf is installed? I hacked around the vendored library in my local virtualenv and made a change in `…
-
## 🐛 Bug
With pyodide-build-0.29.0, building a shared library seems to produce an empty library with no symbols.
This is when building libmupdf.so to be used in a PyMuPDF wheel.
With pyodide-…
-
See: https://github.com/pymupdf/PyMuPDF/issues/3635
Our current workaround: 7098f2e
If better solutions or workarounds are suggested in the PyMuPDF issue, we should implement them.